Commit Graph

1487 Commits

Author SHA1 Message Date
xiaoluomi bbe9731f46 2.4_fix_mtp_forward_meta (#5977) 2026-01-10 00:41:36 +08:00
GoldPancake bdaabf05a0 [Cherry-Pick][Speculative Decoding] Return accepted tokens per head in response (#5947) (#5952)
* adjust log level
* add accepted tokens per head
2026-01-09 14:26:49 +08:00
xiaoluomi f12b7a7a19 support_lastnorm_gather_split_r2.4 (#5925)
* support_lastnorm_gather_split_r2.4

* support_lastnorm_gather_split_r2.4v1

* support_lastnorm_gather_split_r2.4v2
2026-01-08 19:29:59 -08:00
kevin 741a01562b [BugFix][Cherry-Pick] cp fix dyc8 cache bug(#5958) (#5959)
* cp fix dyc8 cache bug

* update code
2026-01-08 19:25:56 -08:00
Copilot 37bed64282 [Cherry-Pick][BugFix] Fix misleading logging in worker_process for request counting (#5939) (#5953)
* Initial plan

* [Cherry-Pick] Fix misleading logging in worker_process for request counting (PR #5939)

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Fix code style: remove unused req_ids variable

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-09 10:12:55 +08:00
GoldPancake 8049a4982e [Cherry-Pick][Bugfix] Fix entropy calculation bugs (#5941) (#5942)
* fix entropy bug
2026-01-08 20:57:45 +08:00
Yonghua Li 16645c671c [BugFix] fix xpu import set_data_ipc (#5945) 2026-01-08 14:35:19 +08:00
GoldPancake d05f5f0877 [Cherry-Pick][Bugfix] Fix mtp logprob hang problem when include stop_seq (#5927) (#5928)
* fix mtp logprob hang when include stop_seq
2026-01-08 14:21:33 +08:00
chenjian 1e8de9639e [Optim][Cherry-pick] Reduce preemption occurrence when blocks not enough(#5696) (#5808)
* [Optim] Reduce preemption occurrence when blocks not enough

* optimize performance using adaptive block reservation

* optimize performance

* fix

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-08 10:09:51 +08:00
lizhenyun01 0b630fc3c1 [Cherry-Pick] [BugFix] fix mtp split kv attention (#5921)
* [BugFix] fix mtp split kv attention

* clean code

* clean code
2026-01-07 19:50:02 +08:00
freeliuzc fb59f5613e support multi-step draft-model with cudagraph (#5898) 2026-01-07 17:12:47 +08:00
kevin 939dfa4877 [BugFix][Cherry-Pick] Cp fix eb5 prefix cache(#5879) (#5881)
* fix eb5 prefix bug

* update code

* update code

* update code

* update code
2026-01-06 23:49:32 -08:00
qwes5s5 ed3db9dceb logging switch (#5765)
Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-07 15:32:20 +08:00
yinwei 3002334b6d [Cherry-Pick] [XPU]Cherry-pick Support ZMQ logprobs(#5628) (#5852)
* update

* delete min_tokens

---------

Co-authored-by: qw86972190 <127910106+qw86972190@users.noreply.github.com>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1498355.gajl.baidu.com>
2026-01-07 10:33:40 +08:00
gaoziyuan 44e44abf1e [Bugfix]fix model weight signal tensor num (#5899)
* [Bugfix]fix model weight signal tensor num

* fix
2026-01-06 15:14:26 +08:00
Yonghua Li 682e1ab2d0 [Cherry-Pick] [BugFix] fix mtp cache attaching for pd disaggregation (#5884) (#5885)
* [fix] fix mtp cache attaching for pd disaggregation

* [fix] fix port
2026-01-06 14:19:38 +08:00
Yonghua Li f3ebd64446 [Cherry-Pick] [KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled (#5871) (#5859)
* [fix] temporarily forbid cpu cache in update/clear api

* [fix] stop launching cache transfer manager unless hierarchical cache is enabled
2026-01-06 11:05:45 +08:00
freeliuzc dcb0cceded [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738) (#5793)
* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in d-split-mode

* add note

* fix xpu register

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2026-01-05 07:59:17 -08:00
GoldPancake c9a806de02 fix speculate metrics bug (#5875) 2026-01-05 19:43:55 +08:00
tianhaodongbd d624c5288b [RL] Change 'model' to the instance variable 'tmp_model' (#5873) 2026-01-05 02:09:14 -08:00
Copilot 9de6ae375c [Cherry-Pick][APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT(#5865) (#5867)
* Initial plan

* Cherry-pick PR #5865: Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-05 09:45:19 +08:00
ddchenhao66 3e04e43812 [Cherry-Pick][XPU]MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL (#5845) 2026-01-04 11:35:21 +08:00
chen 9a7eb33fd4 [Cherry-Pick][Optimization] Optimization for gather_logprob by 10GB (#5817)(#5846) (#5834)
* [Optimization] Optimization for gather_logprob by 10GB (#5817)

* opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k

* only cuda run triton op (#5846)
2025-12-31 19:54:14 +08:00
kevin 20024b889c [Cherry-Pick][BugFix] cp skip_mm_revert(#5848) (#5849)
* cp skip_mm_revert

* update test
2025-12-31 17:29:49 +08:00
Yonghua Li 638009387d [Cherry-Pick] [BugFix] fix cache manager not launched in case of mtp or blockwise fp8 (#5840) (#5841)
* [BugFix] fix cache manager not launched in case of mtp or blockwise fp8

* [fix] fix mtp cache in mtp.py

* [fix] fix gpu ops import

* [fix] fix mtp layer idx
2025-12-31 15:08:34 +08:00
GoldPancake f33e642327 [Cherry-Pick][Speculative Decoding] Optimize draft logprob (#5842) (#5843)
* optimize draft logprob

* fix ut
2025-12-31 10:43:44 +08:00
kevin a247260deb eb5 mm skip prefix cache (#5839) 2025-12-30 05:31:21 -08:00
GoldPancake 0d29f6df03 [Cherry-Pick][BugFix] Fix entropy bugs (#5818) (#5819)
* [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)

* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in d-split-mode

* add note

* fix xpu register

* fix entropy bugs

* Revert "[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)"

This reverts commit ba0d35a52e8775300a1459bfcaa39056df570525.

* fix ut

* fix

---------

Co-authored-by: freeliuzc <lzc842650834@gmail.com>
2025-12-29 20:45:03 -08:00
tianhaodongbd 834502711a [RL] add lm_head_fp32 in RolloutModelConfig (#5824) 2025-12-29 20:23:11 -08:00
Yuanle Liu b2bd2595af [Cherry-Pick][BugFix] Fix _disable_sequence_parallel_moe_if_needed#5740 (#5811) 2025-12-28 23:56:24 -08:00
kxz2002 df775c2811 [BugFix] Fix process_response_dict to support async in serving_completion (#5758) (#5802)
* support process_response_dict async initial commit

* fixbug

* add unit test

* optimize
2025-12-29 09:56:42 +08:00
chenjian c78c3be0d3 [BugFix] Fix preemption out of real_bsz (#5806) 2025-12-29 09:52:18 +08:00
kevin c170fc4dc5 [FDConfig][Cherry-Pick] Cp disable mm chunked(#5774) (#5775)
* disable chunked_mm_input in ernie5

* cp_disable_mm_chunked

* update test case

* update code
2025-12-26 15:31:46 +08:00
kevin 9a8e2152b1 [BugFix][Cherry-Pick] cp fix logprob bug(#5604) (#5770) 2025-12-25 04:09:16 -08:00
周周周 d0c5bcec3d [cherry-pick] support FA3 in mixed mode and support Qwen3 rope (#5655)
* [Others] Remove useless code (#5404)

* FA3 support qwen3 (#5441)

* commit
2025-12-25 11:11:16 +08:00
bukejiyu 7c62626e15 [Cherry-Pick][Loader]Fix bug in MTP weight loading #5744 (#5745)
* fix torch mtp

* fix

* update
2025-12-25 10:01:56 +08:00
bukejiyu fc3bccc5b6 [Cherry-Pick][Others]upgrade paddleformer to 0.4.0 #5599 (#5716)
* update 0.4.0

* update
2025-12-24 06:28:50 -08:00
chenjian 6945f87634 [Bug fix] Set enable_cache_output as false by default (#5752) 2025-12-24 21:28:08 +08:00
GoldPancake e51af01a65 [Cherry-Pick][Feature] Entropy calculation support #5692 (#5731)
* support entropy

* add script

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-24 15:42:43 +08:00
Yonghua Li 9ff99d2b03 [BugFix] fix double shutdown of comm group when rank0 clears weights slower than other ranks (#5710)
2025-12-23 01:51:35 -08:00
freeliuzc ceafd757f0 [Speculative Decoding]Support multi-step mtp with cudagraph (#5624) (#5670)
* support multi-step mtp with cudagraph

* fix usage

* fix unit test
2025-12-23 13:18:47 +08:00
ddchenhao66 eb309e5a2a [XPU]Set top_p=0.0 by default on XPU to optimize performance (#5688)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-23 11:00:53 +08:00
Yuanle Liu 90065084cb [BugFix] fix rl signal (#5678)
2025-12-22 00:31:24 -08:00
Yonghua Li ea16c82b43 [Cherry-Pick] [RL] provide options for whether shutdown comm group after weights cleared (#5663) (#5664)
* [rl] provide options for whether shutdown comm group after weights cleared

* [fix] fix args hardcode

* [fix] change args type

* [fix] add worker process args
2025-12-19 23:18:03 +08:00
bukejiyu dd0014b7b9 del core (#5659)
2025-12-19 16:33:44 +08:00
kevin e10c5d5d61 cp fix eb5 prefix cache bug (#5644) 2025-12-19 14:57:17 +08:00
qw86972190 a9bb24bb56 [XPU]logprob bug (#5636) 2025-12-19 14:30:14 +08:00
Yuanle Liu b3f78815d8 update rl signal (#5650) 2025-12-18 20:04:18 -08:00
kevin 23bfd28624 [Cherry-Pick][BugFix] cp fix_cpu_cache_bugs(#5544) (#5577)
* cp fix_cpu_cache_bugs

* update ce case

* update test case

* update code
2025-12-19 11:48:50 +08:00
bukejiyu 2aa88d3621 [Cherry-Pick][RL]Fix RL load_weights #5642 (#5643) 2025-12-18 19:17:09 -08:00