xiaoluomi
bbe9731f46
2.4_fix_mtp_forward_meta ( #5977 )
2026-01-10 00:41:36 +08:00
GoldPancake
bdaabf05a0
[Cherry-Pick][Speculative Decoding] Return accepted tokens per head in response ( #5947 ) ( #5952 )
...
* adjust log level
* add accepted tokens per head
2026-01-09 14:26:49 +08:00
xiaoluomi
f12b7a7a19
support_lastnorm_gather_split_r2.4 ( #5925 )
...
* support_lastnorm_gather_split_r2.4
* support_lastnorm_gather_split_r2.4v1
* support_lastnorm_gather_split_r2.4v2
2026-01-08 19:29:59 -08:00
kevin
741a01562b
[BugFix][Cherry-Pick] cp fix dyc8 cache bug( #5958 ) ( #5959 )
...
* cp fix dyc8 cache bug
* update code
2026-01-08 19:25:56 -08:00
Copilot
37bed64282
[Cherry-Pick][BugFix] Fix misleading logging in worker_process for request counting ( #5939 ) ( #5953 )
...
* Initial plan
* [Cherry-Pick] Fix misleading logging in worker_process for request counting (PR #5939 )
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Fix code style: remove unused req_ids variable
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-01-09 10:12:55 +08:00
GoldPancake
8049a4982e
[Cherry-Pick][Bugfix] Fix entropy calculation bugs ( #5941 ) ( #5942 )
...
* fix entropy bug
2026-01-08 20:57:45 +08:00
Yonghua Li
16645c671c
[BugFix] fix xpu import set_data_ipc ( #5945 )
2026-01-08 14:35:19 +08:00
GoldPancake
d05f5f0877
[Cherry-Pick][Bugfix] Fix mtp logprob hang problem when include stop_seq ( #5927 ) ( #5928 )
...
* fix mtp logprob hang when include stop_seq
2026-01-08 14:21:33 +08:00
chenjian
1e8de9639e
[Optim][Cherry-pick] Reduce preemption occurrence when blocks not enough( #5696 ) ( #5808 )
...
* [Optim] Reduce preemption occurrence when blocks not enough
* optimize performance using adaptive block reservation
* optimize performance
* fix
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-01-08 10:09:51 +08:00
lizhenyun01
0b630fc3c1
[Cherry-Pick] [BugFix] fix mtp split kv attention ( #5921 )
...
* [BugFix] fix mtp split kv attention
* clean code
* clean code
2026-01-07 19:50:02 +08:00
freeliuzc
fb59f5613e
support multi-step draft-model with cudagraph ( #5898 )
2026-01-07 17:12:47 +08:00
kevin
939dfa4877
[BugFix][Cherry-Pick] Cp fix eb5 prefix cache( #5879 ) ( #5881 )
...
* fix eb5 prefix bug
* update code
* update code
* update code
* update code
2026-01-06 23:49:32 -08:00
qwes5s5
ed3db9dceb
logging switch ( #5765 )
...
Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com >
2026-01-07 15:32:20 +08:00
yinwei
3002334b6d
[Cherry-Pick] [XPU]Cherry-pick Support ZMQ logprobs( #5628 ) ( #5852 )
...
* update
* delete min_tokens
---------
Co-authored-by: qw86972190 <127910106+qw86972190@users.noreply.github.com >
Co-authored-by: root <root@gajl-bbc-onlinec-com-1498355.gajl.baidu.com >
2026-01-07 10:33:40 +08:00
gaoziyuan
44e44abf1e
[Bugfix]fix model weight signal tensor num ( #5899 )
...
* [Bugfix]fix model weight signal tensor num
* fix
2026-01-06 15:14:26 +08:00
Yonghua Li
682e1ab2d0
[Cherry-Pick] [BugFix] fix mtp cache attaching for pd disaggregation ( #5884 ) ( #5885 )
...
* [fix] fix mtp cache attaching for pd disaggregation
* [fix] fix port
2026-01-06 14:19:38 +08:00
Yonghua Li
f3ebd64446
[Cherry-Pick] [KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled ( #5871 ) ( #5859 )
...
* [fix] temporarily forbid cpu cache in update/clear api
* [fix] stop launching cache transfer manager unless hierarchical cache is enabled
2026-01-06 11:05:45 +08:00
freeliuzc
dcb0cceded
[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes ( #5738 ) ( #5793 )
...
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split-mode
* add note
* fix xpu register
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2026-01-05 07:59:17 -08:00
GoldPancake
c9a806de02
fix speculate metrics bug ( #5875 )
2026-01-05 19:43:55 +08:00
tianhaodongbd
d624c5288b
[RL] Change 'model' to the instance variable 'tmp_model' ( #5873 )
2026-01-05 02:09:14 -08:00
Copilot
9de6ae375c
[Cherry-Pick][APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT( #5865 ) ( #5867 )
...
* Initial plan
* Cherry-pick PR #5865 : Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-01-05 09:45:19 +08:00
ddchenhao66
3e04e43812
[Cherry-Pick][XPU]MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL ( #5845 )
2026-01-04 11:35:21 +08:00
chen
9a7eb33fd4
[Cherry-Pick][Optimization] Optimization for gather_logprob by 10GB ( #5817 )( #5846 ) ( #5834 )
...
* [Optimization] Optimization for gather_logprob by 10GB (#5817 )
* opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k
* only cuda run triton op (#5846 )
2025-12-31 19:54:14 +08:00
kevin
20024b889c
[Cherry-Pick][BugFix] cp skip_mm_revert( #5848 ) ( #5849 )
...
* cp skip_mm_revert
* update test
2025-12-31 17:29:49 +08:00
Yonghua Li
638009387d
[Cherry-Pick] [BugFix] fix cache manager not launched in case of mtp or blockwise fp8 ( #5840 ) ( #5841 )
...
* [BugFix] fix cache manager not launched in case of mtp or blockwise fp8
* [fix] fix mtp cache in mtp.py
* [fix] fix gpu ops import
* [fix] fix mtp layer idx
2025-12-31 15:08:34 +08:00
GoldPancake
f33e642327
[Cherry-Pick][Speculative Decoding] Optimize draft logprob ( #5842 ) ( #5843 )
...
* optimize draft logprob
* fix ut
2025-12-31 10:43:44 +08:00
kevin
a247260deb
eb5 mm skip prefix cache ( #5839 )
2025-12-30 05:31:21 -08:00
GoldPancake
0d29f6df03
[Cherry-Pick][BugFix] Fix entropy bugs ( #5818 ) ( #5819 )
...
* [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 )
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split-mode
* add note
* fix xpu register
* fix entropy bugs
* Revert "[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 )"
This reverts commit ba0d35a52e8775300a1459bfcaa39056df570525.
* fix ut
* fix
---------
Co-authored-by: freeliuzc <lzc842650834@gmail.com >
2025-12-29 20:45:03 -08:00
tianhaodongbd
834502711a
[RL] add lm_head_fp32 in RolloutModelConfig ( #5824 )
2025-12-29 20:23:11 -08:00
Yuanle Liu
b2bd2595af
[Cherry-Pick][BugFix] Fix _disable_sequence_parallel_moe_if_needed#5740 ( #5811 )
2025-12-28 23:56:24 -08:00
kxz2002
df775c2811
[BugFix] Fix process_response_dict to support async in serving_completion ( #5758 ) ( #5802 )
...
* support process_response_dict async initial commit
* fix bug
* add unit test
* optimize
2025-12-29 09:56:42 +08:00
chenjian
c78c3be0d3
[BugFix] Fix preemption out of real_bsz ( #5806 )
2025-12-29 09:52:18 +08:00
kevin
c170fc4dc5
[FDConfig][Cherry-Pick] Cp disable mm chunked( #5774 ) ( #5775 )
...
* disable chunked_mm_input in ernie5
* cp_disable_mm_chunked
* update test case
* update code
2025-12-26 15:31:46 +08:00
kevin
9a8e2152b1
[BugFix][Cherry-Pick] cp fix logprob bug( #5604 ) ( #5770 )
2025-12-25 04:09:16 -08:00
周周周
d0c5bcec3d
[cherry-pick] support FA3 in mixed mode and support Qwen3 rope ( #5655 )
...
* [Others] Remove useless code (#5404 )
* FA3 support qwen3 (#5441 )
* commit
2025-12-25 11:11:16 +08:00
bukejiyu
7c62626e15
[Cherry-Pick][Loader]Fix bug in MTP weight loading #5744 ( #5745 )
...
* fix torch mtp
* fix
* update
2025-12-25 10:01:56 +08:00
bukejiyu
fc3bccc5b6
[Cherry-Pick][Others]upgrade paddleformer to 0.4.0 #5599 ( #5716 )
...
* update 0.4.0
* update
2025-12-24 06:28:50 -08:00
chenjian
6945f87634
[Bug fix] Set enable_cache_output as false by default ( #5752 )
2025-12-24 21:28:08 +08:00
GoldPancake
e51af01a65
[Cherry-Pick][Feature] Entropy calculation support #5692 ( #5731 )
...
* support entropy
* add script
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-24 15:42:43 +08:00
Yonghua Li
9ff99d2b03
[BugFix] fix double shutdown of comm group when rank0 clears weights slower than other ranks ( #5710 )
2025-12-23 01:51:35 -08:00
freeliuzc
ceafd757f0
[Speculative Decoding]Support multi-step mtp with cudagraph ( #5624 ) ( #5670 )
...
* support multi-step mtp with cudagraph
* fix usage
* fix unit test
2025-12-23 13:18:47 +08:00
ddchenhao66
eb309e5a2a
[XPU]Set top_p=0.0 by default on XPU to optimize performance ( #5688 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-23 11:00:53 +08:00
Yuanle Liu
90065084cb
[BugFix] fix rl signal ( #5678 )
2025-12-22 00:31:24 -08:00
Yonghua Li
ea16c82b43
[Cherry-Pick] [RL] provide options for whether shutdown comm group after weights cleared ( #5663 ) ( #5664 )
...
* [rl] provide options for whether shutdown comm group after weights cleared
* [fix] fix args hardcode
* [fix] change args type
* [fix] add worker process args
2025-12-19 23:18:03 +08:00
bukejiyu
dd0014b7b9
del core ( #5659 )
2025-12-19 16:33:44 +08:00
kevin
e10c5d5d61
cp fix eb5 prefix cache bug ( #5644 )
2025-12-19 14:57:17 +08:00
qw86972190
a9bb24bb56
[XPU]logprob bug ( #5636 )
2025-12-19 14:30:14 +08:00
Yuanle Liu
b3f78815d8
update rl signal ( #5650 )
2025-12-18 20:04:18 -08:00
kevin
23bfd28624
[Cherry-Pick][BugFix] cp fix_cpu_cache_bugs( #5544 ) ( #5577 )
...
* cp fix_cpu_cache_bugs
* update ce case
* update test case
* update code
2025-12-19 11:48:50 +08:00
bukejiyu
2aa88d3621
[Cherry-Pick][RL]Fix RL load_weights #5642 ( #5643 )
2025-12-18 19:17:09 -08:00