copilot-swe-agent[bot]
46e14f88f9
Merge origin/release/2.6 and resolve worker_process conflict
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-04-16 11:01:28 +00:00
YuBaoku
72ce56b10b
[BugFix] fix tool call parser ( #7369 ) ( #7419 )
...
* fix tool call parser
* add unit test
* fix unit test
* add unit test
Co-authored-by: luukunn <981429396@qq.com >
2026-04-16 17:15:03 +08:00
jc
b8e8a6253f
PD deployment support without router ( #7412 ) ( #7424 )
2026-04-16 14:02:10 +08:00
GoldPancake
26674bbbb6
[Cherry-Pick][RL] Add clear_graph_opt_backend for glm4_mtp ( #7378 ) ( #7379 )
...
* add clear_grpah func
* fix spell
2026-04-15 19:45:09 +08:00
chen
2ee1cc3d0a
check init_flash_attn_version log ( #7401 )
2026-04-15 11:05:20 +08:00
sunxin
5f7524eb85
fix rl moe gate type ( #7394 )
2026-04-14 20:04:09 +08:00
freeliuzc
f6c066fb9d
Revert "[Optimization] Optimize ttft for prefill pd ( #6680 )" ( #7386 )
...
* Revert "[Optimization] Optimize ttft for prefill pd (#6680)"
This reverts commit 6727df8286.
* fix revert pr
2026-04-14 20:01:39 +08:00
YuBaoku
8a8beca548
[BugFix][PD Disaggregation][KVCache] Fix low cache hit rate in PD split scenario ( #7364 ) ( #7387 )
...
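## Motivation
In the PD disaggregation scenario, after receiving a request forwarded by the prefill node, the decode node
did not promptly update the hit information of its cache blocks, which led to a low prefix cache hit rate and degraded inference performance.
## Modifications
1. In the `_free_blocks_when_stop` method, additionally exclude prefill nodes (`splitwise_role == "prefill"`)
from the cache block update, so that prefill nodes do not update the cache a second time and leave its state inconsistent.
2. After a decode node successfully allocates a request (`_alloc_requests_with_cache`), proactively call
`update_cache_blocks` with `need_prefill_tokens` to update the cache block information,
so that the decode node correctly registers the prefix cache that was hit.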
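The two modifications above can be pictured with a small toy model. This is only a sketch of the control flow as the description states it, not the actual patch: the classes `ToyRequest`, `ToyCacheManager`, and `ToyResourceManager`, the method `add_forwarded_request`, the field `num_total_tokens`, and every signature here are hypothetical. Only the names `_free_blocks_when_stop`, `_alloc_requests_with_cache`, `update_cache_blocks`, `need_prefill_tokens`, and `splitwise_role` come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    request_id: str
    need_prefill_tokens: int
    num_total_tokens: int  # hypothetical field, used only by the free path below


@dataclass
class ToyCacheManager:
    # Records (request_id, num_tokens) for each cache-block update it receives.
    updates: list = field(default_factory=list)

    def update_cache_blocks(self, request, num_tokens):
        self.updates.append((request.request_id, num_tokens))


class ToyResourceManager:
    def __init__(self, splitwise_role, cache_manager):
        self.splitwise_role = splitwise_role
        self.cache_manager = cache_manager

    def _free_blocks_when_stop(self, request):
        # Modification 1: a prefill node skips the cache-block update here.
        if self.splitwise_role != "prefill":
            self.cache_manager.update_cache_blocks(request, request.num_total_tokens)
        # Releasing the blocks themselves is omitted from this sketch.

    def _alloc_requests_with_cache(self, request):
        # Toy stand-in: allocation always succeeds.
        return True

    def add_forwarded_request(self, request):
        # Hypothetical entry point for a request forwarded by a prefill node.
        if not self._alloc_requests_with_cache(request):
            return False
        # Modification 2: on a decode node, record the prefix-cache hit
        # right after a successful allocation.
        if self.splitwise_role == "decode":
            self.cache_manager.update_cache_blocks(request, request.need_prefill_tokens)
        return True
```

In this toy, a manager created with `splitwise_role="prefill"` leaves `updates` empty when a request stops, while a `"decode"` manager records one update as soon as the forwarded request is allocated.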
Co-authored-by: kevin <chengyf112@gmail.com >
2026-04-14 19:25:12 +08:00
chenjian
d9a008f3c8
[Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 ( #7159 ) ( #7351 )
...
* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1
* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1
* fix
2026-04-13 15:24:01 +08:00
sunxin
b2997f3aad
fix overlap mtp empty run ( #7314 )
2026-04-13 15:20:11 +08:00
liuruyan
9cb82d79a0
[Cherry-Pick][TI-consistent] support quant use pow2scale( #7308 ) ( #7310 )
...
* support quant use pow2scale
* fix
* fix
2026-04-13 00:02:08 -07:00
Jiang-Jia-Jun
6ee354f2c8
Update worker_process.py
2026-04-12 06:03:21 +00:00
Jiang-Jia-Jun
19b3b203d5
Update envs.py
2026-04-12 06:03:21 +00:00
jiang-jia-jun
63eaccd6c2
[Optim] Remove IPCLock between CacheManager and WorkerProcess
2026-04-12 06:03:21 +00:00
chen
7446665676
[Cherry-Pick][RL]moe bf16 ep support paddle batch_gemm( #7337 ) ( #7339 )
...
* moe bf16 ep support paddle batch_gemm
2026-04-11 21:51:26 +08:00
JYChen
42b0f59b9e
[Cherry-Pick][RL] change glm rope_emb calculation #7316 ( #7318 )
...
* change glm rope_emb calculation
* glm without EnforceFmulRN
* fix ci
2026-04-11 18:38:37 +08:00
GoldPancake
c7560383ab
[Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantization Params + CUDAGraph Validation (#7215,#7281) ( #7301 )
...
* refactor cudagraph args
* refactor quant cli param
* fix
* fix
* tmp skip xpu
* fix
2026-04-10 16:10:31 +08:00
zhangbo9674
4f36346e14
[Cherry-Pick] change rms norm for glm #7269 ( #7276 )
...
* fix
* refine code
* refine code
* refine code
* refine code
* refine code
2026-04-10 01:03:00 -07:00
fxyfxy777
dea9d35171
[OP]Unify MoE op with moe_permute path for bf16 GLM ( #7164 ) ( #7279 )
2026-04-09 21:37:42 +08:00
Bingoo
849eb3df65
[Cherry-Pick][Optimization] merge matmul and add (#6986) ( #7191 )
...
* merge matmul and add
* modify format
* using paddle.nn.functional.linear
* using _C_ops.linear
* using paddle.nn.functional.linear
* add FLAGS_use_legacy_linear env var in test case
* fix format
* add assert and remove env
* modify format
* using matmul for no bias
* modify accurate baseline
2026-04-09 14:15:43 +08:00
xiaoxiaohehe001
5fd8020363
[Cherry-Pick][BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn ( #7216 )
2026-04-09 11:05:43 +08:00
JYChen
9c65655cb3
[Cherry-Pick][RL] support moe-topk use topk_reduce_func #7218 ( #7256 )
...
* support moe-topk use topk_reduce_func
* fix ep error
* fix ut
* fix ut
2026-04-09 11:01:10 +08:00
YuBaoku
6b78981dde
Split enable_mm ( #7183 ) ( #7233 )
...
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com >
Co-authored-by: liuruian <liuruian@MacBook-Pro.local >
2026-04-08 16:32:04 +08:00
GoldPancake
403ce139c7
remove arctic_inference deps ( #7236 )
2026-04-08 15:25:21 +08:00
huicongyao
36909bf27d
[Cherry-Pick][BugFix] fix MTP bugs in TP and overlap( #7172 ) ( #7192 )
...
* fix MTP bugs in TP and overlap
* fix
2026-04-08 10:24:38 +08:00
Yonghua Li
55dbc83310
[Cherry-Pick][BugFix] prevent requests from entering running state without a slot( #7141 ) ( #7181 )
...
* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (#7163 )
* Set MC_MAX_MR_SIZE to avoid register hang
* up
* [fix] prevent requests from entering running state without a slot
* [fix] count abort set
* [fix] count preempted task in waiting list
---------
Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com >
2026-04-03 17:46:13 +08:00
jackyYang6
e3aed6de2f
fix oom bug, optimize async weight loading and update read step by yaml ( #7171 )
2026-04-03 11:05:24 +08:00
jc
1cc0cf23c2
[BugFix] Set MC_MAX_MR_SIZE to avoid register hang in default ( #7161 )
...
* Set MC_MAX_MR_SIZE to avoid register hang
* Set MC_MAX_MR_SIZE to avoid register hang
2026-04-03 10:51:15 +08:00
chenjian
2632e6cf32
[Feature] Support chunk prefill disabled in scheduler v1 ( #7152 )
2026-04-03 10:18:14 +08:00
luukunn
562fa31791
[BugFix]fix extract_tool_calls ( #7154 )
...
* fix extract_tool_calls
2026-04-02 21:18:37 +08:00
Yonghua Li
98f3fc9267
[RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests ( #7083 )
...
* [test] add a few unit tests
* [feat] update key prefix when model weights are updated
* [test] try to fix test_worker_process
2026-04-02 19:58:41 +08:00
fxyfxy777
9f3b3ce7f5
[Optimization] merge_allreduce ( #7039 )
2026-04-02 19:52:13 +08:00
Longzhi Wang
938e7dd881
[Other] support video_fps args for video bench ( #7077 )
2026-04-02 10:40:15 +08:00
luukunn
fa7a84926d
[Optimization]Fix tool parser ( #7079 )
...
* fix tool parser
2026-04-01 21:20:34 +08:00
Bingoo
410988d9ec
[OP] support deepgeem for sm103 ( #7073 )
...
* support deepgeem for sm103
* add assert
* modify code style
* add assert
* modify sm version condition
* remove assert
2026-04-01 21:01:09 +08:00
cmcamdy
7a2e33098f
[XPU] Refactor pre process ( #6993 )
...
* [XPU] support speculate_pre_process
* merge develop
* fix code style
* fix mtp, support cu_seqlens_q_output
* fix mtp, support cu_seqlens_q_output
* fix test
---------
Co-authored-by: lizan1999 <lizan03@baidu.com >
2026-04-01 20:29:55 +08:00
mouxin
fba8a51ad1
[Feature] Fix mixed cache-aware ( #7129 )
...
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Fix mixed cache-aware
---------
Co-authored-by: mouxin <mouxin@baidu.com >
2026-04-01 19:29:29 +08:00
yzwu
ceaf5df350
[Iluvatar] Fix cuda graph error for tp > 1 in ernie models ( #7126 )
2026-04-01 19:13:34 +08:00
mouxin
6cae9b1f50
[Feature] Config eviction_duration ( #7125 )
...
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
---------
Co-authored-by: mouxin <mouxin@baidu.com >
2026-04-01 16:46:21 +08:00
sunxin
c29e86fc9d
[Feature] Support mtp overlap schedule ( #7001 )
2026-04-01 14:24:26 +08:00
zhouchong
91c832f607
[Feature] Add logging parameters and error output to terminal ( #7098 )
2026-04-01 13:18:42 +08:00
jc
af51fc46d6
[PD Disaggregation] Write the cache of preempted req to storage and refine PD Disaggregation ( #7107 )
...
* Write the cache of preempted req to storage
* up
* fix
2026-04-01 13:15:52 +08:00
luukunn
3651113ee5
[DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR ( #7052 )
...
* remove ENABLE_V1_DATA_PROCESSOR
* fix unit test
* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5
ee2b965f5f
adjust config info ( #7054 )
2026-03-31 21:26:05 +08:00
Yonghua Li
a3cc3aa777
[BugFix] reset exist tasks signal in clear_data ( #7111 )
...
* [BugFix] reset exist tasks signal in clear_data
* [Fix] fix stale exist tasks signal after weight update
* [Chore] downgrade detected new requests log to DEBUG level
* [fix] adjust continue place
2026-03-31 21:24:08 +08:00
YilongGuo
dd61e7e421
[Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration ( #7086 )
...
Add clear_grpah_opt_backend method that delegates to the underlying model
to clear cuda graph optimization backend.
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2026-03-31 13:48:25 +08:00
qwes5s5
daa95244f7
abort requests ( #6992 )
2026-03-31 11:02:26 +08:00
Yonghua Li
6d9739f360
[BugFix] fix speculative gauge metrics in multi api server ( #7082 )
2026-03-31 10:52:50 +08:00
chenjian
6727df8286
[Optimization] Optimize ttft for prefill pd ( #6680 )
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
* fix ci
* fix format
* update according to review
* add comment
* fix
* fix format
2026-03-30 20:36:23 +08:00
jackyYang6
05f2d95729
[RL] Adapt async rollout checkpoint update flow ( #7042 )
...
* update checkpoint-transfer flow and control update_weights params
* test: add update_weights route validation
2026-03-30 19:19:34 +08:00