Commit Graph

1978 Commits

Author SHA1 Message Date
copilot-swe-agent[bot] 46e14f88f9 Merge origin/release/2.6 and resolve worker_process conflict
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-04-16 11:01:28 +00:00
YuBaoku 72ce56b10b [BugFix] fix tool call parser (#7369) (#7419)
* fix tool call parser

* add unit test

* fix unit test

* add unit test

Co-authored-by: luukunn <981429396@qq.com>
2026-04-16 17:15:03 +08:00
jc b8e8a6253f PD deployment support without router (#7412) (#7424) 2026-04-16 14:02:10 +08:00
GoldPancake 26674bbbb6 [Cherry-Pick][RL] Add clear_graph_opt_backend for glm4_mtp (#7378) (#7379)
* add clear_grpah func

* fix spell
2026-04-15 19:45:09 +08:00
chen 2ee1cc3d0a check init_flash_attn_version log (#7401) 2026-04-15 11:05:20 +08:00
sunxin 5f7524eb85 fix rl moe gate type (#7394) 2026-04-14 20:04:09 +08:00
freeliuzc f6c066fb9d Revert "[Optimization] Optimize ttft for prefill pd (#6680)" (#7386)
* Revert "[Optimization] Optimize ttft for prefill pd (#6680)"

This reverts commit 6727df8286.

* fix revert pr
2026-04-14 20:01:39 +08:00
YuBaoku 8a8beca548 [BugFix][PD Disaggregation][KVCache] Fix low cache hit rate in PD split scenario (#7364) (#7387)
## Motivation

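In the PD disaggregation scenario, a decode node that receives a request forwarded by a prefill node does not update the cache block hit information in time.
This leads to a low prefix cache hit rate and degrades inference performance.

## Modifications

1. In the `_free_blocks_when_stop` method, additionally exclude prefill nodes (`splitwise_role == "prefill"`)
   from the cache block update, so that a prefill node does not update the cache a second time and leave it in an inconsistent state.
2. After a decode node successfully allocates a request (`_alloc_requests_with_cache`), explicitly call
   `update_cache_blocks` with `need_prefill_tokens` to update the cache block information,
   so that the decode node correctly registers the prefix cache blocks that were already hit.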
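
The two changes above can be sketched roughly as follows. This is a minimal illustration and not the actual FastDeploy code: the `Request`, `CacheManager`, and `ResourceManager` classes, the `add_forwarded_request` helper, and the argument order of `update_cache_blocks` are all assumptions made for the example. Only the names `_free_blocks_when_stop`, `_alloc_requests_with_cache`, `update_cache_blocks`, `need_prefill_tokens`, and `splitwise_role` come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request record; only need_prefill_tokens is named in the commit.
    request_id: str
    need_prefill_tokens: int


@dataclass
class CacheManager:
    # Hypothetical stand-in that just records which requests were registered.
    updated: dict = field(default_factory=dict)

    def update_cache_blocks(self, request, num_tokens):
        # Assumed signature: remember how many tokens of this request are cached.
        self.updated[request.request_id] = num_tokens


class ResourceManager:
    def __init__(self, splitwise_role, cache_manager):
        self.splitwise_role = splitwise_role  # e.g. "prefill" or "decode"
        self.cache_manager = cache_manager

    def _alloc_requests_with_cache(self, request):
        # Placeholder for the real block allocation; always succeeds in this sketch.
        return True

    def add_forwarded_request(self, request):
        """Decode side: register cache hits right after a successful allocation."""
        if not self._alloc_requests_with_cache(request):
            return False
        if self.splitwise_role == "decode":
            self.cache_manager.update_cache_blocks(
                request, request.need_prefill_tokens
            )
        return True

    def _free_blocks_when_stop(self, request):
        """Skip the cache update on prefill nodes so it is not applied twice."""
        if self.splitwise_role != "prefill":
            self.cache_manager.update_cache_blocks(
                request, request.need_prefill_tokens
            )


decode_cache = CacheManager()
decode_node = ResourceManager("decode", decode_cache)
decode_node.add_forwarded_request(Request("req-1", 128))

prefill_cache = CacheManager()
prefill_node = ResourceManager("prefill", prefill_cache)
prefill_node._free_blocks_when_stop(Request("req-2", 64))
```

In this sketch the decode node records `req-1` as soon as allocation succeeds, while the prefill node leaves its cache bookkeeping untouched when the request stops.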

Co-authored-by: kevin <chengyf112@gmail.com>
2026-04-14 19:25:12 +08:00
chenjian d9a008f3c8 [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 (#7159) (#7351)
* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1

* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1

* fix
2026-04-13 15:24:01 +08:00
sunxin b2997f3aad fix overlap mtp empty run (#7314) 2026-04-13 15:20:11 +08:00
liuruyan 9cb82d79a0 [Cherry-Pick][TI-consistent] support quant use pow2scale(#7308) (#7310)
* support quant use pow2scale

* fix

* fix
2026-04-13 00:02:08 -07:00
Jiang-Jia-Jun 6ee354f2c8 Update worker_process.py 2026-04-12 06:03:21 +00:00
Jiang-Jia-Jun 19b3b203d5 Update envs.py 2026-04-12 06:03:21 +00:00
jiang-jia-jun 63eaccd6c2 [Optim] Remove IPCLock between CacheManager and WorkerProcess 2026-04-12 06:03:21 +00:00
chen 7446665676 [Cherry-Pick][RL]moe bf16 ep support paddle batch_gemm(#7337) (#7339)
* moe bf16 ep support paddle batch_gemm
2026-04-11 21:51:26 +08:00
JYChen 42b0f59b9e [Cherry-Pick][RL] change glm rope_emb calculation #7316 (#7318)
* change glm rope_emb calculation

* glm without EnforceFmulRN

* fix ci
2026-04-11 18:38:37 +08:00
GoldPancake c7560383ab [Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantization Params + CUDAGraph Validation (#7215,#7281) (#7301)
* refactor cudagraph args

* refactor quant cli param

* fix

* fix

* tmp skip xpu

* fix
2026-04-10 16:10:31 +08:00
zhangbo9674 4f36346e14 [Cherry-Pick] change rms norm for glm #7269 (#7276)
* fix

* refine code

* refine code

* refine code

* refine code

* refine code
2026-04-10 01:03:00 -07:00
fxyfxy777 dea9d35171 [OP]Unify MoE op with moe_permute path for bf16 GLM (#7164) (#7279) 2026-04-09 21:37:42 +08:00
Bingoo 849eb3df65 [Cherry-Pick][Optimization] merge matmul and add (#6986) (#7191)
* merge matmul and add

* modify format

* using paddle.nn.functional.linear

* using _C_ops.linear

* using paddle.nn.functional.linear

* add FLAGS_use_legacy_linear env var in test case

* fix format

* add assert and remove env

* modify format

* using matmul for no bias

* modify accurate baseline
2026-04-09 14:15:43 +08:00
xiaoxiaohehe001 5fd8020363 [Cherry-Pick][BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn (#7216) 2026-04-09 11:05:43 +08:00
JYChen 9c65655cb3 [Cherry-Pick][RL] support moe-topk use topk_reduce_func #7218 (#7256)
* support moe-topk use topk_reduce_func

* fix ep error

* fix ut

* fix ut
2026-04-09 11:01:10 +08:00
YuBaoku 6b78981dde Split enable_mm (#7183) (#7233)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: liuruian <liuruian@MacBook-Pro.local>
2026-04-08 16:32:04 +08:00
GoldPancake 403ce139c7 remove arctic_inference deps (#7236) 2026-04-08 15:25:21 +08:00
huicongyao 36909bf27d [Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172) (#7192)
* fix MTP bugs in TP and overlap

* fix
2026-04-08 10:24:38 +08:00
Yonghua Li 55dbc83310 [Cherry-Pick][BugFix] prevent requests from entering running state without a slot(#7141) (#7181)
* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (#7163)

* Set MC_MAX_MR_SIZE to avoid register hang

* up

* [fix] prevent requests from entering running state without a slot

* [fix] count abort set

* [fix] count preempted task in waiting list

---------

Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>
2026-04-03 17:46:13 +08:00
jackyYang6 e3aed6de2f fix oom bug, optimize async weight loading and update read step by yaml (#7171) 2026-04-03 11:05:24 +08:00
jc 1cc0cf23c2 [BugFix] Set MC_MAX_MR_SIZE to avoid register hang in default (#7161)
* Set MC_MAX_MR_SIZE to avoid register hang

* Set MC_MAX_MR_SIZE to avoid register hang
2026-04-03 10:51:15 +08:00
chenjian 2632e6cf32 [Feature] Support chunk prefill disabled in scheduler v1 (#7152) 2026-04-03 10:18:14 +08:00
luukunn 562fa31791 [BugFix]fix extract_tool_calls (#7154)
* fix extract_tool_calls
2026-04-02 21:18:37 +08:00
Yonghua Li 98f3fc9267 [RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests (#7083)
* [test] add a few unit tests

* [feat] update key prefix when model weights are updated

* [test] try to fix test_worker_process
2026-04-02 19:58:41 +08:00
fxyfxy777 9f3b3ce7f5 [Optimization] merge_allreduce (#7039) 2026-04-02 19:52:13 +08:00
Longzhi Wang 938e7dd881 [Other] support video_fps args for video bench (#7077) 2026-04-02 10:40:15 +08:00
luukunn fa7a84926d [Optimization]Fix tool parser (#7079)
* fix tool parser
2026-04-01 21:20:34 +08:00
Bingoo 410988d9ec [OP] support deepgemm for sm103 (#7073)
* support deepgemm for sm103

* add assert

* modify code style

* add assert

* modify sm version condition

* remove assert
2026-04-01 21:01:09 +08:00
cmcamdy 7a2e33098f [XPU] Refactor pre process (#6993)
* [XPU] support speculate_pre_process

* merge develop

* fix codestyle

* fix mtp, support cu_seqlens_q_output

* fix mtp, support cu_seqlens_q_output

* fix test

---------

Co-authored-by: lizan1999 <lizan03@baidu.com>
2026-04-01 20:29:55 +08:00
mouxin fba8a51ad1 [Feature] Fix mixed cache-aware (#7129)
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Fix mixed cache-aware

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 19:29:29 +08:00
yzwu ceaf5df350 [Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126) 2026-04-01 19:13:34 +08:00
mouxin 6cae9b1f50 [Feature] Config eviction_duration (#7125)
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 16:46:21 +08:00
sunxin c29e86fc9d [Feature] Support mtp overlap schedule (#7001) 2026-04-01 14:24:26 +08:00
zhouchong 91c832f607 [Feature] Add logging parameters and error output to terminal (#7098) 2026-04-01 13:18:42 +08:00
jc af51fc46d6 [PD Disaggregation] Write the cache of preempted req to storage and refine PD Disaggregation (#7107)
* Write the cache of preempted req to storage

* up

* fix
2026-04-01 13:15:52 +08:00
luukunn 3651113ee5 [DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR (#7052)
* remove ENABLE_V1_DATA_PROCESSOR

* fix unit test

* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5 ee2b965f5f adjust config info (#7054) 2026-03-31 21:26:05 +08:00
Yonghua Li a3cc3aa777 [BugFix] reset exist tasks signal in clear_data (#7111)
* [BugFix] reset exist tasks signal in clear_data

* [Fix] fix stale exist tasks signal after weight update

* [Chore] downgrade detected new requests log to DEBUG level

* [fix] adjust continue place
2026-03-31 21:24:08 +08:00
YilongGuo dd61e7e421 [Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration (#7086)
Add a clear_grpah_opt_backend method that delegates to the underlying model
to clear the CUDA graph optimization backend.
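
The delegation described above might look roughly like the sketch below. The `InnerModel` and `ConditionalGenerationWrapper` classes and the `graph_backend_cleared` flag are hypothetical stand-ins for illustration, not the real Qwen3VL classes. The method name is spelled as it appears in the commit message.

```python
class InnerModel:
    """Hypothetical stand-in for the wrapped language model."""

    def __init__(self):
        self.graph_backend_cleared = False

    def clear_grpah_opt_backend(self):
        # In this sketch, clearing the backend just flips a flag.
        self.graph_backend_cleared = True


class ConditionalGenerationWrapper:
    """Hypothetical outer model that owns an inner model."""

    def __init__(self, model):
        self.model = model

    def clear_grpah_opt_backend(self):
        # Forward the call to the underlying model instead of reimplementing it.
        self.model.clear_grpah_opt_backend()


wrapper = ConditionalGenerationWrapper(InnerModel())
wrapper.clear_grpah_opt_backend()
```

Callers that only hold the outer model can then clear the backend without reaching into its internals.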

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-03-31 13:48:25 +08:00
qwes5s5 daa95244f7 abort requests (#6992) 2026-03-31 11:02:26 +08:00
Yonghua Li 6d9739f360 [BugFix] fix speculative gauge metrics in multi api server (#7082) 2026-03-31 10:52:50 +08:00
chenjian 6727df8286 [Optimization] Optimize ttft for prefill pd (#6680)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix

* fix ci

* fix format

* update according to review

* add comment

* fix

* fix format
2026-03-30 20:36:23 +08:00
jackyYang6 05f2d95729 [RL] Adapt async rollout checkpoint update flow (#7042)
* update checkpoint-transfer flow and control update_weights params

* test: add update_weights route validation
2026-03-30 19:19:34 +08:00