FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
copilot-swe-agent[bot]	46e14f88f9	Merge origin/release/2.6 and resolve worker_process conflict Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-04-16 11:01:28 +00:00
YuBaoku	72ce56b10b	[BugFix] fix tool call parser (#7369 ) (#7419 ) * fix tool call parser * add unit test * fix unit test * add unit test Co-authored-by: luukunn <981429396@qq.com>	2026-04-16 17:15:03 +08:00
jc	b8e8a6253f	PD deployment support without router (#7412 ) (#7424 )	2026-04-16 14:02:10 +08:00
GoldPancake	26674bbbb6	[Cherry-Pick][RL] Add clear_graph_opt_backend for glm4_mtp (#7378 ) (#7379 ) * add clear_grpah func * fix spell	2026-04-15 19:45:09 +08:00
chen	2ee1cc3d0a	check init_flash_attn_version log (#7401 )	2026-04-15 11:05:20 +08:00
sunxin	5f7524eb85	fix rl moe gate type (#7394 )	2026-04-14 20:04:09 +08:00
freeliuzc	f6c066fb9d	Revert "[Optimization] Optimize ttft for prefill pd (#6680 )" (#7386 ) * Revert "[Optimization] Optimize ttft for prefill pd (#6680)" This reverts commit `6727df8286`. * fix revert pr	2026-04-14 20:01:39 +08:00
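YuBaoku	8a8beca548	[BugFix][PD Disaggregation][KVCache] Fix low cache hit rate in PD split scenario (#7364 ) (#7387 ) ## Motivation In the PD disaggregation scenario, a decode node did not promptly update cache block hit information after receiving a request forwarded by a prefill node, which led to a low prefix cache hit rate and hurt inference performance. ## Modifications 1. In the `_free_blocks_when_stop` method, additionally exclude cache block updates on prefill nodes (`splitwise_role == "prefill"`), so that prefill nodes do not update the cache a second time and leave it in an inconsistent state. 2. After a decode node successfully allocates a request (`_alloc_requests_with_cache`), proactively call `update_cache_blocks` with `need_prefill_tokens` to update the cache block information, so that the decode node correctly registers the prefix cache that has already been hit. Co-authored-by: kevin <chengyf112@gmail.com>	2026-04-14 19:25:12 +08:00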
chenjian	d9a008f3c8	[Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 (#7159 ) (#7351 ) * [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 * [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 * fix	2026-04-13 15:24:01 +08:00
sunxin	b2997f3aad	fix overlap mtp empty run (#7314 )	2026-04-13 15:20:11 +08:00
liuruyan	9cb82d79a0	[Cherry-Pick][TI-consistent] support quant use pow2scale(#7308 ) (#7310 ) * support quant use pow2scale * fix * fix	2026-04-13 00:02:08 -07:00
Jiang-Jia-Jun	6ee354f2c8	Update worker_process.py	2026-04-12 06:03:21 +00:00
Jiang-Jia-Jun	19b3b203d5	Update envs.py	2026-04-12 06:03:21 +00:00
jiang-jia-jun	63eaccd6c2	[Optim] Remove IPCLock between CacheManager and WorkerProcess	2026-04-12 06:03:21 +00:00
chen	7446665676	[Cherry-Pick][RL]moe bf16 ep support paddle batch_gemm(#7337 ) (#7339 ) * moe bf16 ep support paddle batch_gemm	2026-04-11 21:51:26 +08:00
JYChen	42b0f59b9e	[Cherry-Pick][RL] change glm rope_emb calculation #7316 (#7318 ) * change glm rope_emb calculation * glm without EnforceFmulRN * fix ci	2026-04-11 18:38:37 +08:00
GoldPancake	c7560383ab	[Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantization Params + CUDAGraph Validation (#7215,#7281) (#7301 ) * refactor cudagraph args * refactor quant cli param * fix * fix * tmp skip xpu * fix	2026-04-10 16:10:31 +08:00
zhangbo9674	4f36346e14	[Cherry-Pick] change rms norm for glm #7269 (#7276 ) * fix * refine code * refine code * refine code * refine code * refine code	2026-04-10 01:03:00 -07:00
fxyfxy777	dea9d35171	[OP]Unify MoE op with moe_permute path for bf16 GLM (#7164 ) (#7279 )	2026-04-09 21:37:42 +08:00
Bingoo	849eb3df65	[Cherry-Pick][Optimization] merge matmul and add （#6986） (#7191 ) * merge matmul and add * modify format * using paddle.nn.functional.linear * using _C_ops.linear * using paddle.nn.functional.linear * add FLAGS_use_legacy_linear env var in test case * fix format * add assert and remove env * modify format * using matmul for no bias * modify accurate baseline	2026-04-09 14:15:43 +08:00
xiaoxiaohehe001	5fd8020363	[Cherry-Pick][BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn (#7216 )	2026-04-09 11:05:43 +08:00
JYChen	9c65655cb3	[Cherry-Pick][RL] support moe-topk use topk_reduce_func #7218 (#7256 ) * support moe-topk use topk_reduce_func * fix ep error * fix ut * fix ut	2026-04-09 11:01:10 +08:00
YuBaoku	6b78981dde	Split enable_mm (#7183 ) (#7233 ) Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com> Co-authored-by: liuruian <liuruian@MacBook-Pro.local>	2026-04-08 16:32:04 +08:00
GoldPancake	403ce139c7	remove arctic_inference deps (#7236 )	2026-04-08 15:25:21 +08:00
huicongyao	36909bf27d	[Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172 ) (#7192 ) * fix MTP bugs in TP and overlap * fix	2026-04-08 10:24:38 +08:00
Yonghua Li	55dbc83310	[Cherry-Pick][BugFix] prevent requests from entering running state without a slot(#7141 ) (#7181 ) * [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (#7163) * Set MC_MAX_MR_SIZE to avoid register hang * up * [fix] prevent requests from entering running state without a slot * [fix] count abort set * [fix] count preempted task in waiting list --------- Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>	2026-04-03 17:46:13 +08:00
jackyYang6	e3aed6de2f	fix oom bug, optimize async weight loading and update read step by yaml (#7171 )	2026-04-03 11:05:24 +08:00
jc	1cc0cf23c2	[BugFix] Set MC_MAX_MR_SIZE to avoid register hang in default (#7161 ) * Set MC_MAX_MR_SIZE to avoid register hang * Set MC_MAX_MR_SIZE to avoid register hang	2026-04-03 10:51:15 +08:00
chenjian	2632e6cf32	[Feature] Support chunk prefill disabled in scheduler v1 (#7152 )	2026-04-03 10:18:14 +08:00
luukunn	562fa31791	[BugFix]fix extract_tool_calls (#7154 ) * fix extract_tool_calls	2026-04-02 21:18:37 +08:00
Yonghua Li	98f3fc9267	[RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests (#7083 ) * [test] add a few unit tests * [feat] update key prefix when model weights are updated * [test] try to fix test_worker_process	2026-04-02 19:58:41 +08:00
fxyfxy777	9f3b3ce7f5	[Optimization] merge_allreduce (#7039 )	2026-04-02 19:52:13 +08:00
Longzhi Wang	938e7dd881	[Other] support video_fps args for video bench (#7077 )	2026-04-02 10:40:15 +08:00
luukunn	fa7a84926d	[Optimization]Fix tool parser (#7079 ) * fix tool parser	2026-04-01 21:20:34 +08:00
Bingoo	410988d9ec	[OP] support deepgeem for sm103 (#7073 ) * support deepgeem for sm103 * add assert * modify code style * add assert * modify sm version condition * remove assert	2026-04-01 21:01:09 +08:00
cmcamdy	7a2e33098f	[XPU] Refactor pre process (#6993 ) * [XPU] support speculate_pre_process * merge develop * fix codestype * fix mtp, support cu_seqlens_q_output * fix mtp, support cu_seqlens_q_output * fix test --------- Co-authored-by: lizan1999 <lizan03@baidu.com>	2026-04-01 20:29:55 +08:00
mouxin	fba8a51ad1	[Feature] Fix mixed cache-aware (#7129 ) * [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Fix mixed cache-aware --------- Co-authored-by: mouxin <mouxin@baidu.com>	2026-04-01 19:29:29 +08:00
yzwu	ceaf5df350	[Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126 )	2026-04-01 19:13:34 +08:00
mouxin	6cae9b1f50	[Feature] Config eviction_duration (#7125 ) * [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Config eviction_duration --------- Co-authored-by: mouxin <mouxin@baidu.com>	2026-04-01 16:46:21 +08:00
sunxin	c29e86fc9d	[Feature] Support mtp overlap schedule (#7001 )	2026-04-01 14:24:26 +08:00
zhouchong	91c832f607	[Feature] Add logging parameters and error output to terminal (#7098 )	2026-04-01 13:18:42 +08:00
jc	af51fc46d6	[PD Disaggregation] Write the cache of preempted req to storage and refine PD Disaggregation (#7107 ) * Write the cache of preempted req to storage * up * fix	2026-04-01 13:15:52 +08:00
luukunn	3651113ee5	[DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR (#7052 ) * remove ENABLE_V1_DATA_PROCESSOR * fix unit test * fix unit test	2026-04-01 09:53:41 +08:00
qwes5s5	ee2b965f5f	adjust config info (#7054 )	2026-03-31 21:26:05 +08:00
Yonghua Li	a3cc3aa777	[BugFix] reset exist tasks signal in clear_data (#7111 ) * [BugFix] reset exist tasks signal in clear_data * [Fix] fix stale exist tasks signal after weight update * [Chore] downgrade detected new requests log to DEBUG level * [fix] adjust continue place	2026-03-31 21:24:08 +08:00
YilongGuo	dd61e7e421	[Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration (#7086 ) Add clear_grpah_opt_backend method that delegates to the underlying model to clear cuda graph optimization backend. Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-03-31 13:48:25 +08:00
qwes5s5	daa95244f7	abort requests (#6992 )	2026-03-31 11:02:26 +08:00
Yonghua Li	6d9739f360	[BugFix] fix speculative gauge metrics in multi api server (#7082 )	2026-03-31 10:52:50 +08:00
chenjian	6727df8286	[Optimization] Optimize ttft for prefill pd (#6680 ) * optimize ttft * fix * fix * fix ci * fix ci * fix * fix bug * fix * add comments * fix ci * fix * fix ci * fix format * update according to review * add comment * fix * fix format	2026-03-30 20:36:23 +08:00
jackyYang6	05f2d95729	[RL] Adapt async rollout checkpoint update flow (#7042 ) * update checkpoint-transfer flow and control update_weights params * test: add update_weights route validation	2026-03-30 19:19:34 +08:00

1978 Commits