Commit Graph

5000 Commits

Author SHA1 Message Date
copilot-swe-agent[bot] 46e14f88f9 Merge origin/release/2.6 and resolve worker_process conflict
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-04-16 11:01:28 +00:00
YuBaoku 72ce56b10b [BugFix] fix tool call parser (#7369) (#7419)
* fix tool call parser

* add unit test

* fix unit test

* add unit test

Co-authored-by: luukunn <981429396@qq.com>
2026-04-16 17:15:03 +08:00
jc b8e8a6253f PD deployment support without router (#7412) (#7424) 2026-04-16 14:02:10 +08:00
GoldPancake 26674bbbb6 [Cherry-Pick][RL] Add clear_graph_opt_backend for glm4_mtp (#7378) (#7379)
* add clear_grpah func

* fix spell
2026-04-15 19:45:09 +08:00
Bingoo 61bfe6e5b3 modify flashmask version (#7414) 2026-04-15 18:19:21 +08:00
chen 2ee1cc3d0a check init_flash_attn_version log (#7401) 2026-04-15 11:05:20 +08:00
sunxin 5f7524eb85 fix rl moe gate type (#7394) 2026-04-14 20:04:09 +08:00
freeliuzc f6c066fb9d Revert "[Optimization] Optimize ttft for prefill pd (#6680)" (#7386)
* Revert "[Optimization] Optimize ttft for prefill pd (#6680)"

This reverts commit 6727df8286.

* fix revert pr
2026-04-14 20:01:39 +08:00
YuBaoku 8a8beca548 [BugFix][PD Disaggregation][KVCache] Fix low cache hit rate in PD split scenario (#7364) (#7387)
## Motivation

In the PD-disaggregated scenario, after the decode node receives a request forwarded by the prefill node, it does not promptly update the cache block hit information,
which leads to a low prefix cache hit rate and degrades inference performance.

## Modifications

1. In the `_free_blocks_when_stop` method, additionally exclude the prefill node (`splitwise_role == "prefill"`)
   from cache block updates, so the prefill node does not update the cache twice and corrupt its state.
2. After the decode node successfully allocates a request (`_alloc_requests_with_cache`), proactively call
   `update_cache_blocks` with `need_prefill_tokens` to update the cache block information,
   ensuring the decode node correctly registers the already-hit prefix cache.

Co-authored-by: kevin <chengyf112@gmail.com>
2026-04-14 19:25:12 +08:00
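The two modifications described in the commit above can be sketched in Python. Everything below except the method names `_free_blocks_when_stop`, `_alloc_requests_with_cache`, `update_cache_blocks`, the `splitwise_role` field, and the `need_prefill_tokens` count (all taken from the commit message) is an assumption about the scheduler's shape, not FastDeploy's actual code.

```python
# Hypothetical sketch of the two fixes from the commit message above.
# Class layout and helper bodies are assumptions; only the method names,
# splitwise_role, and need_prefill_tokens come from the commit.

class CacheScheduler:
    def __init__(self, splitwise_role):
        self.splitwise_role = splitwise_role  # "prefill" | "decode" | "mixed"
        self.hit_info = {}  # request_id -> number of tokens known to be cached

    def update_cache_blocks(self, request_id, num_tokens):
        # Record that the first num_tokens of this request hit the cache.
        self.hit_info[request_id] = num_tokens

    def _free_blocks_when_stop(self, request_id):
        # Fix 1: the prefill node must not touch cache-hit bookkeeping here,
        # otherwise it updates the cache twice and corrupts its state.
        if self.splitwise_role == "prefill":
            return
        self.hit_info.pop(request_id, None)

    def _alloc_requests_with_cache(self, request_id, need_prefill_tokens):
        allocated = True  # actual block allocation is elided in this sketch
        if allocated and self.splitwise_role == "decode":
            # Fix 2: after a successful allocation on the decode node,
            # proactively record the prefix tokens forwarded by prefill,
            # so subsequent requests see the correct prefix-cache hits.
            self.update_cache_blocks(request_id, need_prefill_tokens)
        return allocated
```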
lonelygsh e7c8dc2fe9 [Speculate Decoding] Fix step_idx semantics in limit_thinking and set_stop_value kernels (#7370)
- speculate_limit_thinking_content_length: update current_base_step to
  step_idx+1 (step_idx now records history count before current round);
  remove incorrect step_idx decrement on accept_num truncation; mark
  step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix can_stop gate to use
  step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx
  formula (remove stale -accept_num offset); use <= condition so accept_idx
  maps directly to the accepted token that ends the stop sequence; fix
  accept_tokens index (remove -1).
- Update unit tests for speculate_set_stop_value_multi_seqs kernel.
2026-04-14 12:54:22 +08:00
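The corrected semantics above can be restated as a small Python model. This is an illustrative sketch of the rules named in the commit message, not the CUDA kernels themselves; the standalone functions below are hypothetical.

```python
# Illustrative model of the corrected stop logic; the real implementation is
# a CUDA kernel, and these free functions are hypothetical restatements.

def can_stop(step_idx_now: int, accept_num: int, min_token_limit: int) -> bool:
    # Fixed gate from the commit: stopping is allowed only once the history
    # token count plus this round's accepted draft tokens reaches the limit.
    return step_idx_now + accept_num >= min_token_limit

def next_base_step(step_idx: int) -> int:
    # Fixed semantics: step_idx records the history count *before* the
    # current round, so the next round's current_base_step is step_idx + 1.
    return step_idx + 1
```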
chen 144dc17b14 update attn_mask_q 2 (#7373) 2026-04-13 23:06:16 +08:00
JYChen 9823d63220 remove fa4 requirements (#7354) 2026-04-13 19:24:24 +08:00
chenjian d9a008f3c8 [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 (#7159) (#7351)
* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1

* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1

* fix
2026-04-13 15:24:01 +08:00
sunxin b2997f3aad fix overlap mtp empty run (#7314) 2026-04-13 15:20:11 +08:00
liuruyan 9cb82d79a0 [Cherry-Pick][TI-consistent] support quant use pow2scale (#7308) (#7310)
* support quant use pow2scale

* fix

* fix
2026-04-13 00:02:08 -07:00
Jiang-Jia-Jun 6ee354f2c8 Update worker_process.py 2026-04-12 06:03:21 +00:00
Jiang-Jia-Jun 19b3b203d5 Update envs.py 2026-04-12 06:03:21 +00:00
jiang-jia-jun 63eaccd6c2 [Optim] Remove IPCLock between CacheManager and WorkerProcess 2026-04-12 06:03:21 +00:00
YuBaoku 9e8ea7db14 [Cherry-Pick][CI] Sync dev optimizations to 2.6 (#7335) (#7343) 2026-04-12 13:22:52 +08:00
chen 7446665676 [Cherry-Pick][RL] moe bf16 ep support paddle batch_gemm (#7337) (#7339)
* moe bf16 ep support paddle batch_gemm
2026-04-11 21:51:26 +08:00
JYChen 42b0f59b9e [Cherry-Pick][RL] change glm rope_emb calculation #7316 (#7318)
* change glm rope_emb calculation

* glm without EnforceFmulRN

* fix ci
2026-04-11 18:38:37 +08:00
YuBaoku 65c6e726f5 [Cherry-Pick][Docs] Update Release Note(#7302) (#7341) 2026-04-11 16:48:06 +08:00
YuBaoku 2ac9b89409 [XPU][CI] Update xtdk version in download_dependencies.sh (#7320) (#7322)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-11 00:27:54 +08:00
GoldPancake c7560383ab [Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantization Params + CUDAGraph Validation (#7215,#7281) (#7301)
* refactor cudagraph args

* refactor quant cli param

* fix

* fix

* tmp skip xpu

* fix
2026-04-10 16:10:31 +08:00
zhangbo9674 4f36346e14 [Cherry-Pick] change rms norm for glm #7269 (#7276)
* fix

* refine code

* refine code

* refine code

* refine code

* refine code
2026-04-10 01:03:00 -07:00
YuBaoku dd0863b076 [BugFix] Fix Async D2H copy bug & flash mask attention cache V out-of-bound bug (#7221) (#7296)
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
2026-04-10 13:54:02 +08:00
fxyfxy777 dea9d35171 [OP]Unify MoE op with moe_permute path for bf16 GLM (#7164) (#7279) 2026-04-09 21:37:42 +08:00
YuBaoku 921a0ae60b [Docs] Update docs for release/2.5 (#7267) (#7277)
* Update docs for release/2.5

* Update English docs for release/2.5

- Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link
- Update docs/get_started/installation/nvidia_gpu.md:
  - Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support
  - paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives
  - fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option
- Update docs/zh/get_started/installation/nvidia_gpu.md:
  - Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4&5 -> 3.3.1

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f

* Clarify --extra-index-url usage in installation docs

Add note explaining that --extra-index-url is only for downloading
fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed
from the Paddle source specified by -i. Applied to both Chinese and
English nvidia_gpu.md installation guides.

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c

* Update nvidia_gpu.md

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-09 21:03:19 +08:00
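The `--extra-index-url` clarification in the commit above can be illustrated with a pip invocation; the index URLs below are placeholders (assumptions), not the real Paddle package index from the docs.

```shell
# Sketch only: both index URLs are placeholders, not the real indexes.
# -i sets the primary index, which must host fastdeploy-gpu itself;
# --extra-index-url is consulted only for fastdeploy-gpu's dependencies.
python -m pip install fastdeploy-gpu==2.5.0 \
    -i https://example.com/paddle-package-index/ \
    --extra-index-url https://pypi.org/simple/
```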
Jiaxin Sui 6fcc25f3f6 Update ci_metax.yml (#7286) 2026-04-09 17:31:20 +08:00
Bingoo 849eb3df65 [Cherry-Pick][Optimization] merge matmul and add (#6986) (#7191)
* merge matmul and add

* modify format

* using paddle.nn.functional.linear

* using _C_ops.linear

* using paddle.nn.functional.linear

* add FLAGS_use_legacy_linear env var in test case

* fix format

* add assert and remove env

* modify format

* using matmul for no bias

* modify accurate baseline
2026-04-09 14:15:43 +08:00
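The series above folds a matmul followed by a bias add into a single linear call. As an assumption-level illustration (NumPy here, not the Paddle `paddle.nn.functional.linear` / `_C_ops.linear` path the commit actually uses), the two forms compute the same result:

```python
import numpy as np

def matmul_then_add(x, w, b):
    # Unfused form: two ops and an extra intermediate tensor.
    return np.matmul(x, w) + b

def fused_linear(x, w, b=None):
    # Stand-in for a fused linear kernel: one call computes x @ w (+ b).
    out = np.matmul(x, w)
    return out if b is None else out + b
```

The no-bias case matching the last bullet ("using matmul for no bias") is covered by passing `b=None`, which reduces to a plain matmul.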
YuBaoku 098dd2c251 [XPU][CI] lock xvllm version for fix bug (#7264) (#7266)
* Remove duplicate NICs from environment variables

* Update version for xvllm in download_dependencies.sh

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-09 12:46:13 +08:00
xiaoxiaohehe001 5fd8020363 [Cherry-Pick][BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn (#7216) 2026-04-09 11:05:43 +08:00
JYChen 9c65655cb3 [Cherry-Pick][RL] support moe-topk use topk_reduce_func #7218 (#7256)
* support moe-topk use topk_reduce_func

* fix ep error

* fix ut

* fix ut
2026-04-09 11:01:10 +08:00
Bingoo 01818844b4 support moe for sm103 (#7240) 2026-04-08 20:56:23 +08:00
YuBaoku 84d62712c9 [Feature] distinguish whl version (#7204) (#7224)
* [Feature]whl version

* [Feature]whl version,set root_is_pure = false

* [Feature]code style

Co-authored-by: ChowMingSing <610208940@qq.com>
2026-04-08 17:32:38 +08:00
YuBaoku 6b78981dde Split enable_mm (#7183) (#7233)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: liuruian <liuruian@MacBook-Pro.local>
2026-04-08 16:32:04 +08:00
GoldPancake 403ce139c7 remove arctic_inference deps (#7236) 2026-04-08 15:25:21 +08:00
huicongyao 36909bf27d [Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172) (#7192)
* fix MTP bugs in TP and overlap

* fix
2026-04-08 10:24:38 +08:00
YuBaoku 7ab48c4760 [Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (#7186) (#7195) 2026-04-03 20:55:53 +08:00
Yonghua Li 55dbc83310 [Cherry-Pick][BugFix] prevent requests from entering running state without a slot(#7141) (#7181)
* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (#7163)

* Set MC_MAX_MR_SIZE to avoid register hang

* up

* [fix] prevent requests from entering running state without a slot

* [fix] count abort set

* [fix] count preempted task in waiting list

---------

Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>
2026-04-03 17:46:13 +08:00
Jiang-Jia-Jun b24765a746 Update setup.py 2026-04-03 11:29:22 +08:00
jackyYang6 e3aed6de2f fix OOM bug, optimize async weight loading, and update read step via yaml (#7171) 2026-04-03 11:05:24 +08:00
jc 1cc0cf23c2 [BugFix] Set MC_MAX_MR_SIZE by default to avoid register hang (#7161)
* Set MC_MAX_MR_SIZE to avoid register hang

* Set MC_MAX_MR_SIZE to avoid register hang
2026-04-03 10:51:15 +08:00
chenjian 2632e6cf32 [Feature] Support chunk prefill disabled in scheduler v1 (#7152) 2026-04-03 10:18:14 +08:00
luukunn 562fa31791 [BugFix] fix extract_tool_calls (#7154)
* fix extract_tool_calls
2026-04-02 21:18:37 +08:00
Yonghua Li 98f3fc9267 [RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests (#7083)
* [test] add a few unit tests

* [feat] update key prefix when model weights are updated

* [test] try to fix test_worker_process
2026-04-02 19:58:41 +08:00
fxyfxy777 9f3b3ce7f5 [Optimization] merge_allreduce (#7039) 2026-04-02 19:52:13 +08:00
bukejiyu f142b486c9 update (#7101) 2026-04-02 16:07:26 +08:00
Longzhi Wang 938e7dd881 [Other] support video_fps args for video bench (#7077) 2026-04-02 10:40:15 +08:00
YuBaoku 7aa213bba9 [CI] Replace ipc=host with shm-size and sysctl configuration (#7138) 2026-04-02 10:33:55 +08:00