Commit Graph

4979 Commits

Author SHA1 Message Date
YuBaoku 65c6e726f5 [Cherry-Pick][Docs] Update Release Note(#7302) (#7341) 2026-04-11 16:48:06 +08:00
YuBaoku 2ac9b89409 [XPU][CI]Update xtdk version in download_dependencies.sh (#7320) (#7322)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-11 00:27:54 +08:00
GoldPancake c7560383ab [Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantization Params + CUDAGraph Validation (#7215,#7281) (#7301)
* refactor cudagraph args

* refactor quant cli param

* fix

* fix

* tmp skip xpu

* fix
2026-04-10 16:10:31 +08:00
zhangbo9674 4f36346e14 [Cherry-Pick] change rms norm for glm #7269 (#7276)
* fix

* refine code

* refine code

* refine code

* refine code

* refine code
2026-04-10 01:03:00 -07:00
YuBaoku dd0863b076 [BugFix] Fix Async D2H copy bug & flash mask attn cache V out-of-bound bug (#7221) (#7296)
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
2026-04-10 13:54:02 +08:00
fxyfxy777 dea9d35171 [OP]Unify MoE op with moe_permute path for bf16 GLM (#7164) (#7279) 2026-04-09 21:37:42 +08:00
YuBaoku 921a0ae60b [Docs] Update docs for release/2.5 (#7267) (#7277)
* Update docs for release/2.5

* Update English docs for release/2.5

- Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link
- Update docs/get_started/installation/nvidia_gpu.md:
  - Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support
  - paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives
  - fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option
- Update docs/zh/get_started/installation/nvidia_gpu.md:
  - Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4&5 -> 3.3.1

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f

* Clarify --extra-index-url usage in installation docs

Add note explaining that --extra-index-url is only for downloading
fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed
from the Paddle source specified by -i. Applied to both Chinese and
English nvidia_gpu.md installation guides.

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c

* Update nvidia_gpu.md

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-09 21:03:19 +08:00
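The docs commit above distinguishes the roles of pip's `-i` and `--extra-index-url` flags for installing fastdeploy-gpu. A sketch of the documented pattern — the index URL below is a placeholder, not the real Paddle index (which the commit says lives in `docs/get_started/installation/nvidia_gpu.md`):

```shell
# Per the commit's note: -i / --index-url names the primary index, and the
# fastdeploy-gpu wheel itself must come from there; --extra-index-url is added
# only so that dependencies can be pulled from a second index such as PyPI.
# The first URL here is illustrative, not a real index.
python -m pip install fastdeploy-gpu==2.5.0 \
    -i https://example.com/paddle-wheel-index/ \
    --extra-index-url https://pypi.org/simple
```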
Jiaxin Sui 6fcc25f3f6 Update ci_metax.yml (#7286) 2026-04-09 17:31:20 +08:00
Bingoo 849eb3df65 [Cherry-Pick][Optimization] merge matmul and add (#6986) (#7191)
* merge matmul and add

* modify format

* using paddle.nn.functional.linear

* using _C_ops.linear

* using paddle.nn.functional.linear

* add FLAGS_use_legacy_linear env var in test case

* fix format

* add assert and remove env

* modify format

* using matmul for no bias

* modify accurate baseline
2026-04-09 14:15:43 +08:00
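The optimization above folds a separate matmul followed by an elementwise add into a single linear call. A minimal pure-Python sketch of the equivalence being relied on (plain lists instead of Paddle tensors; names are illustrative, and unlike `paddle.nn.functional.linear` this uses x·W rather than x·Wᵀ for simplicity):

```python
def matmul_then_add(x, w, b):
    """Unfused path: y = x @ w as one step, then y + b as a second step."""
    y = [[sum(xi * wij for xi, wij in zip(row, col)) for col in zip(*w)]
         for row in x]
    return [[yij + bj for yij, bj in zip(row, b)] for row in y]

def fused_linear(x, w, b):
    """Fused path: the bias is accumulated in the same pass over the output."""
    return [[sum(xi * wij for xi, wij in zip(row, col)) + bj
             for col, bj in zip(zip(*w), b)]
            for row in x]

x = [[1.0, 2.0], [3.0, 4.0]]   # 2x2 input
w = [[1.0, 0.0], [0.0, 1.0]]   # identity weights
b = [0.5, -0.5]
assert matmul_then_add(x, w, b) == fused_linear(x, w, b)
```

Both paths produce the same values; the win is one kernel launch instead of two.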
YuBaoku 098dd2c251 [XPU][CI] lock xvllm version for fix bug (#7264) (#7266)
* Remove duplicate NICs from environment variables

* Update version for xvllm in download_dependencies.sh

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-09 12:46:13 +08:00
xiaoxiaohehe001 5fd8020363 [Cherry-Pick][BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn (#7216) 2026-04-09 11:05:43 +08:00
JYChen 9c65655cb3 [Cherry-Pick][RL] support moe-topk use topk_reduce_func #7218 (#7256)
* support moe-topk use topk_reduce_func

* fix ep error

* fix ut

* fix ut
2026-04-09 11:01:10 +08:00
Bingoo 01818844b4 support moe for sm103 (#7240) 2026-04-08 20:56:23 +08:00
YuBaoku 84d62712c9 [Feature]distinguish whl version (#7204) (#7224)
* [Feature]whl version

* [Feature]whl version,set root_is_pure = false

* [Feature]code style

Co-authored-by: ChowMingSing <610208940@qq.com>
2026-04-08 17:32:38 +08:00
YuBaoku 6b78981dde Split enable_mm (#7183) (#7233)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: liuruian <liuruian@MacBook-Pro.local>
2026-04-08 16:32:04 +08:00
GoldPancake 403ce139c7 remove arctic_inference deps (#7236) 2026-04-08 15:25:21 +08:00
huicongyao 36909bf27d [Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172) (#7192)
* fix MTP bugs in TP and overlap

* fix
2026-04-08 10:24:38 +08:00
YuBaoku 7ab48c4760 [Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (#7186) (#7195) 2026-04-03 20:55:53 +08:00
Yonghua Li 55dbc83310 [Cherry-Pick][BugFix] prevent requests from entering running state without a slot(#7141) (#7181)
* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (#7163)

* Set MC_MAX_MR_SIZE to avoid register hang

* up

* [fix] prevent requests from entering running state without a slot

* [fix] count abort set

* [fix] count preempted task in waiting list

---------

Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>
2026-04-03 17:46:13 +08:00
Jiang-Jia-Jun b24765a746 Update setup.py 2026-04-03 11:29:22 +08:00
jackyYang6 e3aed6de2f fix oom bug, optimize async weight loading and update read step by yaml (#7171) 2026-04-03 11:05:24 +08:00
jc 1cc0cf23c2 [BugFix] Set MC_MAX_MR_SIZE to avoid register hang in default (#7161)
* Set MC_MAX_MR_SIZE to avoid register hang

* Set MC_MAX_MR_SIZE to avoid register hang
2026-04-03 10:51:15 +08:00
chenjian 2632e6cf32 [Feature] Support chunk prefill disabled in scheduler v1 (#7152) 2026-04-03 10:18:14 +08:00
luukunn 562fa31791 [BugFix]fix extract_tool_calls (#7154)
* fix extract_tool_calls
2026-04-02 21:18:37 +08:00
Yonghua Li 98f3fc9267 [RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests (#7083)
* [test] add a few unit tests

* [feat] update key prefix when model weights are updated

* [test] try to fix test_worker_process
2026-04-02 19:58:41 +08:00
fxyfxy777 9f3b3ce7f5 [Optimization] merge_allreduce (#7039) 2026-04-02 19:52:13 +08:00
bukejiyu f142b486c9 update (#7101) 2026-04-02 16:07:26 +08:00
Longzhi Wang 938e7dd881 [Other] support video_fps args for video bench (#7077) 2026-04-02 10:40:15 +08:00
YuBaoku 7aa213bba9 [CI] Replace ipc=host with shm-size and sysctl configuration (#7138) 2026-04-02 10:33:55 +08:00
YuBaoku db808f2080 [CI] Optimize log cleanup and isolation in unittest (#7132) 2026-04-01 22:07:55 +08:00
Yuanle Liu 1af7f80811 Revert "[BugFix][Speculative Decoding] Correct index calculation in speculate…" (#7133)
This reverts commit ba1aa1edff.
2026-04-01 06:54:23 -07:00
luukunn fa7a84926d [Optimization]Fix tool parser (#7079)
* fix tool parser
2026-04-01 21:20:34 +08:00
Bingoo 410988d9ec [OP] support deepgemm for sm103 (#7073)
* support deepgemm for sm103

* add assert

* modify code style

* add assert

* modify sm version condition

* remove assert
2026-04-01 21:01:09 +08:00
lonelygsh ba1aa1edff [BugFix][Speculative Decoding] Correct index calculation in speculate decoding operators (#7121)
- Fix accept_idx calculation in spec_set_value_by_stop_seqs
- Fix condition check from < to <= for token matching
- Fix accept_tokens indexing logic
- Remove unnecessary -1 in current_step comparison for max_think_len

Co-authored-by: guanshihui <guanshihui@baidu.com>
2026-04-01 05:36:53 -07:00
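The "condition check from < to <=" item above is the classic off-by-one family of bug. A toy token-acceptance loop showing that class of error — illustrative names and logic only, not the operator's actual code:

```python
def accept_until_mismatch(draft, target):
    """Toy speculative-decoding acceptance: count leading draft tokens
    that agree with the target model's tokens."""
    n = 0
    while n < len(draft) and n < len(target) and draft[n] == target[n]:
        n += 1
    return n

def accept_until_mismatch_buggy(draft, target):
    """Same loop with an exclusive bound: the final comparable token is
    silently dropped -- the kind of boundary error the commit fixes."""
    n = 0
    while n < len(draft) - 1 and n < len(target) and draft[n] == target[n]:
        n += 1
    return n

assert accept_until_mismatch([1, 2, 3], [1, 2, 3]) == 3
assert accept_until_mismatch_buggy([1, 2, 3], [1, 2, 3]) == 2  # last token lost
```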
cmcamdy 7a2e33098f [XPU] Refactor pre process (#6993)
* [XPU] support speculate_pre_process

* merge develop

* fix codestyle

* fix mtp, support cu_seqlens_q_output

* fix mtp, support cu_seqlens_q_output

* fix test

---------

Co-authored-by: lizan1999 <lizan03@baidu.com>
2026-04-01 20:29:55 +08:00
mouxin fba8a51ad1 [Feature] Fix mixed cache-aware (#7129)
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Fix mixed cache-aware

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 19:29:29 +08:00
Jingfeng Wu 3b564116d5 [Docs] Add docs for disaggregated deployment (#6700)
* add docs for disaggregated deployment

* pre-commit run for style check

* update docs
2026-04-01 19:27:09 +08:00
yzwu ceaf5df350 [Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126) 2026-04-01 19:13:34 +08:00
luukunn fdfc908e2f [Others] reuse unit test (#7127) 2026-04-01 18:36:00 +08:00
mouxin 6cae9b1f50 [Feature] Config eviction_duration (#7125)
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 16:46:21 +08:00
sunxin c29e86fc9d [Feature] Support mtp overlap schedule (#7001) 2026-04-01 14:24:26 +08:00
YuBaoku c6f0c5c3a6 [CI] Optimize test execution with single-GPU parallelism (#7085)
* [CI] Optimize test execution with single-GPU parallelism and log collection

* remove export CUDA_VISIBLE_DEVICES

* fix path error

* fix log_* path and debug

* [CI] Optimize test execution with single-GPU parallelism and log collection
2026-04-01 14:18:40 +08:00
zhouchong 91c832f607 [Feature] Add logging parameters and error output to terminal (#7098) 2026-04-01 13:18:42 +08:00
jc af51fc46d6 [PD Disaggregation] Write the cache of preempted req to storage and refine PD Disaggregation (#7107)
* Write the cache of preempted req to storage

* up

* fix
2026-04-01 13:15:52 +08:00
luukunn 3651113ee5 [DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR (#7052)
* remove ENABLE_V1_DATA_PROCESSOR

* fix unit test

* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5 ee2b965f5f adjust config info (#7054) 2026-03-31 21:26:05 +08:00
Yonghua Li a3cc3aa777 [BugFix] reset exist tasks signal in clear_data (#7111)
* [BugFix] reset exist tasks signal in clear_data

* [Fix] fix stale exist tasks signal after weight update

* [Chore] downgrade detected new requests log to DEBUG level

* [fix] adjust continue place
2026-03-31 21:24:08 +08:00
周周周 fd44bb7cbf cpmmot (#7105)
Co-authored-by: liuruian <liuruian@baidu.com>
2026-03-31 16:13:44 +08:00
cloudforge1 5c5dc66aa7 [CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader (#6731)
* [CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader

* [CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader

---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-03-31 15:29:35 +08:00
YilongGuo dd61e7e421 [Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration (#7086)
Add clear_grpah_opt_backend method that delegates to the underlying model
to clear cuda graph optimization backend.

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-03-31 13:48:25 +08:00
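The commit above describes a plain delegation method: the VL wrapper forwards the call to the underlying model. A minimal sketch of that shape (class and attribute names are hypothetical; the method name keeps the commit's own spelling of the identifier):

```python
class _InnerModel:
    """Stand-in for the wrapped language model (hypothetical)."""
    def __init__(self):
        self.graph_backend_cleared = False

    def clear_grpah_opt_backend(self):
        self.graph_backend_cleared = True

class Qwen3VLLike:
    """Sketch of the wrapper: the VL model owns an inner model and
    forwards the backend-clearing call to it."""
    def __init__(self):
        self.model = _InnerModel()

    def clear_grpah_opt_backend(self):
        # Delegate, so callers can treat VL and text-only models uniformly.
        self.model.clear_grpah_opt_backend()

m = Qwen3VLLike()
m.clear_grpah_opt_backend()
assert m.model.graph_backend_cleared
```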