Commit Graph

4336 Commits

Author SHA1 Message Date
周周周 7a0744f05a [UT] support attention test tp (#5887) 2026-01-06 11:15:01 +08:00
Copilot 5c53193c4e [Docs] Update GPU version from 2.3.0 to 2.3.2 in installation documentation (#5894)
* Initial plan

* Update GPU version from 2.3.0 to 2.3.2 in NVIDIA GPU installation documentation

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-06 11:06:32 +08:00
Yuanle Liu 5e729bc2ba [OPs] ep_moe_expert_dispatch.cu dispatch num_experts_per_rank 5 (#5890) 2026-01-06 10:39:35 +08:00
Neil Zhu 272a371635 [Metax] optimize flash attention backend (#5876) 2026-01-06 09:52:09 +08:00
周周周 ab553b3b8b revert cuda_check (#5883) 2026-01-05 20:51:31 +08:00
Jiaxin Sui 2785b820c8 [XPU][CI] Add XPU logprobs case (#5874)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command

* Remove max_tokens from model response configuration

Removed max_tokens parameter from the model response call.

* add xpu logprobs case

* Fix formatting and improve setup_logprobs_env

Add newline at end of file and update setup_logprobs_env function.

* Refactor test_logprobs_21b_tp4.py for clarity

* Change top_p value from 1.0 to 0

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
lizexu123 1d3ae7c024 [BugFix] fix w4afp8 tp=8 (#5868)
* fix w4afp8 tp=8

* fix
2026-01-05 18:59:02 +08:00
tianhaodongbd 6f14b180e3 [RL] Change 'model' to the instance variable 'tmp_model' (#5872) 2026-01-05 02:09:02 -08:00
ming1753 f50e1bcc16 [Others] enable use PFCC deep_ep (#5822)
* upstream deep_ep

* fix bug

* fix bug

* modify env name
2026-01-05 02:07:01 -08:00
jc 8d384f9fd8 [PD Disaggregation] Update usage of pd disaggregation and data parallel (#5742)
* Update usage of pd disaggregation

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up dp docs

* up

* up

* up

* fix unittest
2026-01-05 17:51:29 +08:00
cmcamdy 690d4bcdb0 [XPU] Speculative Decoding with PD (#5856)
* [XPU] Speculative Decoding with PD

* fix post process

* share kv cache sender

* support speculate decoding step system cache

* support speculate decoding step system cache

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>
2026-01-05 17:31:03 +08:00
chen ac39c0f887 support fa3 qwen-vl rope (#5869) 2026-01-05 15:29:34 +08:00
sunxin adb91dcacc [BugFix] Fix wint4 ep issue caused by empty run (#5870) 2026-01-05 14:24:37 +08:00
周周周 dc13344ab8 [Optimization] add del to decrease peak memory in MoE prefill (#5863) 2026-01-05 14:01:48 +08:00
jc e911ac2ce7 [BugFix] Refine the preparation of cpu and storage cache (#5777)
* Refine the preparation of cpu and storage cache

* fix error

* fix error

* up

* fix

* up docs

* fix unittest

* remove debug info
2026-01-05 10:13:30 +08:00
jc 95257c1dbd [Feature] RDMACommunicator send key and value scale (#5737)
* RDMACommunicator send key and value scale

---------

Co-authored-by: kevin <chengyf112@gmail.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-05 10:04:24 +08:00
Copilot 7d5282e158 [APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT (#5865)
* Initial plan

* Add configurable FD_WORKER_ALIVE_TIMEOUT environment variable

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Add test for FD_WORKER_ALIVE_TIMEOUT environment variable

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Update docs/zh/usage/environment_variables.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update docs/usage/environment_variables.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Improve test coverage to validate integration with check_health calls

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Remove test_worker_alive_timeout.py per reviewer feedback

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-05 09:47:12 +08:00
YuBaoku 37a128e240 [CI] Fix reusable workflow output mapping in _build_linux_rl.yml
Fix incorrect job reference in reusable workflow outputs, which caused the RL wheel path to be dropped.
2026-01-04 21:22:07 +08:00
Yonghua Li 5e4e6692a4 [BugFix] fix cache manager not launched in case of mtp or blockwise fp8 (#5840)
* [BugFix] fix cache manager not launched in case of mtp or blockwise fp8

* [fix] fix mtp cache in mtp.py

* [fix] fix gpu ops import

* [fix] fix mtp layer idx

* [fix] fix xpu model runner mtp cache

* [fix] fix mtp import
2026-01-04 04:35:37 -08:00
YuBaoku 55f77e9ab1 [CI] Add commit-level build_linux task for RL (#5857) 2026-01-04 20:31:27 +08:00
Zhang Yulong 2da32f2a35 Update benchmark_serving.py (#5861) 2026-01-04 20:07:56 +08:00
kevin 52dc9a7b85 [BugFix] skip mm revert (#5848)
* skip mm revert

* update code

* update test
2026-01-04 14:25:45 +08:00
周周周 e3957a5ebc [Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620) 2026-01-04 11:21:15 +08:00
MingkunZhang f732d7d2ad [Metax] adapt prefix caching & cpu swap (#5844)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-31 17:02:48 +08:00
chen 193886e745 only cuda run triton op (#5846) 2025-12-31 14:17:31 +08:00
GoldPancake 4e10ae5d99 [Speculative Decoding] Optimize draft logprob (#5842)
* optimize draft logprob

* fix ut
2025-12-31 13:35:56 +08:00
ddchenhao66 9e45ef7ca9 [XPU] MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL (#5831) 2025-12-31 09:49:12 +08:00
kevin 74e162697f eb5 mm skip prefix cache (#5838) 2025-12-30 05:30:48 -08:00
xjkmfa ed60b4da32 [CI case] Prompt logprob (#5835)
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
Sunny-bot1 598d292a69 w4afp8 fix quant (#5830) 2025-12-30 21:16:13 +08:00
essos b03a4f3e3d [CI] [Hackathon 9th Sprint No.46] Supplement unit tests for module fastdeploy/model_executor/guided_decoding/xgrammar_backend.py (#5042)
* test

* rename ut

* remove test max_rollback_tokens

* update

* Simplify code

* fix: torch use mock

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-30 17:05:26 +08:00
chen 0bcf924e10 [Optimization] Optimization for gather_logprob by 10GB (#5817)
* opt logprobs gather_logprob, reduce device memory usage by 10GB when token_num=8k
2025-12-30 15:33:34 +08:00
YuBaoku 98519ee2e9 [CI] Fix archive URL injection in tag image build (#5828) 2025-12-30 14:28:17 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
GoldPancake e78e22ebd5 [BugFix] Fix entropy bugs (#5818)
* fix entropy bugs

* fix ut

* fix
2025-12-29 20:44:29 -08:00
tianhaodongbd edb9647422 [RL] add lm_head_fp32 in RolloutModelConfig (#5825) 2025-12-29 20:22:30 -08:00
周周周 7ae13b2326 [PD Disaggregation] remove unused param in RDMACommManager (#5814) 2025-12-30 11:38:30 +08:00
Yonghua Li a8d3e3ba12 [BugFix] fix shm opened but not closed in set_data_ipc (#5826) 2025-12-29 23:35:07 +08:00
CSWYF3634076 deb9698ac5 remove invalid elif branch (#5821) 2025-12-29 19:21:28 +08:00
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
周周周 a3f0696e35 [BugFix] fix compile error in sm89 (#5809) 2025-12-29 16:55:52 +08:00
Ryan eb782a0225 [BugFix] Fix return value inconsistency for ep_moe_expert_combine op (#5812) 2025-12-29 16:44:00 +08:00
essos ffb3ccff74 [CI] [Hackathon 9th Sprint No.52] Supplement unit tests for module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py (#5047)
* add test

* update test

* Simplify code

* Remove mocks

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:44:56 +08:00
xunyoyo 7e39560a42 [CI] [Hackathon 9th Sprint No.33] Supplement functional module unit tests - new (#5726)
* Add cache messager coverage tests

* Add default_dtype parameter to test cache manager

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:42:27 +08:00
Longzhi Wang 11329ee35e [Model] support mode config for expert_dispatch (#5748) 2025-12-29 13:37:20 +08:00
essos 8ee055aafc [CI] [Hackathon 9th Sprint No.55] Supplement unit tests for module fastdeploy/scheduler/local_scheduler.py (#5050)
* Add comprehensive unit tests for data type conversion functionality

* fix

* Fix unit test failures in test_local_scheduler.py

* update

* fix code

* update mock

* add ut

* rm file

* update test

* Remove test cases already covered

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 12:41:50 +08:00
ddchenhao66 56a9ecccb2 [XPU] xpu support ep4tp4 (#5773)
* [XPU] xpu support ep4tp4

* Add commands to check multiprocessing and fastdeploy processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
chenjian 91a2b13676 [BugFix] Fix preemption out of real_bsz (#5805) 2025-12-29 09:52:36 +08:00
YuBaoku c3ccfa974c [CI] Fix path error and port conflict (#5803) 2025-12-27 12:50:58 +08:00
Nyakku Shigure da9ea88a3b [BugFix] Correct condition for reversed_window_indices in SiglipEncoder (#5795) 2025-12-26 19:16:07 +08:00