xiaoxiaohehe001
00a01ae024
[Feature] Support redundant expert for eplb ( #5918 )
...
* [BugFix] support redundant expert for eplb
* support redundant expert for eplb
* support redundant expert for eplb
* update
* fix ci eplb
2026-01-09 17:13:24 +08:00
CSWYF3634076
e6cdea4492
[Models] Qwen3VL and Qwen3VL-Moe CUDA graph Support ( #5962 )
...
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v2
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v3
2026-01-09 17:09:02 +08:00
zccjjj
20de04e249
[XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu ( #5878 )
2026-01-09 16:34:57 +08:00
essos
1d20957340
[CI]【Hackathon 9th Sprint No.50】NO.50 Add unit tests for module fastdeploy/entrypoints/engine_client.py -part #5045 ( #5807 )
...
* update test code
* reduce mock usage
* fix style
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-09 15:13:19 +08:00
GoldPancake
3ca99ab170
[Speculative Decoding] Return accepted tokens per head in response ( #5947 )
...
* adjust log level
* add accepted tokens per head
* fix ut
* fix
2026-01-09 14:32:08 +08:00
kevin
2d2b156252
[BugFix] fix dyc8 cache bug ( #5958 )
...
* fix dyc8 cache bug
* update code
2026-01-08 19:25:47 -08:00
GoldPancake
e41d434548
[Bugfix] Fix entropy calculation bugs ( #5941 )
...
* fix entropy bugs
2026-01-08 20:57:35 +08:00
CSWYF3634076
d8fcb7c07d
[Models] Add Qwen3-VL Moe Model Support ( #5913 )
...
* [Model] add Qwen3vl moe model support
* [Model] add Qwen3vl moe model support remove log
* [Model] add Qwen3vl moe model support unittest
2026-01-08 11:36:42 +08:00
xunyoyo
78adf83549
[CI] 【Hackathon 9th Sprint No.18】NO.18 Add unit tests for module -new ( #5717 )
...
* Remove paddle import guards from DeepEP tests
* Sort imports in DeepEP tests
* Refactor assertions for combine handle in test_ep.py
Updated assertions to verify combine handle in DeepEPEngine.
* Add moe_select coverage in DeepEP tests
* Refactor assertions for combine handle in test_ep
* Strengthen moe_select assertions in DeepEP tests
* Update test_ep.py
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-07 17:20:59 +08:00
kevin
eabd01cd21
[BugFix] fix eb5 prefix bug ( #5879 )
...
* fix eb5 prefix bug
* update ci test
* update code
* update code
* update code
* update code
* update code
* update code
* update code
2026-01-06 23:50:39 -08:00
ddchenhao66
733014bf32
[XPU] Support EP4TP1 in pd disaggregation ( #5860 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-06 15:25:36 +08:00
Yonghua Li
9445fbe054
[KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled ( #5871 )
...
* [fix] temporarily forbid cpu cache in update/clear api
* [fix] stop launching cache transfer manager unless hierarchical cache is enabled
* [fix] fix no attr hierarchical cache
* [fix] fix ci
* [fix] fix test_prefix_cache_manager.py
2026-01-06 14:27:47 +08:00
Yonghua Li
9fc2400e71
[BugFix] fix mtp cache attaching for pd disaggregation ( #5884 )
...
* [fix] fix mtp cache attaching for pd disaggregation
* [fix] fix test_mtp_proposer.py
2026-01-06 14:17:53 +08:00
lizexu123
acdf0cd1d9
fix hadamard_block_size ( #5888 )
2026-01-06 14:12:14 +08:00
周周周
7a0744f05a
[UT]support attention test tp ( #5887 )
2026-01-06 11:15:01 +08:00
Jiaxin Sui
2785b820c8
[XPU][CI] Add XPU logprobs case ( #5874 )
...
* Enhance run_ci_xpu.sh with caching and prefill options
* Update model path and configuration in run_ci_xpu.sh
* Add '北朝' keyword to assertion in run_45vl.py
* Enhance process termination logic in run_ci_xpu.sh
* Set timeout for CI_XPU job to 60 minutes
* Remove extra newline in stop_processes function
* Update paddlepaddle-xpu installation command
Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.
* Update PaddlePaddle installation command
* Remove max_tokens from model response configuration
Removed max_tokens parameter from the model response call.
* add xpu logprobs case
* Fix formatting and improve setup_logprobs_env
Add newline at end of file and update setup_logprobs_env function.
* Refactor test_logprobs_21b_tp4.py for clarity
* Change top_p value from 1.0 to 0
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
jc
8d384f9fd8
[PD Disaggregation] Update usage of pd disaggregation and data parallel ( #5742 )
...
* Update usage of pd disaggregation
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up dp docs
* up
* up
* up
* fix unittest
2026-01-05 17:51:29 +08:00
jc
e911ac2ce7
[BugFix] Refine the preparation of cpu and storage cache ( #5777 )
...
* Refine the preparation of cpu and storage cache
* fix error
* fix error
* up
* fix
* up docs
* fix unittest
* remove debug info
2026-01-05 10:13:30 +08:00
kevin
52dc9a7b85
[BugFix] skip mm revert ( #5848 )
...
* skip mm revert
* update code
* update test
2026-01-04 14:25:45 +08:00
周周周
e3957a5ebc
[Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel ( #5620 )
2026-01-04 11:21:15 +08:00
GoldPancake
4e10ae5d99
[Speculative Decoding] Optimize draft logprob ( #5842 )
...
* optimize draft logprob
* fix ut
2025-12-31 13:35:56 +08:00
xjkmfa
ed60b4da32
[CI case]Prompt logprob ( #5835 )
...
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
essos
b03a4f3e3d
[CI]【Hackathon 9th Sprint No.46】NO.46 Add unit tests for module fastdeploy/model_executor/guided_decoding/xgrammar_backend.py ( #5042 )
...
* test
* rename ut
* remove test max_rollback_tokens
* update
* simplify code
* fix: torch use mock
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-30 17:05:26 +08:00
chen
0bcf924e10
[Optimization] Reduce gather_logprob device memory usage by 10GB ( #5817 )
...
* optimize gather_logprob; reduce device memory usage by 10GB when token_num=8k
2025-12-30 15:33:34 +08:00
lizexu123
44a13e4557
[Feature] support w4afp8 v1_loader and v0_loader(tp>1) ( #5757 )
...
* support
* fix
* support w4afp8 v1_loader and v0_loader
* fix
* fix test
* fix test
* fix test
* fix moe.py
* add test_ernie_4_5_w4afp8
* add test
* delete tensor
* fix test
* fix
* add
* fix test
2025-12-30 14:11:52 +08:00
GoldPancake
e78e22ebd5
[BugFix] Fix entropy bugs ( #5818 )
...
* fix entropy bugs
* fix ut
* fix
2025-12-29 20:44:29 -08:00
周周周
7ae13b2326
[PD Disaggregation] remove unused parameter in RDMACommManager ( #5814 )
2025-12-30 11:38:30 +08:00
Yonghua Li
a8d3e3ba12
[BugFix] fix shm opened but not closed in set_data_ipc ( #5826 )
2025-12-29 23:35:07 +08:00
CSWYF3634076
9286403570
[Models] Add Qwen3-VL Model Support ( #5763 )
...
* support v1 loader
* remove useless code
* remove useless
* [Model] support Qwen3VL images success
* [Model] support Qwen3VL rope_3d
* [Model] support Qwen3VL remove log
* [Model] support Qwen3VL RL
* [Model] support Qwen3VL tp
* [Model] support Qwen3VL video
* [Model] support Qwen3VL fix ernievl
* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds
* [Model] support Qwen3VL fix multi card
* [Model] support Qwen3VL file close
* [Model] support Qwen3VL fix ce
* [Model] support Qwen3VL fix unittest
* [Model] support Qwen3VL add unittest
---------
Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
essos
ffb3ccff74
[CI]【Hackathon 9th Sprint No.52】NO.52 Add unit tests for module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py ( #5047 )
...
* add test
* update test
* simplify code
* remove mock usage
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:44:56 +08:00
xunyoyo
7e39560a42
[CI] 【Hackathon 9th Sprint No.33】NO.33 Add unit tests for module -new ( #5726 )
...
* Add cache messager coverage tests
* Add default_dtype parameter to test cache manager
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:42:27 +08:00
essos
8ee055aafc
[CI]【Hackathon 9th Sprint No.55】NO.55 Add unit tests for module fastdeploy/scheduler/local_scheduler.py ( #5050 )
...
* Add comprehensive unit tests for data type conversion functionality
* fix
* Fix unit test failures in test_local_scheduler.py
* update
* fix code
* update mock
* add ut
* rm file
* update test
* remove already-covered test cases
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 12:41:50 +08:00
ddchenhao66
56a9ecccb2
[XPU] xpu support ep4tp4 ( #5773 )
...
* [XPU] xpu support ep4tp4
* Add commands to check multiprocessing and fastdeploy processes
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
YuBaoku
c3ccfa974c
[CI] Fix path error and port conflict ( #5803 )
2025-12-27 12:50:58 +08:00
kxz2002
cad2932990
[BugFix] Fix process_response_dict to support async in serving_completion ( #5758 )
...
* support process_response_dict async initial commit
* fixbug
* add unit test
* optimize
2025-12-26 17:40:58 +08:00
kevin
894f4e312b
[FDConfig] disable chunked_mm_input in ernie5 ( #5774 )
...
* disable chunked_mm_input in ernie5
* update code
* update code
* update test case
* update testcase
* update case
2025-12-26 15:31:27 +08:00
yzwu
7b6cc11952
[Iluvatar] Fix FD launch error when specifying CUDA_VISIBLE_DEVICES ( #5735 )
2025-12-26 14:01:27 +08:00
YuBaoku
4c22a5afb8
[CI] Disable GPU cleanup due to CI machine limitations ( #5781 )
2025-12-26 00:11:06 +08:00
kevin
4fa76296d9
[BugFix] fix mm splitwise scheduler bug ( #5604 )
...
* fix mm splitwise scheduler bug
* fix test case bug
* update code
* update code
2025-12-25 04:08:11 -08:00
Copilot
1cbf448178
[Feature] Add startup version check mechanism for Paddle ( #5769 )
...
* Initial plan
* Implement version check mechanism: add get_version_info function and check Paddle version at startup
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Address code review feedback: improve error handling and logging
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Change comments and warning messages from Chinese to English
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Update fastdeploy/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-25 19:29:04 +08:00
freeliuzc
9018ccf74e
[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes ( #5738 )
...
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split mode
* add note
* fix xpu register
2025-12-25 01:54:59 -08:00
YuBaoku
7247dc5f3a
[CI] Add retry and robust cleanup for removal ( #5725 )
...
* [CI] Add retry and robust cleanup for removal
* [CI] Ensure clean GPU memory by killing leftover processes
2025-12-25 17:08:27 +08:00
Juncai
412867fd99
[Feature] Support KV Cache Storage ( #5571 )
...
* Support Mooncake Store
* up
* up
* add op
* fix conflict
* fix error
* up for comments
* avoid thread lock
* up
* fix unittest
* fix unittest
* remove debug info
* consider tp_size > 1
* add default rdma_nics
* add utils
* up
* fix error
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
memoryCoderC
be3be4913a
[Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM ( #5195 )
...
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM
* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
Jiaxin Sui
8fc789bb3f
[iluvatar][CI] refactor iluvatar_ci ( #5588 )
...
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* Update Docker image tag in iluvatar_test workflow
* Update default Docker image version in workflow
* Update iluvatar_test.yml
* Update default Docker image in workflow config
* Update model path in run_ernie300B_4layer.py
* Update model path in offline inference check
* Add model_data directory and copy model files
Create model_data directory and copy necessary files.
* Update run_ernie_vl_28B.py
* Update run_ernie300B_4layer.py
* Update paddlepaddle installation method in script
* Change wget command to include proxy option
* Modify paddle package installation in CI script
Updated installation commands for paddle packages.
* Update paddlepaddle and paddle-iluvatar-gpu versions
* Delete .github/workflows/ci_iluvatar.yml
* Rename workflow from ILUVATAR Test to ILUVATAR-CI
* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
YuBaoku
0410c42a9a
[CI] Refactor RL tests to reuse stable_test ( #5516 )
...
* [CI] Refactor RL tests to reuse stable_test
2025-12-24 19:18:00 +08:00
YuBaoku
e75f93d302
[CI] Refactor RL tests to reuse test_metrics ( #5741 )
2025-12-24 17:08:40 +08:00
Divano
6b0fba8294
Update run.sh
2025-12-24 15:35:17 +08:00
Nyakku Shigure
11227e00bb
[GraphOptimization] Wrap deep gemm and triton as python op ( #5673 )
...
* [GraphOptimization] Wrap deep gemm and triton as python op
* add unitest to _base_test && compatibility
* paddle.static.MetaTensor -> "paddle.static.MetaTensor"
* mv register_custom_python_op
* rename yaml
---------
Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>
2025-12-24 15:23:46 +08:00
bukejiyu
ba4b7afb3a
[Others] Rename tensor_parallel_degree to tensor_model_parallel_size for paddleformers 0.4.1 ( #5727 )
2025-12-23 23:19:11 -08:00