Commit Graph

4304 Commits

Author SHA1 Message Date
YuBaoku 98519ee2e9 [CI] Fix archive URL injection in tag image build (#5828) 2025-12-30 14:28:17 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader (tp > 1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
GoldPancake e78e22ebd5 [BugFix] Fix entropy bugs (#5818)
* fix entropy bugs

* fix ut

* fix
2025-12-29 20:44:29 -08:00
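The entropy fix above touches per-token entropy computation. As context, a minimal sketch of numerically stable softmax entropy (the function name and pure-Python form are illustrative, not the repo's actual kernel, which runs on-device):

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits.

    Subtracting the max logit before exponentiating avoids overflow,
    a common source of entropy bugs with large logits.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    # -sum p * log p over the softmax probabilities
    return -sum((e / z) * math.log(e / z) for e in exps)
```

A uniform distribution over n tokens yields entropy log(n), which is a handy unit-test invariant.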
tianhaodongbd edb9647422 [RL] add lm_head_fp32 in RolloutModelConfig (#5825) 2025-12-29 20:22:30 -08:00
周周周 7ae13b2326 [PD Disaggregation] remove unused param in RDMACommManager (#5814) 2025-12-30 11:38:30 +08:00
Yonghua Li a8d3e3ba12 [BugFix] fix shm opened but not closed in set_data_ipc (#5826) 2025-12-29 23:35:07 +08:00
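The shm leak fixed above is the classic open-without-close pattern. A minimal sketch of the correct shape using Python's stdlib shared memory (the function name is hypothetical; the actual `set_data_ipc` path uses CUDA IPC, not `multiprocessing.shared_memory`):

```python
from multiprocessing import shared_memory

def read_ipc_block(name):
    """Attach to an existing shared-memory block, copy out its bytes,
    and always close the local handle.

    Omitting close() leaks the file descriptor/mapping in this process,
    which is the bug class the commit above fixes.
    """
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf)  # copy before releasing the mapping
    finally:
        shm.close()  # release this process's handle; the creator calls unlink()
```

Only the creating process should `unlink()`; every attaching process must still `close()` its own handle.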
CSWYF3634076 deb9698ac5 remove invalid elif branch (#5821) 2025-12-29 19:21:28 +08:00
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
周周周 a3f0696e35 [BugFix] fix compile error in sm89 (#5809) 2025-12-29 16:55:52 +08:00
Ryan eb782a0225 [BugFix] Fix return value inconsistency for ep_moe_expert_combine op (#5812) 2025-12-29 16:44:00 +08:00
essos ffb3ccff74 [CI] [Hackathon 9th Sprint No.52] NO.52 Add unit tests for module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py (#5047)
* add test

* update test

* Simplify code

* Remove mock

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:44:56 +08:00
xunyoyo 7e39560a42 [CI] [Hackathon 9th Sprint No.33] NO.33 Add unit tests for functional module - new (#5726)
* Add cache messager coverage tests

* Add default_dtype parameter to test cache manager

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:42:27 +08:00
Longzhi Wang 11329ee35e [Model] support mode config for expert_dispatch (#5748) 2025-12-29 13:37:20 +08:00
essos 8ee055aafc [CI] [Hackathon 9th Sprint No.55] NO.55 Add unit tests for module fastdeploy/scheduler/local_scheduler.py (#5050)
* Add comprehensive unit tests for data type conversion functionality

* fix

* Fix unit test failures in test_local_scheduler.py

* update

* fix code

* update mock

* add ut

* rm file

* update test

* Remove test cases already covered

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 12:41:50 +08:00
ddchenhao66 56a9ecccb2 [XPU] xpu support ep4tp4 (#5773)
* [XPU] xpu support ep4tp4

* Add commands to check multiprocessing and fastdeploy processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
chenjian 91a2b13676 [BugFix] Fix preemption out of real_bsz (#5805) 2025-12-29 09:52:36 +08:00
YuBaoku c3ccfa974c [CI] Fix path error and port conflict (#5803) 2025-12-27 12:50:58 +08:00
Nyakku Shigure da9ea88a3b [BugFix] Correct condition for reversed_window_indices in SiglipEncoder (#5795) 2025-12-26 19:16:07 +08:00
Ryan 09229d8953 change count_tokens_per_expert_func declaration: Tensor -> vector<Tensor> (#5794) 2025-12-26 19:02:28 +08:00
Daci 77add7d1cc set tracelogger stacklevel=2 (#5766) 2025-12-26 17:43:32 +08:00
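The `stacklevel=2` change above uses a standard-library `logging` feature: it makes the emitted record report the wrapper's *caller* instead of the wrapper itself. A minimal sketch (the `trace` wrapper name is illustrative, not the repo's actual TraceLogger API):

```python
import logging

logger = logging.getLogger("tracelogger_demo")

def trace(msg):
    """Thin logging wrapper.

    Without stacklevel=2, every record's funcName/lineno would point at
    this wrapper; stacklevel=2 skips one frame so records attribute the
    real call site instead.
    """
    logger.info(msg, stacklevel=2)
```

`stacklevel` is available since Python 3.8 and is the idiomatic fix whenever all log lines appear to originate from a logging helper.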
kxz2002 cad2932990 [BugFix] Fix process_response_dict to support async in serving_completion (#5758)
* support process_response_dict async initial commit

* fixbug

* add unit test

* optimize
2025-12-26 17:40:58 +08:00
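Supporting both sync and async `process_response_dict` implementations, as the fix above does, usually means awaiting the result only when it is awaitable. A minimal sketch of that dispatch pattern (helper and callback names are hypothetical, not the serving_completion API):

```python
import asyncio
import inspect

async def maybe_await(func, *args, **kwargs):
    """Call func and await the result only if it is awaitable.

    Lets sync and async post-processors share a single code path
    inside an async serving handler.
    """
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result

def sync_proc(d):
    d["done"] = True
    return d

async def async_proc(d):
    await asyncio.sleep(0)  # simulate async work
    d["done"] = True
    return d
```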
Ryan 724045c426 add some op infershape&dtype (#5762) 2025-12-26 16:17:39 +08:00
kevin 894f4e312b [FDConfig] disable chunked_mm_input in ernie5 (#5774)
* disable chunked_mm_input in ernie5

* update code

* update code

* update test case

* update testcase

* update case
2025-12-26 15:31:27 +08:00
周周周 03363cab4c make flash_mask attention pybind (#5783) 2025-12-26 14:31:35 +08:00
YuBaoku 8808dd1fed [CI] Enable custom_device_check in CI rerun (#5786)
* [CI] Enable custom_device_check in CI rerun
2025-12-26 14:09:16 +08:00
yzwu 7b6cc11952 [Iluvatar] Fix FD launch error when specifying CUDA_VISIBLE_DEVICES (#5735) 2025-12-26 14:01:27 +08:00
RichardWooSJTU 01c18f328f rename need_block_num_signal (#5623) 2025-12-26 11:02:29 +08:00
YuBaoku 4c22a5afb8 [CI] Disable GPU cleanup due to CI machine limitations (#5781) 2025-12-26 00:11:06 +08:00
Yonghua Li 0c01cccc32 [BugFix] fix double shutdown of comm group when rank0 clears weights slower than other ranks (#5715) 2025-12-25 21:48:53 +08:00
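The double-shutdown race above, where rank 0 tears down the comm group after other ranks already have, is typically fixed with an idempotent, lock-guarded shutdown. A toy sketch of the guard (the `CommGroup` class is a stand-in, not the repo's actual distributed group):

```python
import threading

class CommGroup:
    """Stand-in for a distributed communication group.

    The lock + closed flag make shutdown() safe to call more than once
    (e.g. from a slow rank 0 after faster ranks already triggered it).
    """
    def __init__(self):
        self._closed = False
        self._lock = threading.Lock()
        self.shutdown_calls = 0  # observable side effect for testing

    def shutdown(self):
        with self._lock:
            if self._closed:
                return False  # already torn down; skip double shutdown
            self._closed = True
        self.shutdown_calls += 1  # real teardown would happen here
        return True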
kevin 5538dda3c8 [Feature] pd support dy-c8 ipc (#5750)
* pd support dy-c8 ipc

* update code

* support v0

* update code
2025-12-25 21:22:34 +08:00
kevin 4fa76296d9 [BugFix] fix mm splitwise scheduler bug (#5604)
* fix mm splitwise scheduler bug

* fix test case bug

* update code

* update code
2025-12-25 04:08:11 -08:00
ophilia-lee d5f5dc4f6e [Benchmark] Increase the aiohttp default read buffer size to 10M to fix "Chunk too big" errors on oversized streaming chunks (#5771)
* Benchmark tool supports specifying response_format for constrained decoding scenarios

* Update backend_request_func.py

Make the output.success check tolerate the case where the reply content is empty because overly long reasoning content was truncated

* Update benchmark_serving.py

Update benchmark_metrics

* Support the Completions API

* Support the Completions API

* Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] async_request_eb_openai_completions: increase the aiohttp default read buffer size to 4M to fix "Chunk too big" errors on oversized streaming chunks

* [Benchmark] Increase the aiohttp default read buffer size to 10M to fix "Chunk too big" errors on oversized streaming chunks

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 19:36:11 +08:00
Copilot 1cbf448178 [Feature] Add startup version check mechanism for Paddle (#5769)
* Initial plan

* Implement version check mechanism: add get_version_info function and check Paddle version at startup

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Address code review feedback: improve error handling and logging

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Change comments and warning messages from Chinese to English

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Update fastdeploy/__init__.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-25 19:29:04 +08:00
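A startup version check like the one added above generally reduces to: read the installed package version, compare against a minimum, and warn rather than crash on failure. A minimal sketch, assuming a hypothetical minimum version and function names (the real `get_version_info` and threshold live in fastdeploy):

```python
from importlib.metadata import version, PackageNotFoundError

MIN_PADDLE = (3, 0, 0)  # hypothetical minimum, not fastdeploy's real requirement

def parse_version(v):
    """Best-effort parse of 'X.Y.Z...' into a comparable int tuple."""
    parts = []
    for token in v.split(".")[:3]:
        digits = "".join(ch for ch in token if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def check_paddle_version(installed=None):
    """Return a warning string if paddlepaddle is missing or too old, else None.

    Never raises: a version check at import time should degrade to a
    warning, not prevent startup.
    """
    if installed is None:
        try:
            installed = version("paddlepaddle")
        except PackageNotFoundError:
            return "paddlepaddle is not installed"
    if parse_version(installed) < MIN_PADDLE:
        req = ".".join(map(str, MIN_PADDLE))
        return f"paddlepaddle {installed} is older than required {req}"
    return None
```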
freeliuzc 9018ccf74e [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)
* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in d-split mode

* add note

* fix xpu register
2025-12-25 01:54:59 -08:00
YuBaoku 7247dc5f3a [CI] Add retry and robust cleanup for removal (#5725)
* [CI] Add retry and robust cleanup for removal

* [CI] Ensure clean GPU memory by killing leftover processes
2025-12-25 17:08:27 +08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
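The KV cache storage feature above externalizes computed KV blocks so identical prefixes can be reused across requests. A toy in-memory sketch of the put/get interface such a store exposes (the class and key scheme are illustrative; the real backend is Mooncake Store over RDMA):

```python
import hashlib

class KVCacheStore:
    """Minimal in-memory stand-in for an external KV-cache store.

    Keys are hashes of the token-id prefix, so two requests with the
    same prompt prefix resolve to the same cached block.
    """
    def __init__(self):
        self._blocks = {}

    @staticmethod
    def block_key(token_ids):
        return hashlib.sha256(repr(tuple(token_ids)).encode()).hexdigest()

    def put(self, token_ids, kv_block):
        self._blocks[self.block_key(token_ids)] = kv_block

    def get(self, token_ids):
        """Return the cached block for this prefix, or None on a miss."""
        return self._blocks.get(self.block_key(token_ids))
```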
memoryCoderC be3be4913a [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM (#5195)
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM

* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
Jiaxin Sui 8fc789bb3f [iluvatar][CI] refactor iluvatar_ci (#5588)
* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* Update Docker image tag in iluvatar_test workflow

* Update default Docker image version in workflow

* Update iluvatar_test.yml

* Update default Docker image in workflow config

* Update model path in run_ernie300B_4layer.py

* Update model path in offline inference check

* Add model_data directory and copy model files

Create model_data directory and copy necessary files.

* Update run_ernie_vl_28B.py

* Update run_ernie300B_4layer.py

* Update paddlepaddle installation method in script

* Change wget command to include proxy option

* Modify paddle package installation in CI script

Updated installation commands for paddle packages.

* Update paddlepaddle and paddle-iluvatar-gpu versions

* Delete .github/workflows/ci_iluvatar.yml

* Rename workflow from ILUVATAR Test to ILUVATAR-CI

* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
qw86972190 135e47d551 [XPU]ZMQ logprob (#5628)
* [XPU]ZMQ logprob
2025-12-25 14:50:01 +08:00
Yuanle Liu 75b3180280 [BugFix] Fix _disable_sequence_parallel_moe_if_needed (#5740) 2025-12-24 20:02:22 -08:00
MingkunZhang e48e306134 [Metax] update ci bash (#5760)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-25 11:47:38 +08:00
bukejiyu f0bbdce849 [Loader]Fix bug in MTP weight loading (#5744)
* fix torch mtp

* fix

* update
2025-12-25 11:32:17 +08:00
RuohengMa e154c03416 [XPU] refine moe_expert_ffn ut (#5743) 2025-12-25 10:35:24 +08:00
YuBaoku 9624bf3c6e [CI] Fix image build to use the correct upstream artifacts 2025-12-24 22:44:34 +08:00
chenjian b90a922f98 [Bug fix] Set enable_cache_output as false by default (#5751) 2025-12-24 21:37:24 +08:00
YuBaoku 6e39f88ca0 [CI] Fix ci_image_update error of no depends 2025-12-24 21:28:38 +08:00
YuBaoku 0410c42a9a [CI] Refactor RL tests to reuse stable_test (#5516)
* [CI] Refactor RL tests to reuse stable_test
2025-12-24 19:18:00 +08:00
freeliuzc 2dc2ba49b5 [Speculative Decoding] Fix multistep MTP in splitwise-prefill mode (#5723) 2025-12-24 02:45:54 -08:00
YuBaoku e75f93d302 [CI] Refactor RL tests to reuse test_metrics (#5741) 2025-12-24 17:08:40 +08:00
chen c7ab32d154 check (#5736) 2025-12-24 16:49:20 +08:00