FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-10 17:41:13 +08:00

Author	SHA1	Message	Date
周周周	7a0744f05a	[UT]support attention test tp (#5887 )	2026-01-06 11:15:01 +08:00
Jiaxin Sui	2785b820c8	[XPU][CI] Add XPU logprobs case (#5874 ) * Enhance run_ci_xpu.sh with caching and prefill options * Update model path and configuration in run_ci_xpu.sh * Add '北朝' keyword to assertion in run_45vl.py * Enhance process termination logic in run_ci_xpu.sh * Set timeout for CI_XPU job to 60 minutes * Remove extra newline in stop_processes function * Update paddlepaddle-xpu installation command Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error. * Update PaddlePaddle installation command * Remove max_tokens from model response configuration Removed max_tokens parameter from the model response call. * add xpu logprobs case * Fix formatting and improve setup_logprobs_env Add newline at end of file and update setup_logprobs_env function. * Refactor test_logprobs_21b_tp4.py for clarity * Change top_p value from 1.0 to 0 --------- Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>	2026-01-05 19:01:14 +08:00
jc	8d384f9fd8	[PD Disaggregation] Update usage of pd disaggregation and data parallel (#5742 ) * Update usage of pd disaggregation * up * up * up * up * up * up * up * up * up * up dp docs * up * up * up * fix unittest	2026-01-05 17:51:29 +08:00
jc	e911ac2ce7	[BugFix] Refine the preparation of cpu and storage cache (#5777 ) * Refine the preparation of cpu and storage cache * fix error * fix error * up * fix * up docs * fix unittest * remove debug info	2026-01-05 10:13:30 +08:00
kevin	52dc9a7b85	[BugFix] skip mm revert (#5848 ) * skip mm revert * update code * update test	2026-01-04 14:25:45 +08:00
周周周	e3957a5ebc	[Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620 )	2026-01-04 11:21:15 +08:00
GoldPancake	4e10ae5d99	[Speculative Decoding] Optimize draft logprob (#5842 ) * optimize draft logprob * fix ut	2025-12-31 13:35:56 +08:00
xjkmfa	ed60b4da32	[CI case]Prompt logprob (#5835 ) * [ci case]prompt_logprobs	2025-12-30 21:26:06 +08:00
essos	b03a4f3e3d	[CI]【Hackathon 9th Sprint No.46】NO.46 Add unit tests for functional module fastdeploy/model_executor/guided_decoding/xgrammar_backend.py (#5042 ) * test * rename ut * remove test max_rollback_tokens * update * simplify code * fix: torch use mock --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-30 17:05:26 +08:00
chen	0bcf924e10	[Optimization] Optimization for gather_logprob by 10GB (#5817 ) * opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k	2025-12-30 15:33:34 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
GoldPancake	e78e22ebd5	[BugFix] Fix entropy bugs (#5818 ) * fix entropy bugs * fix ut * fix	2025-12-29 20:44:29 -08:00
周周周	7ae13b2326	[PD Disaggregation]remove unused para in RDMACommManager (#5814 )	2025-12-30 11:38:30 +08:00
Yonghua Li	a8d3e3ba12	[BugFix] fix shm opened but not closed in set_data_ipc (#5826 )	2025-12-29 23:35:07 +08:00
CSWYF3634076	9286403570	[Models] Add Qwen3-VL Model Support (#5763 ) * support v1 loader * remove useless code * remove useless * [Model] support Qwen3VL images success * [Model] support Qwen3VL rope_3d * [Model] support Qwen3VL remove log * [Model] support Qwen3VL RL * [Model] support Qwen3VL tp * [Model] support Qwen3VL video * [Model] support Qwen3VL fix ernievl * [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds * [Model] support Qwen3VL fix multi card * [Model] support Qwen3VL file close * [Model] support Qwen3VL fix ce * [Model] support Qwen3VL fix unittest * [Model] support Qwen3VL add unittest --------- Co-authored-by: Ayakouji <yuhongh@qq.com>	2025-12-29 17:39:33 +08:00
essos	ffb3ccff74	[CI]【Hackathon 9th Sprint No.52】NO.52 Add unit tests for functional module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py (#5047 ) * add test * update test * simplify code * remove mock --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-29 13:44:56 +08:00
xunyoyo	7e39560a42	[CI] 【Hackathon 9th Sprint No.33】NO.33 Add unit tests for functional modules -new (#5726 ) * Add cache messager coverage tests * Add default_dtype parameter to test cache manager --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-29 13:42:27 +08:00
essos	8ee055aafc	[CI]【Hackathon 9th Sprint No.55】NO.55 Add unit tests for functional module fastdeploy/scheduler/local_scheduler.py (#5050 ) * Add comprehensive unit tests for data type conversion functionality * fix * Fix unit test failures in test_local_scheduler.py * update * fix code * update mock * add ut * rm file * update test * remove test cases that are already covered --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-29 12:41:50 +08:00
ddchenhao66	56a9ecccb2	[XPU] xpu support ep4tp4 (#5773 ) * [XPU] xpu support ep4tp4 * Add commands to check multiprocessing and fastdeploy processes --------- Co-authored-by: ddchenhao66 <dhaochen163.com> Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-12-29 11:27:01 +08:00
YuBaoku	c3ccfa974c	[CI] Fix path error and port conflict (#5803 )	2025-12-27 12:50:58 +08:00
kxz2002	cad2932990	[BugFix] Fix process_response_dict to support async in serving_completion (#5758 ) * support process_response_dict async initial commit * fixbug * add unit test * optimize	2025-12-26 17:40:58 +08:00
kevin	894f4e312b	[FDConfig] disable chunked_mm_input in ernie5 (#5774 ) * disable chunked_mm_input in ernie5 * update code * update code * update test case * update testcase * upate case	2025-12-26 15:31:27 +08:00
yzwu	7b6cc11952	[Iluvatar] Fix FD launch error when specifying CUDA_VISBLE_DEVICE (#5735 )	2025-12-26 14:01:27 +08:00
YuBaoku	4c22a5afb8	[CI] Disable GPU cleanup due to CI machine limitations (#5781 )	2025-12-26 00:11:06 +08:00
kevin	4fa76296d9	[BugFix] fix mm splitwise scheduler bug (#5604 ) * fix mm splitwise scheduler bug * fix test case bug * update code * update code	2025-12-25 04:08:11 -08:00
Copilot	1cbf448178	[Feature] Add startup version check mechanism for Paddle (#5769 ) * Initial plan * Implement version check mechanism: add the get_version_info function and check the Paddle version at startup Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Address code review feedback: improve error handling and logging Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Change comments and warning messages from Chinese to English Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Update fastdeploy/__init__.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-25 19:29:04 +08:00
freeliuzc	9018ccf74e	[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 ) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operater register * update pmtp multi-step mtp strategy in d-split -mode * add note * fix xpu register	2025-12-25 01:54:59 -08:00
YuBaoku	7247dc5f3a	[CI] Add retry and robust cleanup for removal (#5725 ) * [CI] Add retry and robust cleanup for removal * [CI] Ensure clean GPU memory by killing leftover processes	2025-12-25 17:08:27 +08:00
Juncai	412867fd99	[Feature] Support KV Cache Storage (#5571 ) * Support Mooncake Store * up * up * add op * fix conflict * fix error * up for comments * avoid thread lock * up * fix unittest * fix unittest * remove debug info * consider tp_size > 1 * add default rdma_nics * add utils * up * fix error --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-25 16:30:35 +08:00
memoryCoderC	be3be4913a	[Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM (#5195 ) * [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM * [Optimization] refactor(chat_handler,completion_handler): rename class	2025-12-25 16:28:15 +08:00
Jiaxin Sui	8fc789bb3f	[iluvatar][CI] refactor iluvatar_ci (#5588 ) * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * Update Docker image tag in iluvatar_test workflow * Update default Docker image version in workflow * Update iluvatar_test.yml * Update default Docker image in workflow config * Update model path in run_ernie300B_4layer.py * Update model path in offline inference check * Add model_data directory and copy model files Create model_data directory and copy necessary files. * Update run_ernie_vl_28B.py * Update run_ernie300B_4layer.py * Update paddlepaddle installation method in script * Change wget command to include proxy option * Modify paddle package installation in CI script Updated installation commands for paddle packages. * Update paddlepaddle and paddle-iluvatar-gpu versions * Delete .github/workflows/ci_iluvatar.yml * Rename workflow from ILUVATAR Test to ILUVATAR-CI * Update installation commands for paddlepaddle and iluvatar	2025-12-25 15:10:34 +08:00
YuBaoku	0410c42a9a	[CI] Refactor RL tests to reuse stable_test (#5516 ) * [CI] Refactor RL tests to reuse stable_test	2025-12-24 19:18:00 +08:00
YuBaoku	e75f93d302	[CI] Refactor RL tests to reuse test_metrics (#5741 )	2025-12-24 17:08:40 +08:00
Divano	6b0fba8294	Update run.sh	2025-12-24 15:35:17 +08:00
Nyakku Shigure	11227e00bb	[GraphOptimization] Wrap deep gemm and triton as python op (#5673 ) * [GraphOptimization] Wrap deep gemm and triton as python op * add unitest to _base_test && compatibility * paddle.static.MetaTensor -> "paddle.static.MetaTensor" * mv register_custom_python_op * rename yaml --------- Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>	2025-12-24 15:23:46 +08:00
bukejiyu	ba4b7afb3a	[Others] Rename tensor_parallel_degree to tensor_model_parallel_size for paddleformers 0.4.1 (#5727 )	2025-12-23 23:19:11 -08:00
xunyoyo	8acdd9f156	[CI] 【Hackathon 9th Sprint No.41】NO.41 Add unit tests for functional modules -new Add splitwise connector tests Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-24 14:05:32 +08:00
YuBaoku	672620cdfe	Revert "[CI] Adapt vl_model baseline changes due to Paddle update (#5576 )" (#5732 ) This reverts commit `63fff8df70`.	2025-12-24 11:59:27 +08:00
GoldPancake	23d488c488	[Feature] Entropy calculation support (#5692 ) * support entropy * fix bug --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-23 21:19:47 +08:00
bukejiyu	d1c6e57341	[Others] upgrade paddleformer to 0.4.0 (#5599 )	2025-12-23 05:08:01 -08:00
ophilia-lee	99258e19c8	[Benchmark] Support the Completions API (#5700 ) * The benchmark tool supports specifying response_format for constrained decoding scenarios * Update backend_request_func.py Make the output.success check handle an empty reply when overlong reasoning content is truncated * Update benchmark_serving.py Update benchmark_metrics * Support the Completions API * Support the Completions API * Support the Completions API * [Benchmark] Support the Completions API * [Benchmark] Support the Completions API --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-23 19:46:23 +08:00
kesmeey	f15edbb6ef	[CI]【Hackathon 9th Sprint No.40】Add unit tests for functional module fastdeploy/entrypoints/openai/api_server.py (#5567 ) * Add tests for openai api_server coverage * update * Update tests for openai api_server * fix bugs * test: disable some api_server lifespan/controller tests for local env * Format test_api_server with black * update * update * test: narrow envs patch in api_server tests to avoid side effects * fix: separate MagicMock creation to avoid missing req argument * fix: patch TRACES_ENABLE env var in api_server tests * fix: use os.environ patch for TRACES_ENABLE * test: use fake fastdeploy.envs in api_server tests * test: pass fake Request into chat/completion routes * test: increase coverage for tracing and scheduler control * fix: set dynamic_load_weight in tracing headers test * ci: add retry and validation for FastDeploy.tar.gz download * ci: fix indentation in _base_test.yml * refactor: simplify test_api_server.py (807->480 lines, ~40% reduction) * fix: restore missing args attributes (revision, etc.) in _build_args * fix: patch sys.argv to prevent SystemExit: 2 in api_server tests * improve coverage * Remove docstring from test_api_server.py Removed unnecessary docstring from test_api_server.py --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-23 18:06:43 +08:00
Divano	c1aa66df02	Revert "[Optim] Remove limitation of number of kvcache blocks (#5612 )" (#5702 ) This reverts commit `9da89a374b`.	2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun	9da89a374b	[Optim] Remove limitation of number of kvcache blocks (#5612 ) * [Optim] Remove limitation of number of kvcache blocks * Update fastdeploy/envs.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/worker/iluvatar_worker.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Add docs * Update fastdeploy/worker/worker_process.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix ci case --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-23 11:18:29 +08:00
xunyoyo	3aee5c4bf5	[CI] 【Hackathon 9th Sprint No.37】NO.37 Add unit tests for functional modules (#5059 ) * Add unit tests for TokenProcessor functionality * Add trace stubs for token processor tests * Increase token processor test coverage * Clean up imports in test_token_processor.py Remove unnecessary path manipulation in test file. * Cleanup: Remove unused imports in test_token_processor Removed unused imports from the test file. * Add trace_carrier to task in test cases Added trace_carrier attribute to task in multiple test cases to ensure proper handling of trace information. * Refine token processor tests for safe coverage * Expand postprocess coverage * Add ZMQ logprob parsing test --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com> Co-authored-by: Tao Luo <luotao02@baidu.com>	2025-12-23 10:35:16 +08:00
Jiaxin Sui	f16077a939	[XPU][CI] Xpu ci update (#5690 ) * Enhance run_ci_xpu.sh with caching and prefill options * Update model path and configuration in run_ci_xpu.sh * Add '北朝' keyword to assertion in run_45vl.py * Enhance process termination logic in run_ci_xpu.sh * Set timeout for CI_XPU job to 60 minutes * Remove extra newline in stop_processes function * Update paddlepaddle-xpu installation command Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error. * Update PaddlePaddle installation command * Remove max_tokens from model response configuration Removed max_tokens parameter from the model response call.	2025-12-23 10:19:39 +08:00
xiaolei373	dfe8ea941c	[log]console log to llm log (#5680 )	2025-12-23 10:05:45 +08:00
ddchenhao66	a1535c7e7e	[XPU][CI] xpu add ci test for pd + TP2 (#5653 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2025-12-22 19:27:10 +08:00
lizexu123	6d323769dd	fix w4afp8 (#5634 )	2025-12-22 13:39:41 +08:00
YuBaoku	fe55baae47	[CI] Fix unit_test error of unstable execution (#5660 ) * [CI] Fix unit_test error of unstable execution	2025-12-19 22:59:53 +08:00


593 Commits