FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
lizexu123	1f96028bea	[BugFix] fix python3.12 v0_loader (#6132 )	2026-01-21 16:12:11 +08:00
qwes5s5	b2a2e11551	[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320 ) * request disconnect * request disconnect * fix bug * fix bug--amend --------- Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>	2026-01-16 11:46:13 +08:00
fxyfxy777	4c92035f2d	[Feature] Unify fp8 block_wise quant ops (#5991 ) * quant stash * blockwise_quant * precommit * rm tensor.cut * tp ok * add swiglu * rm outdate code * fix activate ut * change baseline * fix baseline error	2026-01-15 05:50:37 -08:00
lizexu123	6619298b50	【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007 ) * update w4afp8 * build.sh ok * support cuda_graph * fix * add test * fix max_tokens_per_expert * >=70 * fix * compute_max_tokens_from_prefix_sum in w4afp8 * compute_max_tokens use cub	2026-01-15 19:18:42 +08:00
YuBaoku	2c17acd767	[CI] Adapt vl_model baseline changes due to Paddle update_2 (#6033 )	2026-01-14 15:22:26 +08:00
xjkmfa	1aa7e82924	[ci case]Check the chunking of the chat interface (#5981 ) * Add ci case for min token and max token * 【CI case】include total_tokens in the last packet of completion interface stream output * [ci case] add Chunk segmentation check * [ci case] add Chunk segmentation check * [ci case] add Chunk segmentation check * [ci case] add Chunk segmentation check --------- Co-authored-by: xujing43 <xujing43@baidu.com>	2026-01-12 16:36:13 +08:00
lizexu123	acdf0cd1d9	fix hadamard_block_size (#5888 )	2026-01-06 14:12:14 +08:00
xjkmfa	ed60b4da32	[CI case]Prompt logprob (#5835 ) * [ci case]prompt_logprobs	2025-12-30 21:26:06 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
yzwu	7b6cc11952	[Iluvatar] Fix FD launch error when specifing CUDA_VISBLE_DEVICE (#5735 )	2025-12-26 14:01:27 +08:00
Jiaxin Sui	8fc789bb3f	[iluvatar][CI] refactor iluvatar_ci (#5588 ) * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * refactor iluvatar_ci * Update Docker image tag in iluvatar_test workflow * Update default Docker image version in workflow * Update iluvatar_test.yml * Update default Docker image in workflow config * Update model path in run_ernie300B_4layer.py * Update model path in offline inference check * Add model_data directory and copy model files Create model_data directory and copy necessary files. * Update run_ernie_vl_28B.py * Update run_ernie300B_4layer.py * Update paddlepaddle installation method in script * Change wget command to include proxy option * Modify paddle package installation in CI script Updated installation commands for paddle packages. * Update paddlepaddle and paddle-iluvatar-gpu versions * Delete .github/workflows/ci_iluvatar.yml * Rename workflow from ILUVATAR Test to ILUVATAR-CI * Update installation commands for paddlepaddle and iluvatar	2025-12-25 15:10:34 +08:00
YuBaoku	e75f93d302	[CI] Refactor RL tests to reuse test_metrics (#5741 )	2025-12-24 17:08:40 +08:00
YuBaoku	672620cdfe	Revert "[CI] Adapt vl_model baseline changes due to Paddle update (#5576 )" (#5732 ) This reverts commit `63fff8df70`.	2025-12-24 11:59:27 +08:00
Divano	c1aa66df02	Revert "[Optim] Remove limitation of number of kvcache blocks (#5612 )" (#5702 ) This reverts commit `9da89a374b`.	2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun	9da89a374b	[Optim] Remove limitation of number of kvcache blocks (#5612 ) * [Optim] Remove limitation of number of kvcache blocks * Update fastdeploy/envs.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/worker/iluvatar_worker.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Add docs * Update fastdeploy/worker/worker_process.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix ci case --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-23 11:18:29 +08:00
YuBaoku	fe55baae47	[CI] Fix unit_test error of unstable execution (#5660 ) * [CI] Fix unit_test error of unstable execution	2025-12-19 22:59:53 +08:00
MingkunZhang	46d83be065	[Metax] update ci test (#5652 )	2025-12-19 17:25:47 +08:00
yzwu	ac013803f3	[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555 )	2025-12-18 02:14:25 -08:00
Yonghua Li	0c8c6369ed	[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415 ) * [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports * [fix] fix some bugs * [fix] fix rdma port for cache manager/messager * [fix] temporarily cancel port availability check to see if it can pass ci test * [feat] simplify args for multi api server * [fix] fix dp * [fix] fix port for xpu * [fix] add tests for ports post processing & fix ci * [test] fix test_multi_api_server * [fix] fix rdma_comm_ports args for multi_api_server * [fix] fix test_common_engine * [fix] fix test_cache_transfer_manager * [chore] automatically setting FD_ENABLE_MULTI_API_SERVER * [fix] avoid api server from creating engine_args twice * [fix] fix test_run_batch * [fix] fix test_metrics * [fix] fix splitwise connector init * [test] add test_rdma_transfer and test_expert_service * [fix] fix code syntax * [fix] fix test_rdma_transfer and build wheel with rdma script	2025-12-17 15:50:42 +08:00
YuBaoku	5d2b16e6f3	[CI] Remove test_metrics.py due to incompatible forced merge (#5578 ) * [CI] Remove test_metrics.py due to incompatible forced merge	2025-12-16 14:04:46 +08:00
YuBaoku	63fff8df70	[CI] Adapt vl_model baseline changes due to Paddle update (#5576 )	2025-12-16 11:42:31 +08:00
MingkunZhang	f32e331ef5	[Metax] add ci yaml (#5520 ) Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-12-12 13:35:38 +08:00
luukunn	fbc9bce1e9	[Feature]Optimization of Thinking Pattern Framework (#4302 ) * add model status in vl * add x1 parser * add model_status * fix parser * fix parser * fix parser * fix parser * Revert "fix parser" This reverts commit `300f446d8a`. * fix parser * fix * fix * fix * fix * fix parser * fix unit test * fix unit test * add unit test * fix * fix * add unit test * fix unit test * add unit test * add unit test * fix unit test * fix unit test * fix bug * fix unit test * x1 tool parser * fix unit test * fix unit test * fix unit test * fix n * fix unit test * add unit test * add unit test * remove pring	2025-12-10 16:17:06 +08:00
Echo-Nie	1b1bfab341	[CI] Add unittest (#5328 ) * add test_worker_eplb * remove tesnsor_wise_fp8 * add copyright	2025-12-09 19:19:42 +08:00
lizexu123	95eab9f9ee	[Feature] support stop_token_ids (#5399 ) * support stop_token_ids * fix * delete chinese * support both * delete print	2025-12-09 17:49:12 +08:00
YuBaoku	dfeabee123	[CI] Allow occasional distributed worker exit_code (#5341 )	2025-12-03 10:56:59 +08:00
YuBaoku	3e2c13d8c5	[CI] Disable queue state assertion temporarily (#5329 )	2025-12-02 18:57:29 +08:00
Jiaxin Sui	b0113cb0fc	[XPU][CI] Change XPU CI Base Value (#5318 ) * Add '小度' keyword to assertion in run_w4a8.py * Add keywords to assertion in run_ep_online.py * Add keywords to assertion in run_w4a8.py * Update run_45T.py * Update run_ep_online.py * Refactor assertion for response content keywords * Update run_w4a8.py * Update run_w4a8.py	2025-12-01 21:02:09 +08:00
Jiaxin Sui	b467e9dadc	[XPU][CI]Change W4A8 Case Base Value (#5309 )	2025-12-01 15:25:33 +08:00
ddchenhao66	fc88eebc32	[CI][XPU] add pd disaggregation (#5179 ) * [CI][XPU] add pd disaggregation * Clarify comments and install iproute2 Updated comments to clarify script purpose and added installation of iproute2. --------- Co-authored-by: ddchenhao66 <dhaochen163.com> Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-11-28 10:44:27 +08:00
YuBaoku	6a6bf4ea24	[CI] Fix test streaming with stop str (#5275 ) * [CI] add output for last_token in test_streaming_with_stop_str * [CI] Adapt empty last_token check	2025-11-27 20:51:39 +08:00
Jiaxin Sui	5ff93d4998	[XPU][CI] change VL model to 28B-VL-thinking (#5169 ) * Enhance run_ci_xpu.sh with caching and prefill options * Update model path and configuration in run_ci_xpu.sh * Add '北朝' keyword to assertion in run_45vl.py * Enhance process termination logic in run_ci_xpu.sh * Set timeout for CI_XPU job to 60 minutes * Remove extra newline in stop_processes function	2025-11-24 16:50:18 +08:00
YuBaoku	98f1ab46a9	[CI] add output for last_token in test_streaming_with_stop_str (#5170 )	2025-11-24 10:49:17 +08:00
chenjian	3ea1b44a58	[Optimization] Improve perf for fd response token with internal adapter (#4992 ) * [Optimize] Improve perf for fd response token with internal adapter * fix * fix bug * fix ci * fix ci * fix ci * fix ci	2025-11-21 19:02:03 +08:00
Zhang Yulong	be9541a97b	[CI] add metrics case (#5115 ) * add case * add case	2025-11-19 11:50:12 +08:00
FocusLuo	c2c1942db9	[INTEL_HPU] [CI] enabled fastdeploy PR testing (#4596 ) * [INTEL HPU] added hpu ci work flow support Signed-off-by: Luo, Focus <focus.luo@intel.com> * [INTEL HPU] added run ci hpu test scripts Signed-off-by: Luo, Focus <focus.luo@intel.com> * [INTEL HPU] enabled HPU ernie test case Signed-off-by: Luo, Focus <focus.luo@intel.com> * [INTEL HPU] updated Intel Gaudi Readme with Warmup disable cmdline Signed-off-by: Luo, Focus <focus.luo@intel.com> * Modify paddlepaddle installation command Updated paddlepaddle installation command to use a specific index URL. * Update run_ci_hpu.sh * Rename json directory to nlohmann_json Rename extracted json directory to nlohmann_json. * Update ci_hpu.yml * Set pip global index URL to Tsinghua mirror * Update CI workflow to use self-hosted runner and paths * Update Docker image in CI workflow * Modify HPU installation URLs in run_ci_hpu.sh Updated the installation URL for paddle_intel_hpu and added paddlenlp_ops installation. * Fix paddle_intel_hpu installation URL Corrected the URL for paddle_intel_hpu wheel installation. --------- Signed-off-by: Luo, Focus <focus.luo@intel.com> Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>	2025-11-17 19:24:41 +08:00
plusNew001	7f94d77e08	[XPU][CI] fix ci case bug (#5084 ) * Ignore markdown and text files in CI workflow * Change GPU_ID to XPU_ID in run_ci_xpu.sh * Change GPU_ID to XPU_ID in test configuration * Change GPU_ID to XPU_ID for service port calculation * Change GPU_ID to XPU_ID for device identification * Change GPU_ID to XPU_ID in test_ep function * Update run_w4a8.py * Redirect stop_processes output to kill.log Redirect output of stop_processes to kill.log to capture logs. * Log server output for failed test cases Added logging of server.log for failed tests. * Add '-s' option to pytest commands in run_ci_xpu.sh * Refactor assertion to validate multiple keywords Updated assertion to check for multiple keywords in response. * Fix assertany to assert any in run_45vl.py	2025-11-17 16:01:27 +08:00
plusNew001	0e819cd596	[CI][XPU] Optimize CI logs and variable names (#5025 ) * Ignore markdown and text files in CI workflow * Change GPU_ID to XPU_ID in run_ci_xpu.sh * Change GPU_ID to XPU_ID in test configuration * Change GPU_ID to XPU_ID for service port calculation * Change GPU_ID to XPU_ID for device identification * Change GPU_ID to XPU_ID in test_ep function * Update run_w4a8.py * Redirect stop_processes output to kill.log Redirect output of stop_processes to kill.log to capture logs. * Log server output for failed test cases Added logging of server.log for failed tests. * Add '-s' option to pytest commands in run_ci_xpu.sh	2025-11-14 19:35:35 +08:00
zccjjj	88da9d9788	[XPU] [CI] Change CI ep test from offline to online (#4885 ) * change CI ep test from offline to online * add ep all2all ci's changes, from offline to online * change env var in ep-all2all ci test * add expected response for ep8tp8 all2all * Adapt to CI refactoring and support dual-concurrent code execution * Adapt to CI refactoring and support dual-concurrent, second * Explicitly specify the #port * change the startup method of all2all * Modify the command of all2all * Update assertion to check multiple keywords * Update assertion to check multiple keywords * Update run_w4a8.py * Update run_w4a8.py --------- Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>	2025-11-13 16:15:45 +08:00
yzwu	76e60e98f8	[Iluvatar][CI] fix safetensors_rust.SafetensorError: framework paddle is invalid (#4972 )	2025-11-12 14:13:40 +08:00
yzwu	3707af7a4f	[Iluvatar] add vl into ci and support v1 loader (#4774 )	2025-11-11 10:50:17 +08:00
Yuanle Liu	3dc0ffa46d	[TSP] Support qwen3 moe tsp + cudagraph (#4871 ) * support qwen3_moe tsp mode * fix * fix * update * update * update * fix * support external_rmsnorm * update * fix	2025-11-10 23:37:51 +08:00
plusNew001	3665c283b5	[XPU] [CI]Change CI to multi-concurrency (#4866 ) * Refactor GPU ID logic in CI workflow Updated GPU ID assignment logic and removed unused port calculations. * Refactor GPU device and port configuration * Update engine_worker_queue_port calculation logic * Refactor XPU_VISIBLE_DEVICES export logic * Adjust service port based on GPU ID * Adjust service HTTP port based on GPU ID * Adjust service_http_port based on GPU_ID * Add import for os module in run_45T.py * Update run_45vl.py * Import os module in run_w4a8.py Added import for os module to use environment variables. * Remove duplicate import of os module * Remove duplicate import of os module * Update run_45T.py * Update run_w4a8.py * fix bug * fix bug * Update run_w4a8.py * Fix directory change command in run_ci_xpu.sh	2025-11-10 21:09:48 +08:00
plusNew001	0a3bc84f71	[XPU][CI]Update test assertion and base response value (#4907 )	2025-11-10 11:44:54 +08:00
plusNew001	fa098383f6	[XPU][CI] Ci bug fix (#4889 ) * Refactor test_45t by commenting out responses Comment out base response variables and update assertion. * Update run_w4a8.py * Fix assertion syntax in run_45T.py	2025-11-07 17:50:11 +08:00
YuBaoku	fa28745f19	[CI] Update ERNIE-4.5-VL baseline to adapt to MoE changes (#4867 )	2025-11-06 22:02:10 +08:00
YuBaoku	a139f8f3cb	[CI] Optimize port cleanup logic (#4860 )	2025-11-06 19:13:48 +08:00
plusNew001	fc8bef2c95	[XPU][CI]Change ci vl model to 28 b (#4764 ) * Update XPU_VISIBLE_DEVICES and model parameters * Update base response and adjust max tokens * Implement process cleanup in CI workflow Add process cleanup commands to prevent port conflicts * Remove process cleanup commands from CI workflow Removed old process cleanup commands to prevent port conflicts.	2025-11-06 14:12:23 +08:00
zhupengyang	2fd254e5b7	support ep+tp at op layer (#4688 )	2025-11-05 11:15:57 +08:00
YuBaoku	722110a952	[CI] Refactor CE wheel upload for multiple target paths (#4790 ) * [CI] Refactor CE wheel upload for multiple target paths * [CI] fix test_streaming_with_stop_str error	2025-11-04 18:56:38 +08:00

97 Commits