Commit Graph

97 Commits

Author SHA1 Message Date
lizexu123 1f96028bea [BugFix] fix python3.12 v0_loader (#6132) 2026-01-21 16:12:11 +08:00
qwes5s5 b2a2e11551 [Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320)
* request disconnect

* request disconnect

* fix bug

* fix bug--amend

---------

Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-16 11:46:13 +08:00
fxyfxy777 4c92035f2d [Feature] Unify fp8 block_wise quant ops (#5991)
* quant stash

* blockwise_quant

* precommit

* rm tensor.cut

* tp ok

* add swiglu

* rm outdate code

* fix activate ut

* change baseline

* fix baseline error
2026-01-15 05:50:37 -08:00
lizexu123 6619298b50 【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)
* update w4afp8

* build.sh ok

* support cuda_graph

* fix

* add test

* fix max_tokens_per_expert

* >=70

* fix

* compute_max_tokens_from_prefix_sum in w4afp8

* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00
YuBaoku 2c17acd767 [CI] Adapt vl_model baseline changes due to Paddle update_2 (#6033) 2026-01-14 15:22:26 +08:00
xjkmfa 1aa7e82924 [ci case]Check the chunking of the chat interface (#5981)
* Add ci case for min token and max token

* 【CI case】include total_tokens in the last packet of completion interface stream output

* [ci case] add Chunk segmentation check

* [ci case] add Chunk segmentation check

* [ci case] add Chunk segmentation check

* [ci case] add Chunk segmentation check

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2026-01-12 16:36:13 +08:00
lizexu123 acdf0cd1d9 fix hadamard_block_size (#5888) 2026-01-06 14:12:14 +08:00
xjkmfa ed60b4da32 [CI case]Prompt logprob (#5835)
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
yzwu 7b6cc11952 [Iluvatar] Fix FD launch error when specifing CUDA_VISBLE_DEVICE (#5735) 2025-12-26 14:01:27 +08:00
Jiaxin Sui 8fc789bb3f [iluvatar][CI] refactor iluvatar_ci (#5588)
* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* Update Docker image tag in iluvatar_test workflow

* Update default Docker image version in workflow

* Update iluvatar_test.yml

* Update default Docker image in workflow config

* Update model path in run_ernie300B_4layer.py

* Update model path in offline inference check

* Add model_data directory and copy model files

Create model_data directory and copy necessary files.

* Update run_ernie_vl_28B.py

* Update run_ernie300B_4layer.py

* Update paddlepaddle installation method in script

* Change wget command to include proxy option

* Modify paddle package installation in CI script

Updated installation commands for paddle packages.

* Update paddlepaddle and paddle-iluvatar-gpu versions

* Delete .github/workflows/ci_iluvatar.yml

* Rename workflow from ILUVATAR Test to ILUVATAR-CI

* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
YuBaoku e75f93d302 [CI] Refactor RL tests to reuse test_metrics (#5741) 2025-12-24 17:08:40 +08:00
YuBaoku 672620cdfe Revert "[CI] Adapt vl_model baseline changes due to Paddle update (#5576)" (#5732)
This reverts commit 63fff8df70.
2025-12-24 11:59:27 +08:00
Divano c1aa66df02 Revert "[Optim] Remove limitation of number of kvcache blocks (#5612)" (#5702)
This reverts commit 9da89a374b.
2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun 9da89a374b [Optim] Remove limitation of number of kvcache blocks (#5612)
* [Optim] Remove limitation of number of kvcache blocks

* Update fastdeploy/envs.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/worker/iluvatar_worker.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add docs

* Update fastdeploy/worker/worker_process.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix ci case

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-23 11:18:29 +08:00
YuBaoku fe55baae47 [CI] Fix unit_test error of unstable execution (#5660)
* [CI] Fix unit_test error of unstable execution
2025-12-19 22:59:53 +08:00
MingkunZhang 46d83be065 [Metax] update ci test (#5652) 2025-12-19 17:25:47 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
Yonghua Li 0c8c6369ed [Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415)
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports

* [fix] fix some bugs

* [fix] fix rdma port for cache manager/messager

* [fix] temporarily cancel port availability check to see if it can pass ci test

* [feat] simplify args for multi api server

* [fix] fix dp

* [fix] fix port for xpu

* [fix] add tests for ports post processing & fix ci

* [test] fix test_multi_api_server

* [fix] fix rdma_comm_ports args for multi_api_server

* [fix] fix test_common_engine

* [fix] fix test_cache_transfer_manager

* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER

* [fix] avoid api server from creating engine_args twice

* [fix] fix test_run_batch

* [fix] fix test_metrics

* [fix] fix splitwise connector init

* [test] add test_rdma_transfer and test_expert_service

* [fix] fix code syntax

* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
YuBaoku 5d2b16e6f3 [CI] Remove test_metrics.py due to incompatible forced merge (#5578)
* [CI] Remove test_metrics.py due to incompatible forced merge
2025-12-16 14:04:46 +08:00
YuBaoku 63fff8df70 [CI] Adapt vl_model baseline changes due to Paddle update (#5576) 2025-12-16 11:42:31 +08:00
MingkunZhang f32e331ef5 [Metax] add ci yaml (#5520)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-12 13:35:38 +08:00
luukunn fbc9bce1e9 [Feature]Optimization of Thinking Pattern Framework (#4302)
* add model status in vl

* add x1 parser

* add model_status

* fix parser

* fix parser

* fix parser

* fix parser

* Revert "fix parser"

This reverts commit 300f446d8a.

* fix parser

* fix

* fix

* fix

* fix

* fix parser

* fix unit test

* fix unit test

* add unit test

* fix

* fix

* add unit test

* fix unit test

* add unit test

* add unit test

* fix unit test

* fix unit test

* fix bug

* fix unit test

* x1 tool parser

* fix unit test

* fix unit test

* fix unit test

* fix n

* fix unit test

* add unit test

* add unit test

* remove pring
2025-12-10 16:17:06 +08:00
Echo-Nie 1b1bfab341 [CI] Add unittest (#5328)
* add test_worker_eplb

* remove tesnsor_wise_fp8

* add copyright
2025-12-09 19:19:42 +08:00
lizexu123 95eab9f9ee [Feature] support stop_token_ids (#5399)
* support stop_token_ids

* fix

* delete chinese

* support both

* delete print
2025-12-09 17:49:12 +08:00
YuBaoku dfeabee123 [CI] Allow occasional distributed worker exit_code (#5341) 2025-12-03 10:56:59 +08:00
YuBaoku 3e2c13d8c5 [CI] Disable queue state assertion temporarily (#5329) 2025-12-02 18:57:29 +08:00
Jiaxin Sui b0113cb0fc [XPU][CI] Change XPU CI Base Value (#5318)
* Add '小度' keyword to assertion in run_w4a8.py

* Add keywords to assertion in run_ep_online.py

* Add keywords to assertion in run_w4a8.py

* Update run_45T.py

* Update run_ep_online.py

* Refactor assertion for response content keywords

* Update run_w4a8.py

* Update run_w4a8.py
2025-12-01 21:02:09 +08:00
Jiaxin Sui b467e9dadc [XPU][CI]Change W4A8 Case Base Value (#5309) 2025-12-01 15:25:33 +08:00
ddchenhao66 fc88eebc32 [CI][XPU] add pd disaggregation (#5179)
* [CI][XPU] add pd disaggregation

* Clarify comments and install iproute2

Updated comments to clarify script purpose and added installation of iproute2.

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-11-28 10:44:27 +08:00
YuBaoku 6a6bf4ea24 [CI] Fix test streaming with stop str (#5275)
* [CI] add output for last_token in test_streaming_with_stop_str

* [CI] Adapt empty last_token check
2025-11-27 20:51:39 +08:00
Jiaxin Sui 5ff93d4998 [XPU][CI] change VL model to 28B-VL-thinking (#5169)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function
2025-11-24 16:50:18 +08:00
YuBaoku 98f1ab46a9 [CI] add output for last_token in test_streaming_with_stop_str (#5170)
2025-11-24 10:49:17 +08:00
chenjian 3ea1b44a58 [Optimization] Improve perf for fd response token with internal adapter (#4992)
* [Optimize] Improve perf for fd response token with internal adapter

* fix

* fix bug

* fix ci

* fix ci

* fix ci

* fix ci
2025-11-21 19:02:03 +08:00
Zhang Yulong be9541a97b [CI] add metrics case (#5115)
* add case

* add case
2025-11-19 11:50:12 +08:00
FocusLuo c2c1942db9 [INTEL_HPU] [CI] enabled fastdeploy PR testing (#4596)
* [INTEL HPU] added hpu ci work flow support

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] added run ci hpu test scripts

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] enabled HPU ernie test case

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] updated Intel Gaudi Readme with Warmup disable cmdline

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* Modify paddlepaddle installation command

Updated paddlepaddle installation command to use a specific index URL.

* Update run_ci_hpu.sh

* Rename json directory to nlohmann_json

Rename extracted json directory to nlohmann_json.

* Update ci_hpu.yml

* Set pip global index URL to Tsinghua mirror

* Update CI workflow to use self-hosted runner and paths

* Update Docker image in CI workflow

* Modify HPU installation URLs in run_ci_hpu.sh

Updated the installation URL for paddle_intel_hpu and added paddlenlp_ops installation.

* Fix paddle_intel_hpu installation URL

Corrected the URL for paddle_intel_hpu wheel installation.

---------

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-17 19:24:41 +08:00
plusNew001 7f94d77e08 [XPU][CI] fix ci case bug (#5084)
* Ignore markdown and text files in CI workflow

* Change GPU_ID to XPU_ID in run_ci_xpu.sh

* Change GPU_ID to XPU_ID in test configuration

* Change GPU_ID to XPU_ID for service port calculation

* Change GPU_ID to XPU_ID for device identification

* Change GPU_ID to XPU_ID in test_ep function

* Update run_w4a8.py

* Redirect stop_processes output to kill.log

Redirect output of stop_processes to kill.log to capture logs.

* Log server output for failed test cases

Added logging of server.log for failed tests.

* Add '-s' option to pytest commands in run_ci_xpu.sh

* Refactor assertion to validate multiple keywords

Updated assertion to check for multiple keywords in response.

* Fix assertany to assert any in run_45vl.py
2025-11-17 16:01:27 +08:00
plusNew001 0e819cd596 [CI][XPU] Optimize CI logs and variable names (#5025)
* Ignore markdown and text files in CI workflow

* Change GPU_ID to XPU_ID in run_ci_xpu.sh

* Change GPU_ID to XPU_ID in test configuration

* Change GPU_ID to XPU_ID for service port calculation

* Change GPU_ID to XPU_ID for device identification

* Change GPU_ID to XPU_ID in test_ep function

* Update run_w4a8.py

* Redirect stop_processes output to kill.log

Redirect output of stop_processes to kill.log to capture logs.

* Log server output for failed test cases

Added logging of server.log for failed tests.

* Add '-s' option to pytest commands in run_ci_xpu.sh
2025-11-14 19:35:35 +08:00
zccjjj 88da9d9788 [XPU] [CI] Change CI ep test from offline to online (#4885)
* change CI ep test from offline to online

* add ep all2all ci's changes, from offline to online

* change env var in ep-all2all ci test

* add expected response for ep8tp8 all2all

* Adapt to CI refactoring and support dual-concurrent code execution

* Adapt to CI refactoring and support dual-concurrent, second

* Explicitly specify the #port

* change the startup method of all2all

* Modify the command of all2all

* Update assertion to check multiple keywords

* Update assertion to check multiple keywords

* Update run_w4a8.py

* Update run_w4a8.py

---------

Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-13 16:15:45 +08:00
yzwu 76e60e98f8 [Iluvatar][CI] fix safetensors_rust.SafetensorError: framework paddle is invalid (#4972) 2025-11-12 14:13:40 +08:00
yzwu 3707af7a4f [Iluvatar] add vl into ci and support v1 loader (#4774) 2025-11-11 10:50:17 +08:00
Yuanle Liu 3dc0ffa46d [TSP] Support qwen3 moe tsp + cudagraph (#4871)
* support qwen3_moe tsp mode

* fix

* fix

* update

* update

* update

* fix

* support external_rmsnorm

* update

* fix
2025-11-10 23:37:51 +08:00
plusNew001 3665c283b5 [XPU] [CI]Change CI to multi-concurrency (#4866)
* Refactor GPU ID logic in CI workflow

Updated GPU ID assignment logic and removed unused port calculations.

* Refactor GPU device and port configuration

* Update engine_worker_queue_port calculation logic

* Refactor XPU_VISIBLE_DEVICES export logic

* Adjust service port based on GPU ID

* Adjust service HTTP port based on GPU ID

* Adjust service_http_port based on GPU_ID

* Add import for os module in run_45T.py

* Update run_45vl.py

* Import os module in run_w4a8.py

Added import for os module to use environment variables.

* Remove duplicate import of os module

* Remove duplicate import of os module

* Update run_45T.py

* Update run_w4a8.py

* fix bug

* fix bug

* Update run_w4a8.py

* Fix directory change command in run_ci_xpu.sh
2025-11-10 21:09:48 +08:00
plusNew001 0a3bc84f71 [XPU][CI]Update test assertion and base response value (#4907) 2025-11-10 11:44:54 +08:00
plusNew001 fa098383f6 [XPU][CI] Ci bug fix (#4889)
* Refactor test_45t by commenting out responses

Comment out base response variables and update assertion.

* Update run_w4a8.py

* Fix assertion syntax in run_45T.py
2025-11-07 17:50:11 +08:00
YuBaoku fa28745f19 [CI] Update ERNIE-4.5-VL baseline to adapt to MoE changes (#4867)
2025-11-06 22:02:10 +08:00
YuBaoku a139f8f3cb [CI] Optimize port cleanup logic (#4860) 2025-11-06 19:13:48 +08:00
plusNew001 fc8bef2c95 [XPU][CI]Change ci vl model to 28 b (#4764)
* Update XPU_VISIBLE_DEVICES and model parameters

* Update base response and adjust max tokens

* Implement process cleanup in CI workflow

Add process cleanup commands to prevent port conflicts

* Remove process cleanup commands from CI workflow

Removed old process cleanup commands to prevent port conflicts.
2025-11-06 14:12:23 +08:00
zhupengyang 2fd254e5b7 support ep+tp at op layer (#4688) 2025-11-05 11:15:57 +08:00
YuBaoku 722110a952 [CI] Refactor CE wheel upload for multiple target paths (#4790)
* [CI] Refactor CE wheel upload for multiple target paths

* [CI] fix test_streaming_with_stop_str error
2025-11-04 18:56:38 +08:00