Commit Graph

116 Commits

Author SHA1 Message Date
chenjian 6727df8286 [Optimization] Optimize ttft for prefill pd (#6680)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix

* fix ci

* fix format

* update according to review

* add comment

* fix

* fix format
2026-03-30 20:36:23 +08:00
yzwu 8789329457 [Iluvatar] Support wi4a16 group_gemm (#7078) 2026-03-30 19:03:51 +08:00
YuBaoku aee293be0f [CI] Optimize: add vl swap_test and remove useless code (#7000) 2026-03-25 11:33:56 +08:00
fxyfxy777 250ce40b40 [Feature] use phi permute/unpermute & rm swiglu (#6361)
* tp文字输出正常

* B eb5 mini文字输出正常

* eb5mini ep B卡 文字输出正常

* default use phi moe op

* stash

* tp H卡正常

* ep ok

* rm debug

* rm debug tool

* rm del ffn_out

* rm swiglu

* add envs to swiglu

* merge dev

* fix ci baseline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix ci baseline 2

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 02:01:57 -07:00
bukejiyu cffa8c246c [Others]update paddleformer 1.0.0 (#6496)
* update paddleformer 1.0.0

* update
2026-03-11 15:06:29 +08:00
yzwu 67388ce2f3 [Iluvatar][CI] Replace ci in ernie-300B-4layer with ernie-21b. (#6747) 2026-03-10 17:25:52 +08:00
YuBaoku cbfdf42628 [CI] Add test_dynamic_c8_cache.py and latest FastDeploy.tar.gz upload (#6708) 2026-03-08 16:01:12 +08:00
yzwu 81acdb62bd [Iluvatar][CI] Do not specify FD_LOG_DIR (#6665) 2026-03-06 11:54:44 +08:00
yzwu 6674131b0b [Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553) 2026-03-02 14:07:17 +08:00
YuBaoku bb51829bd5 [CI] Fix tests and docs to resolve failure (#6572) 2026-03-01 12:33:01 +08:00
YuBaoku fa8a2e32c8 [CI] Add test for prefix caching L2 swap (#6507) 2026-02-25 19:56:01 +08:00
chenjian 35c24f3f71 Revert "[Optimize] Optimize ttft for ep (#6098)" (#6402)
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
xjkmfa 74762b0fb2 [ci case]Prompt logprobs precision (#6381)
* Add ci case for min token and max token

* 【CI case】include total_tokens in the last packet of completion interface stream output

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
GoldPancake 183b8d325a [RL] Support GLM MTP RL Model (#6267) 2026-02-04 20:14:35 +08:00
chenjian 90db0bdd0d [Optimize] Optimize ttft for ep (#6098)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix
2026-02-04 15:03:29 +08:00
ddchenhao66 faade7d0ab [BugFix] Fix port-releated errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled (#6309) 2026-02-03 19:49:01 +08:00
GoldPancake fb374238e1 Revert "[RL] Support GLM MTP RL Model (#6223)" (#6301)
This reverts commit af6c84d48d.
2026-02-02 14:08:13 +08:00
chenjian 292bab7e6d [BugFix] Fix bug for enable output caching (#6226)
* [BugFix] Fix bug for enable output caching

* fix

* Fix

* fix

* fix ci
2026-01-30 10:55:36 +08:00
GoldPancake af6c84d48d [RL] Support GLM MTP RL Model (#6223)
* support glm mtp rl model

* fix

* fix

* fix ut

* update baseline
2026-01-28 08:28:03 -08:00
lizexu123 1f96028bea [BugFix] fix python3.12 v0_loader (#6132) 2026-01-21 16:12:11 +08:00
qwes5s5 b2a2e11551 [Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320)
* request disconnect

* request disconnect

* fix bug

* fix bug--amend

---------

Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-16 11:46:13 +08:00
fxyfxy777 4c92035f2d [Feature] Unify fp8 block_wise quant ops (#5991)
* quant stash

* blockwise_quant

* precommit

* rm tensor.cut

* tp ok

* add swiglu

* rm outdate code

* fix activate ut

* change baseline

* fix baseline error
2026-01-15 05:50:37 -08:00
lizexu123 6619298b50 【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)
* update w4afp8

* build.sh ok

* support cuda_graph

* fix

* add test

* fix max_tokens_per_expert

* >=70

* fix

* compute_max_tokens_from_prefix_sum in w4afp8

* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00
YuBaoku 2c17acd767 [CI] Adapt vl_model baseline changes due to Paddle update_2 (#6033) 2026-01-14 15:22:26 +08:00
xjkmfa 1aa7e82924 [ci case]Check the chunking of the chat interface (#5981)
* Add ci case for min token and max token

* 【CI case】include total_tokens in the last packet of completion interface stream output

* [ci case] add Chunk segmentation check

* [ci case] add Chunk segmentation check

* [ci case] add Chunk segmentation check

* [ci case] add Chunk segmentation check

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2026-01-12 16:36:13 +08:00
lizexu123 acdf0cd1d9 fix hadamard_block_size (#5888) 2026-01-06 14:12:14 +08:00
xjkmfa ed60b4da32 [CI case]Prompt logprob (#5835)
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
yzwu 7b6cc11952 [Iluvatar] Fix FD launch error when specifing CUDA_VISBLE_DEVICE (#5735) 2025-12-26 14:01:27 +08:00
Jiaxin Sui 8fc789bb3f [iluvatar][CI] refactor iluvatar_ci (#5588)
* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* Update Docker image tag in iluvatar_test workflow

* Update default Docker image version in workflow

* Update iluvatar_test.yml

* Update default Docker image in workflow config

* Update model path in run_ernie300B_4layer.py

* Update model path in offline inference check

* Add model_data directory and copy model files

Create model_data directory and copy necessary files.

* Update run_ernie_vl_28B.py

* Update run_ernie300B_4layer.py

* Update paddlepaddle installation method in script

* Change wget command to include proxy option

* Modify paddle package installation in CI script

Updated installation commands for paddle packages.

* Update paddlepaddle and paddle-iluvatar-gpu versions

* Delete .github/workflows/ci_iluvatar.yml

* Rename workflow from ILUVATAR Test to ILUVATAR-CI

* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
YuBaoku e75f93d302 [CI] Refactor RL tests to reuse test_metrics (#5741) 2025-12-24 17:08:40 +08:00
YuBaoku 672620cdfe Revert "[CI] Adapt vl_model baseline changes due to Paddle update (#5576)" (#5732)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
This reverts commit 63fff8df70.
2025-12-24 11:59:27 +08:00
Divano c1aa66df02 Revert "[Optim] Remove limitation of number of kvcache blocks (#5612)" (#5702)
This reverts commit 9da89a374b.
2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun 9da89a374b [Optim] Remove limitation of number of kvcache blocks (#5612)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Optim] Remove limitation of number of kvcache blocks

* Update fastdeploy/envs.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/worker/iluvatar_worker.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add docs

* Update fastdeploy/worker/worker_process.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix ci case

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-23 11:18:29 +08:00
YuBaoku fe55baae47 [CI] Fix unit_test error of unstable execution (#5660)
* [CI] Fix unit_test error of unstable execution
2025-12-19 22:59:53 +08:00
MingkunZhang 46d83be065 [Metax] update ci test (#5652) 2025-12-19 17:25:47 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
Yonghua Li 0c8c6369ed [Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415)
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports

* [fix] fix some bugs

* [fix] fix rdma port for cache manager/messager

* [fix] temporarily cancel port availability check to see if it can pass ci test

* [feat] simplify args for multi api server

* [fix] fix dp

* [fix] fix port for xpu

* [fix] add tests for ports post processing & fix ci

* [test] fix test_multi_api_server

* [fix] fix rdma_comm_ports args for multi_api_server

* [fix] fix test_common_engine

* [fix] fix test_cache_transfer_manager

* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER

* [fix] avoid api server from creating engine_args twice

* [fix] fix test_run_batch

* [fix] fix test_metrics

* [fix] fix splitwise connector init

* [test] add test_rdma_transfer and test_expert_service

* [fix] fix code syntax

* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
YuBaoku 5d2b16e6f3 [CI] Remove test_metrics.py due to incompatible forced merge (#5578)
* [CI] Remove test_metrics.py due to incompatible forced merge
2025-12-16 14:04:46 +08:00
YuBaoku 63fff8df70 [CI] Adapt vl_model baseline changes due to Paddle update (#5576) 2025-12-16 11:42:31 +08:00
MingkunZhang f32e331ef5 [Metax] add ci yaml (#5520)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-12 13:35:38 +08:00
luukunn fbc9bce1e9 [Feature]Optimization of Thinking Pattern Framework (#4302)
* add model status in vl

* add x1 parser

* add model_status

* fix parser

* fix parser

* fix parser

* fix parser

* Revert "fix parser"

This reverts commit 300f446d8a.

* fix parser

* fix

* fix

* fix

* fix

* fix parser

* fix unit test

* fix unit test

* add unit test

* fix

* fix

* add unit test

* fix unit test

* add unit test

* add unit test

* fix unit test

* fix unit test

* fix bug

* fix unit test

* x1 tool parser

* fix unit test

* fix unit test

* fix unit test

* fix n

* fix unit test

* add unit test

* add unit test

* remove pring
2025-12-10 16:17:06 +08:00
Echo-Nie 1b1bfab341 [CI] Add unittest (#5328)
* add test_worker_eplb

* remove tesnsor_wise_fp8

* add copyright
2025-12-09 19:19:42 +08:00
lizexu123 95eab9f9ee [Feature] support stop_token_ids (#5399)
* support stop_token_ids

* fix

* delete chinese

* support both

* delete print
2025-12-09 17:49:12 +08:00
YuBaoku dfeabee123 [CI] Allow occasional distributed worker exit_code (#5341) 2025-12-03 10:56:59 +08:00
YuBaoku 3e2c13d8c5 [CI] Disable queue state assertion temporarily (#5329) 2025-12-02 18:57:29 +08:00
Jiaxin Sui b0113cb0fc [XPU][CI] Change XPU CI Base Value (#5318)
* Add '小度' keyword to assertion in run_w4a8.py

* Add keywords to assertion in run_ep_online.py

* Add keywords to assertion in run_w4a8.py

* Update run_45T.py

* Update run_ep_online.py

* Refactor assertion for response content keywords

* Update run_w4a8.py

* Update run_w4a8.py
2025-12-01 21:02:09 +08:00
Jiaxin Sui b467e9dadc [XPU][CI]Change W4A8 Case Base Value (#5309) 2025-12-01 15:25:33 +08:00
ddchenhao66 fc88eebc32 [CI][XPU] add pd disaggregation (#5179)
* [CI][XPU] add pd disaggregation

* Clarify comments and install iproute2

Updated comments to clarify script purpose and added installation of iproute2.

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-11-28 10:44:27 +08:00
YuBaoku 6a6bf4ea24 [CI] Fix test streaming with stop str (#5275)
* [CI] add output for last_token in test_streaming_with_stop_str

* [CI] Adapt empty last_token check
2025-11-27 20:51:39 +08:00