chenjian
6727df8286
[Optimization] Optimize ttft for prefill pd ( #6680 )
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
* fix ci
* fix format
* update according to review
* add comment
* fix
* fix format
2026-03-30 20:36:23 +08:00
yzwu
8789329457
[Iluvatar] Support wi4a16 group_gemm ( #7078 )
2026-03-30 19:03:51 +08:00
YuBaoku
aee293be0f
[CI] Optimize: add vl swap_test and remove useless code ( #7000 )
2026-03-25 11:33:56 +08:00
fxyfxy777
250ce40b40
[Feature] use phi permute/unpermute & rm swiglu ( #6361 )
...
* tp text output is normal
* B eb5 mini text output is normal
* eb5mini ep B card text output is normal
* default use phi moe op
* stash
* tp H card is normal
* ep ok
* rm debug
* rm debug tool
* rm del ffn_out
* rm swiglu
* add envs to swiglu
* merge dev
* fix ci baseline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com >
* fix ci baseline 2
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com >
2026-03-12 02:01:57 -07:00
bukejiyu
cffa8c246c
[Others]update paddleformer 1.0.0 ( #6496 )
...
* update paddleformer 1.0.0
* update
2026-03-11 15:06:29 +08:00
yzwu
67388ce2f3
[Iluvatar][CI] Replace ci in ernie-300B-4layer with ernie-21b. ( #6747 )
2026-03-10 17:25:52 +08:00
YuBaoku
cbfdf42628
[CI] Add test_dynamic_c8_cache.py and latest FastDeploy.tar.gz upload ( #6708 )
2026-03-08 16:01:12 +08:00
yzwu
81acdb62bd
[Iluvatar][CI] Do not specify FD_LOG_DIR ( #6665 )
2026-03-06 11:54:44 +08:00
yzwu
6674131b0b
[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding ( #6553 )
2026-03-02 14:07:17 +08:00
YuBaoku
bb51829bd5
[CI] Fix tests and docs to resolve failure ( #6572 )
2026-03-01 12:33:01 +08:00
YuBaoku
fa8a2e32c8
[CI] Add test for prefix caching L2 swap ( #6507 )
2026-02-25 19:56:01 +08:00
chenjian
35c24f3f71
Revert "[Optimize] Optimize ttft for ep ( #6098 )" ( #6402 )
...
This reverts commit 90db0bdd0d .
2026-02-09 19:01:23 +08:00
xjkmfa
74762b0fb2
[ci case]Prompt logprobs precision ( #6381 )
...
* Add ci case for min token and max token
* [CI case] include total_tokens in the last packet of completion interface stream output
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
---------
Co-authored-by: xujing43 <xujing43@baidu.com >
2026-02-09 11:42:36 +08:00
GoldPancake
183b8d325a
[RL] Support GLM MTP RL Model ( #6267 )
2026-02-04 20:14:35 +08:00
chenjian
90db0bdd0d
[Optimize] Optimize ttft for ep ( #6098 )
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
2026-02-04 15:03:29 +08:00
ddchenhao66
faade7d0ab
[BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled ( #6309 )
2026-02-03 19:49:01 +08:00
GoldPancake
fb374238e1
Revert "[RL] Support GLM MTP RL Model ( #6223 )" ( #6301 )
...
This reverts commit af6c84d48d .
2026-02-02 14:08:13 +08:00
chenjian
292bab7e6d
[BugFix] Fix bug for enable output caching ( #6226 )
...
* [BugFix] Fix bug for enable output caching
* fix
* Fix
* fix
* fix ci
2026-01-30 10:55:36 +08:00
GoldPancake
af6c84d48d
[RL] Support GLM MTP RL Model ( #6223 )
...
* support glm mtp rl model
* fix
* fix
* fix ut
* update baseline
2026-01-28 08:28:03 -08:00
lizexu123
1f96028bea
[BugFix] fix python3.12 v0_loader ( #6132 )
2026-01-21 16:12:11 +08:00
qwes5s5
b2a2e11551
[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. ( #5320 )
...
* request disconnect
* request disconnect
* fix bug
* fix bug--amend
---------
Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com >
2026-01-16 11:46:13 +08:00
fxyfxy777
4c92035f2d
[Feature] Unify fp8 block_wise quant ops ( #5991 )
...
* quant stash
* blockwise_quant
* precommit
* rm tensor.cut
* tp ok
* add swiglu
* rm outdated code
* fix activate ut
* change baseline
* fix baseline error
2026-01-15 05:50:37 -08:00
lizexu123
6619298b50
[Optim] Optimize grid dimensions using max_tokens_per_expert for MoE models ( #6007 )
...
* update w4afp8
* build.sh ok
* support cuda_graph
* fix
* add test
* fix max_tokens_per_expert
* >=70
* fix
* compute_max_tokens_from_prefix_sum in w4afp8
* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00
YuBaoku
2c17acd767
[CI] Adapt vl_model baseline changes due to Paddle update_2 ( #6033 )
2026-01-14 15:22:26 +08:00
xjkmfa
1aa7e82924
[ci case]Check the chunking of the chat interface ( #5981 )
...
* Add ci case for min token and max token
* [CI case] include total_tokens in the last packet of completion interface stream output
* [ci case] add Chunk segmentation check
* [ci case] add Chunk segmentation check
* [ci case] add Chunk segmentation check
* [ci case] add Chunk segmentation check
---------
Co-authored-by: xujing43 <xujing43@baidu.com >
2026-01-12 16:36:13 +08:00
lizexu123
acdf0cd1d9
fix hadamard_block_size ( #5888 )
2026-01-06 14:12:14 +08:00
xjkmfa
ed60b4da32
[CI case]Prompt logprob ( #5835 )
...
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
lizexu123
44a13e4557
[Feature] support w4afp8 v1_loader and v0_loader(tp>1) ( #5757 )
...
* support
* fix
* support w4afp8 v1_loader and v0_loader
* fix
* fix test
* fix test
* fix test
* fix moe.py
* add test_ernie_4_5_w4afp8
* add test
* delete tensor
* fix test
* fix
* add
* fix test
2025-12-30 14:11:52 +08:00
yzwu
7b6cc11952
[Iluvatar] Fix FD launch error when specifying CUDA_VISIBLE_DEVICES ( #5735 )
2025-12-26 14:01:27 +08:00
Jiaxin Sui
8fc789bb3f
[iluvatar][CI] refactor iluvatar_ci ( #5588 )
...
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* Update Docker image tag in iluvatar_test workflow
* Update default Docker image version in workflow
* Update iluvatar_test.yml
* Update default Docker image in workflow config
* Update model path in run_ernie300B_4layer.py
* Update model path in offline inference check
* Add model_data directory and copy model files
Create model_data directory and copy necessary files.
* Update run_ernie_vl_28B.py
* Update run_ernie300B_4layer.py
* Update paddlepaddle installation method in script
* Change wget command to include proxy option
* Modify paddle package installation in CI script
Updated installation commands for paddle packages.
* Update paddlepaddle and paddle-iluvatar-gpu versions
* Delete .github/workflows/ci_iluvatar.yml
* Rename workflow from ILUVATAR Test to ILUVATAR-CI
* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
YuBaoku
e75f93d302
[CI] Refactor RL tests to reuse test_metrics ( #5741 )
2025-12-24 17:08:40 +08:00
YuBaoku
672620cdfe
Revert "[CI] Adapt vl_model baseline changes due to Paddle update ( #5576 )" ( #5732 )
...
This reverts commit 63fff8df70 .
2025-12-24 11:59:27 +08:00
Divano
c1aa66df02
Revert "[Optim] Remove limitation of number of kvcache blocks ( #5612 )" ( #5702 )
...
This reverts commit 9da89a374b .
2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun
9da89a374b
[Optim] Remove limitation of number of kvcache blocks ( #5612 )
...
* [Optim] Remove limitation of number of kvcache blocks
* Update fastdeploy/envs.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/worker/iluvatar_worker.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Add docs
* Update fastdeploy/worker/worker_process.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix ci case
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-12-23 11:18:29 +08:00
YuBaoku
fe55baae47
[CI] Fix unit_test error of unstable execution ( #5660 )
...
* [CI] Fix unit_test error of unstable execution
2025-12-19 22:59:53 +08:00
MingkunZhang
46d83be065
[Metax] update ci test ( #5652 )
2025-12-19 17:25:47 +08:00
yzwu
ac013803f3
[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode ( #5555 )
2025-12-18 02:14:25 -08:00
Yonghua Li
0c8c6369ed
[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports ( #5415 )
...
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports
* [fix] fix some bugs
* [fix] fix rdma port for cache manager/messager
* [fix] temporarily cancel port availability check to see if it can pass ci test
* [feat] simplify args for multi api server
* [fix] fix dp
* [fix] fix port for xpu
* [fix] add tests for ports post processing & fix ci
* [test] fix test_multi_api_server
* [fix] fix rdma_comm_ports args for multi_api_server
* [fix] fix test_common_engine
* [fix] fix test_cache_transfer_manager
* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER
* [fix] avoid api server from creating engine_args twice
* [fix] fix test_run_batch
* [fix] fix test_metrics
* [fix] fix splitwise connector init
* [test] add test_rdma_transfer and test_expert_service
* [fix] fix code syntax
* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
YuBaoku
5d2b16e6f3
[CI] Remove test_metrics.py due to incompatible forced merge ( #5578 )
...
* [CI] Remove test_metrics.py due to incompatible forced merge
2025-12-16 14:04:46 +08:00
YuBaoku
63fff8df70
[CI] Adapt vl_model baseline changes due to Paddle update ( #5576 )
2025-12-16 11:42:31 +08:00
MingkunZhang
f32e331ef5
[Metax] add ci yaml ( #5520 )
...
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-12-12 13:35:38 +08:00
luukunn
fbc9bce1e9
[Feature]Optimization of Thinking Pattern Framework ( #4302 )
...
* add model status in vl
* add x1 parser
* add model_status
* fix parser
* fix parser
* fix parser
* fix parser
* Revert "fix parser"
This reverts commit 300f446d8a .
* fix parser
* fix
* fix
* fix
* fix
* fix parser
* fix unit test
* fix unit test
* add unit test
* fix
* fix
* add unit test
* fix unit test
* add unit test
* add unit test
* fix unit test
* fix unit test
* fix bug
* fix unit test
* x1 tool parser
* fix unit test
* fix unit test
* fix unit test
* fix n
* fix unit test
* add unit test
* add unit test
* remove print
2025-12-10 16:17:06 +08:00
Echo-Nie
1b1bfab341
[CI] Add unittest ( #5328 )
...
* add test_worker_eplb
* remove tensor_wise_fp8
* add copyright
2025-12-09 19:19:42 +08:00
lizexu123
95eab9f9ee
[Feature] support stop_token_ids ( #5399 )
...
* support stop_token_ids
* fix
* delete Chinese
* support both
* delete print
2025-12-09 17:49:12 +08:00
YuBaoku
dfeabee123
[CI] Allow occasional distributed worker exit_code ( #5341 )
2025-12-03 10:56:59 +08:00
YuBaoku
3e2c13d8c5
[CI] Disable queue state assertion temporarily ( #5329 )
2025-12-02 18:57:29 +08:00
Jiaxin Sui
b0113cb0fc
[XPU][CI] Change XPU CI Base Value ( #5318 )
...
* Add '小度' keyword to assertion in run_w4a8.py
* Add keywords to assertion in run_ep_online.py
* Add keywords to assertion in run_w4a8.py
* Update run_45T.py
* Update run_ep_online.py
* Refactor assertion for response content keywords
* Update run_w4a8.py
* Update run_w4a8.py
2025-12-01 21:02:09 +08:00
Jiaxin Sui
b467e9dadc
[XPU][CI]Change W4A8 Case Base Value ( #5309 )
2025-12-01 15:25:33 +08:00
ddchenhao66
fc88eebc32
[CI][XPU] add pd disaggregation ( #5179 )
...
* [CI][XPU] add pd disaggregation
* Clarify comments and install iproute2
Updated comments to clarify script purpose and added installation of iproute2.
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-11-28 10:44:27 +08:00
YuBaoku
6a6bf4ea24
[CI] Fix test streaming with stop str ( #5275 )
...
* [CI] add output for last_token in test_streaming_with_stop_str
* [CI] Adapt empty last_token check
2025-11-27 20:51:39 +08:00