Commit Graph

216 Commits

Author SHA1 Message Date
YuBaoku 0359794e08 [CI] Sync _log_softmax_batch_invariant with paddle update (#6893) 2026-03-17 23:03:57 +08:00
yzwu 901b38c936 [Iluvatar] Optimize decode group_gemm and Support cuda graph for ernie (#6803) 2026-03-12 19:21:17 +08:00
yzwu 67388ce2f3 [Iluvatar][CI] Replace ci in ernie-300B-4layer with ernie-21b. (#6747) 2026-03-10 17:25:52 +08:00
yzwu 81acdb62bd [Iluvatar][CI] Do not specify FD_LOG_DIR (#6665) 2026-03-06 11:54:44 +08:00
YuBaoku 16a393e90e [CI] Fix non-deterministic test and skip failed_tests.log in log print (#6672) 2026-03-05 18:47:18 +08:00
YuBaoku 56ceeda80c [CI] Adjust model-specific diff threshold and include iluvatar XPU paths in coverage (#6663) 2026-03-05 10:02:54 +08:00
YuBaoku 5c8f5184d9 [CI] Add pytest timeout and enable workflow rerun (#6645) 2026-03-04 21:30:16 +08:00
yzwu 3345641f4e [Iluvatar][CI] fix the dim error of seq_lens_encoder and seq_lens_decoder (#6637) 2026-03-04 14:00:40 +08:00
YuBaoku 9a48a41abc [CI] Fix accidental deletion of failed_tests.log during log cleanup (#6634) 2026-03-03 22:06:26 +08:00
YuBaoku c3d6d706d5 [CI] Add nightly workflow for golang_router tests and improve log handling (#6608)
* [CI] Add nightly workflow for Golang router tests
* [CI] Improve pytest script stability and log handling
2026-03-03 19:36:57 +08:00
yzwu 6674131b0b [Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553) 2026-03-02 14:07:17 +08:00
Yuqiang Ge 1f931e05cd [CI] Add retry logic for pip install in iluvatar CI script (#6500) 2026-02-25 16:01:41 +08:00
yzwu 60e75ea8e8 [Iluvatar][CI] Fix cannot import get_stop (#6165) 2026-02-10 16:57:23 +08:00
MingkunZhang 6e28b5ef4f [Metax][CI] update metax ci files (#6364) 2026-02-05 17:16:31 +08:00
MingkunZhang 43e3886ef9 [Metax][CI] fix run_ci_metax.sh error (#6341) 2026-02-04 15:43:48 +08:00
MingkunZhang 2ffcb3d9ed [Metax][CI] update ci test files (#6340) 2026-02-04 13:58:07 +08:00
Jiaxin Sui 20074d301f [XPU] [CI] add xpu logprobs case (#6187)
* add xpu case

* add xpu case
2026-01-23 19:40:55 +08:00
YuBaoku 1cfb042045 [CI] Add ep4_mtp e2e test (#6153)
* [CI] Add ep4_mtp e2e test
2026-01-22 14:54:18 +08:00
yzwu 837ddca273 [Iluvatar][CI] Fix the error max_tokens_per_expert referenced before assignment (#6083) 2026-01-21 16:01:29 +08:00
Jiaxin Sui b0fc9cadb5 [XPU][CI] update paddle version (#6044)
* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Remove cache queue port from test_vl_model.py

Removed cache queue port argument from test configuration.

* Update test_w4a8.py

* Remove cache queue port from test_mtp.py

Removed cache queue port configuration from test.

* Remove cache queue port from test_logprobs_21b_tp4

Removed cache queue port configuration from test.

* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Update test_ep4tp4_online.py

* Update run_xpu_ci_pytest.sh to comment out installations

Comment out PaddlePaddle installation and XVLLM download steps.
2026-01-15 15:17:48 +08:00
Jiaxin Sui becd8c3803 [XPU][CI] Update XVLLM_PATH setup in run_xpu_ci_pytest.sh (#6018)
Download and set XVLLM_PATH from output.tar.gz instead of hardcoded path.
2026-01-13 15:42:52 +08:00
Yonghua Li 60ee72f682 [BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935)
* [fix] fix rdma script and add more error log for multi api server

* [fix] log

* [fix] fix test_multi_api_server

* [fix] fix multi api server port check

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-12 10:38:52 +08:00
MingkunZhang 384ffd6952 [Metax] add ci test file & update run_ci_metax.sh (#5975)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-09 18:47:06 +08:00
Jiaxin Sui e93a7d3b6b Lock PaddlePaddle version in run_xpu_ci_pytest.sh (#5964)
Locked PaddlePaddle version to 20260107 due to compatibility issues with the updated xhpc framework.
2026-01-09 10:41:34 +08:00
mouxin 0a92e96f20 [Feature] Add Golang-based Router for Request Scheduling and Load Balancing (#5882)
* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-01-07 21:28:08 +08:00
yzwu 29898372e9 [Iluvatar] remove CUDA_VISIBLE_DEVICE in run_ci_iluvatar.sh (#5916) 2026-01-07 14:10:47 +08:00
GoldPancake e78e22ebd5 [BugFix] Fix entropy bugs (#5818)
* fix entropy bugs

* fix ut

* fix
2025-12-29 20:44:29 -08:00
yzwu 7b6cc11952 [Iluvatar] Fix FD launch error when specifying CUDA_VISIBLE_DEVICES (#5735) 2025-12-26 14:01:27 +08:00
Jiaxin Sui 8fc789bb3f [iluvatar][CI] refactor iluvatar_ci (#5588)
* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* Update Docker image tag in iluvatar_test workflow

* Update default Docker image version in workflow

* Update iluvatar_test.yml

* Update default Docker image in workflow config

* Update model path in run_ernie300B_4layer.py

* Update model path in offline inference check

* Add model_data directory and copy model files

Create model_data directory and copy necessary files.

* Update run_ernie_vl_28B.py

* Update run_ernie300B_4layer.py

* Update paddlepaddle installation method in script

* Change wget command to include proxy option

* Modify paddle package installation in CI script

Updated installation commands for paddle packages.

* Update paddlepaddle and paddle-iluvatar-gpu versions

* Delete .github/workflows/ci_iluvatar.yml

* Rename workflow from ILUVATAR Test to ILUVATAR-CI

* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
MingkunZhang e48e306134 [Metax] update ci bash (#5760)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-25 11:47:38 +08:00
GoldPancake a0fed22ddb [Feature] Add entropy calculation script 2025-12-24 15:00:06 +08:00
Jiaxin Sui 0bef9b684f [Metax][CI]fix CI bug (#5698)
* Update run_ci_metax.sh

* Fix pull request branch reference in CI workflow
2025-12-23 14:56:34 +08:00
MingkunZhang 945a1bc4e2 [Metax] update ci name (#5679)
* [Metax] update ci name

* Update CI_METAX workflow for pull request handling

* Update ci_metax.yml

* Update CI_METAX workflow for pull request handling

* Remove commented-out code in run_ci_metax.sh

* Add environment to Jenkins trigger job

* Change trigger event from pull_request_target to pull_request

* Fix environment name casing in CI workflow

* Change environment name from Metax-ci to Metax_ci

* Modify CI_METAX workflow for PR targeting and concurrency

Updated workflow to use pull_request_target event and added concurrency settings.

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-23 14:00:48 +08:00
YuBaoku b57deb671d [CI] Update check_approval.sh 2025-12-22 15:52:04 +08:00
MingkunZhang 46d83be065 [Metax] update ci test (#5652) 2025-12-19 17:25:47 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
zhupengyang 8735cb5045 [XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
kesmeey d81341b9b3 [CI][Hackathon 9th Sprint No.14] Add unit tests for the fastdeploy/rl/rollout_model.py module (#5552)
* Add rollout model unit tests

* test: update rl rollout_model tests

* test: fix cache_type_branches unsupported platform case

* test: fix rl rollout_model test indent

* Delete tests/spec_decode/test_mtp_proposer.py

* chore: format test_rollout_model

* chore: translate rollout test comments to English

* test: guard rollout_model import by disabling auto registry

* chore: reorder imports in rl rollout test

* test: isolate env for RL rollout tests

* style: format rollout RL tests with black

* update

* test: remove RL rollout unit tests causing collection issues

* test: add lightweight rollout_model RL unit tests

* fix(coverage): filter test file paths and handle collection failures

- Only extract real test file paths (tests/.../test_*.py) from pytest collect output

- Filter out ERROR/collecting prefixes to prevent garbage in failed_tests.log

- Add proper error handling for pytest collection failures

- Exit early if no test files can be extracted

- Preserve collection error output for debugging

* update

* style: fix code style issues in test_rollout_model.py

- Remove unused 'os' import

- Remove trailing blank lines

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-18 10:57:53 +08:00
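The coverage fix in #5552 describes reducing `pytest` collection output to real test file paths (`tests/.../test_*.py`) and discarding `ERROR`/`collecting` lines. A minimal Python sketch of that filtering, assuming the input is the text printed by `pytest --collect-only -q`; the function name `extract_test_files` and the regex are hypothetical and are not taken from the repository's actual script, which may be written in shell.

```python
import re

# Hypothetical pattern: a path under tests/ whose file name is test_*.py,
# optionally followed by a "::" node-id suffix. Not the repository's real regex.
TEST_PATH_RE = re.compile(r"^(tests/(?:[\w.-]+/)*test_[\w.-]*\.py)(?:::|$)")


def extract_test_files(collect_output: str) -> list[str]:
    """Return unique test file paths from pytest collect output, in first-seen order."""
    seen: dict[str, None] = {}
    for raw in collect_output.splitlines():
        line = raw.strip()
        # Lines such as "ERROR tests/x/test_y.py - ..." or "ERROR collecting ..."
        # do not start with "tests/", so the anchored regex rejects them.
        match = TEST_PATH_RE.match(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)
```

A caller following the "exit early" step would check for an empty result and exit non-zero, keeping the raw collection output for debugging.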
FocusLuo c3aaa7e441 [BugFix] Fixed build script issue on Intel HPU platforms (#5455)
* [INTEL HPU]  Fixed build script issue for non-gpu platforms

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] PR CI HPU will not use fixed version of fastdeploy_intel_hpu

Signed-off-by: Luo, Focus <focus.luo@intel.com>

---------

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-11 16:36:37 +08:00
YuanRisheng f7c6b8c4ec modify approve (#5443)
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-12-09 16:52:10 +08:00
Jiaxin Sui b5a7abe624 [XPU] [CI] Change Paddle Version to Nightly (#5346)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command
2025-12-05 13:01:29 +08:00
zccjjj e927c65742 [XPU] [Optimization] [EP] EP communication optimization. (#5145)
2025-12-05 10:03:45 +08:00
Longzhi Wang f6544c0b1b [CI] Add RD in env CI. (#5345)
* test

* [CI] modify env ci(add RD)

* test done
2025-12-03 13:18:17 +08:00
YuBaoku dfeabee123 [CI] Allow occasional distributed worker exit_code (#5341) 2025-12-03 10:56:59 +08:00
Longzhi Wang 21f138f68b [CI] Add env ci (#5331)
* test

* [CI] Add env ci

* test done
2025-12-02 19:31:25 +08:00
fmiao2372 429dd2b1db [Intel HPU] add example benchmark scripts for hpu (#5304)
* [Intel HPU] add example benchmark scripts for hpu

* Revise the code based on the copilot comments

* update code based on comments

* update ci ops version
2025-12-02 18:00:01 +08:00
Jiaxin Sui 8e0f4dfd0c [XPU] [CI] Xpu Ci Refactor (#5252)
* add xpu ci

* add case

* add case

* fix ci bug

* Update Docker image tag to 'latest' in CI workflow

* Fix set -e usage in run_xpu_ci_pytest.sh

* add pd case

* add case

* Configure pip to use Tsinghua mirror for dependencies

Set the global pip index URL to Tsinghua mirror.

* fix ci bug

* fix bug

* fix bug

---------

Co-authored-by: suijiaxin <suijiaxin@Suis-MacBook-Pro.local>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511964.gajl.baidu.com>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2025-12-02 17:15:51 +08:00
Yuanle Liu 54119cf07e [CI] Remove need approve by yuanlehome (#5310) 2025-12-01 01:44:43 -08:00
ddchenhao66 fc88eebc32 [CI][XPU] add pd disaggregation (#5179)
* [CI][XPU] add pd disaggregation

* Clarify comments and install iproute2

Updated comments to clarify script purpose and added installation of iproute2.

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-11-28 10:44:27 +08:00
Jiaxin Sui 07cb11e51d [XPU][CI] Set pip index URL to Tsinghua mirror (#5277)
* Set pip index URL to Tsinghua mirror

* Update ci_xpu.yml

* Update Docker image version in CI workflow

* Update Docker image tag in CI workflow
2025-11-27 22:12:20 +08:00