228 Commits

Author SHA1 Message Date
YuBaoku b2aca6c550 [CI] Improve logging check accuracy and unify error log cleanup (#7473) 2026-04-18 19:41:21 +08:00
YuBaoku 91b8bf20f0 [CI] Add pytest failure log collection and persistence (#7405) 2026-04-16 22:56:17 +08:00
YuBaoku 17002edc47 [CI] Add approval check for logging-related modifications (#7429) 2026-04-16 14:50:22 +08:00
Echo-Nie 8819a039c9 [Others] Fix typo (#7280)
* typo

* typo

* typo

* typo
2026-04-14 17:28:22 +08:00
YuBaoku 1269eda2f9 [CI] Ensure container cleanup after job to avoid resource leakage (#7315)
* [CI] Ensure container cleanup after job to avoid resource leakage

* [CI] Use prebuilt wheels to install xgrammar==0.1.19 and torch==2.6.0
2026-04-10 22:32:18 +08:00
YuBaoku ee73623c76 [CI] Set high-risk OOM tests for sequential execution (#7268) 2026-04-09 22:22:57 +08:00
YuBaoku db808f2080 [CI] Optimize log cleanup and isolation in unittest (#7132) 2026-04-01 22:07:55 +08:00
yzwu ceaf5df350 [Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126) 2026-04-01 19:13:34 +08:00
YuBaoku c6f0c5c3a6 [CI] Optimize test execution with single-GPU parallelism (#7085)
* [CI] Optimize test execution with single-GPU parallelism and log collection

* remove export CUDA_VISIBLE_DEVICES

* fix path error

* fix log_* path and debug

* [CI] Optimize test execution with single-GPU parallelism and log collection
2026-04-01 14:18:40 +08:00
yzwu 8789329457 [Iluvatar] Support wi4a16 group_gemm (#7078) 2026-03-30 19:03:51 +08:00
YuBaoku 2b84a4276e [CI] Optimize CI: add timeout and cancel on PR close (#6933) 2026-03-19 15:54:30 +08:00
yzwu 8b890c0d72 [Iluvatar] refactor attn and moe code (#6887) 2026-03-18 10:31:00 +08:00
YuBaoku 0359794e08 [CI] Sync _log_softmax_batch_invariant with paddle update (#6893) 2026-03-17 23:03:57 +08:00
yzwu 901b38c936 [Iluvatar] Optimize decode group_gemm and Support cuda graph for ernie (#6803) 2026-03-12 19:21:17 +08:00
yzwu 67388ce2f3 [Iluvatar][CI] Replace ci in ernie-300B-4layer with ernie-21b. (#6747) 2026-03-10 17:25:52 +08:00
yzwu 81acdb62bd [Iluvatar][CI] Do not specify FD_LOG_DIR (#6665) 2026-03-06 11:54:44 +08:00
YuBaoku 16a393e90e [CI] Fix non-deterministic test and skip failed_tests.log in log print (#6672) 2026-03-05 18:47:18 +08:00
YuBaoku 56ceeda80c [CI] Adjust model-specific diff threshold and include iluvatar XPU paths in coverage (#6663) 2026-03-05 10:02:54 +08:00
YuBaoku 5c8f5184d9 [CI] Add pytest timeout and enable workflow rerun (#6645) 2026-03-04 21:30:16 +08:00
yzwu 3345641f4e [Iluvatar][CI] fix the dim error of seq_lens_encoder and seq_lens_decoder (#6637) 2026-03-04 14:00:40 +08:00
YuBaoku 9a48a41abc [CI] Fix accidental deletion of failed_tests.log during log cleanup (#6634) 2026-03-03 22:06:26 +08:00
YuBaoku c3d6d706d5 [CI] Add nightly workflow for golang_router tests and improve log handling (#6608)
* [CI] Add nightly workflow for Golang router tests
* [CI] Improve pytest script stability and log handling
2026-03-03 19:36:57 +08:00
yzwu 6674131b0b [Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553) 2026-03-02 14:07:17 +08:00
Yuqiang Ge 1f931e05cd [CI] Add retry logic for pip install in iluvatar CI script (#6500) 2026-02-25 16:01:41 +08:00
yzwu 60e75ea8e8 [Iluvatar][CI] Fix cannot import get_stop (#6165) 2026-02-10 16:57:23 +08:00
MingkunZhang 6e28b5ef4f [Metax][CI] update metax ci files (#6364) 2026-02-05 17:16:31 +08:00
MingkunZhang 43e3886ef9 [Metax][CI] fix run_ci_metax.sh error (#6341) 2026-02-04 15:43:48 +08:00
MingkunZhang 2ffcb3d9ed [Metax][CI] update ci test files (#6340) 2026-02-04 13:58:07 +08:00
Jiaxin Sui 20074d301f [XPU] [CI] add xpu logprobs case (#6187)
* add xpu case

* add xpu case
2026-01-23 19:40:55 +08:00
YuBaoku 1cfb042045 [CI] Add ep4_mtp e2e test (#6153)
* [CI] Add ep4_mtp e2e test
2026-01-22 14:54:18 +08:00
yzwu 837ddca273 [Iluvatar][CI] Fix the error max_tokens_per_expert referenced before assignment (#6083) 2026-01-21 16:01:29 +08:00
Jiaxin Sui b0fc9cadb5 [XPU][CI] update paddle version (#6044)
* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Remove cache queue port from test_vl_model.py

Removed cache queue port argument from test configuration.

* Update test_w4a8.py

* Remove cache queue port from test_mtp.py

Removed cache queue port configuration from test.

* Remove cache queue port from test_logprobs_21b_tp4

Removed cache queue port configuration from test.

* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Update test_ep4tp4_online.py

* Update run_xpu_ci_pytest.sh to comment out installations

Comment out PaddlePaddle installation and XVLLM download steps.
2026-01-15 15:17:48 +08:00
Jiaxin Sui becd8c3803 [XPU][CI] Update XVLLM_PATH setup in run_xpu_ci_pytest.sh (#6018)
Download and set XVLLM_PATH from output.tar.gz instead of hardcoded path.
2026-01-13 15:42:52 +08:00
Yonghua Li 60ee72f682 [BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935)
* [fix] fix rdma script and add more error log for multi api server

* [fix] log

* [fix] fix test_multi_api_server

* [fix] fix multi api server port check

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-12 10:38:52 +08:00
MingkunZhang 384ffd6952 [Metax] add ci test file & update run_ci_metax.sh (#5975)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-09 18:47:06 +08:00
Jiaxin Sui e93a7d3b6b Lock PaddlePaddle version in run_xpu_ci_pytest.sh (#5964)
Locked PaddlePaddle version to 20260107 due to compatibility issues with the updated xhpc framework.
2026-01-09 10:41:34 +08:00
mouxin 0a92e96f20 [Feature] Add Golang-based Router for Request Scheduling and Load Balancing (#5882)
* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-01-07 21:28:08 +08:00
yzwu 29898372e9 [Iluvatar] remove CUDA_VISIBLE_DEVICE in run_ci_iluvatar.sh (#5916) 2026-01-07 14:10:47 +08:00
GoldPancake e78e22ebd5 [BugFix] Fix entropy bugs (#5818)
* fix entropy bugs

* fix ut

* fix
2025-12-29 20:44:29 -08:00
yzwu 7b6cc11952 [Iluvatar] Fix FD launch error when specifying CUDA_VISBLE_DEVICE (#5735) 2025-12-26 14:01:27 +08:00
Jiaxin Sui 8fc789bb3f [iluvatar][CI] refactor iluvatar_ci (#5588)
* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* Update Docker image tag in iluvatar_test workflow

* Update default Docker image version in workflow

* Update iluvatar_test.yml

* Update default Docker image in workflow config

* Update model path in run_ernie300B_4layer.py

* Update model path in offline inference check

* Add model_data directory and copy model files

Create model_data directory and copy necessary files.

* Update run_ernie_vl_28B.py

* Update run_ernie300B_4layer.py

* Update paddlepaddle installation method in script

* Change wget command to include proxy option

* Modify paddle package installation in CI script

Updated installation commands for paddle packages.

* Update paddlepaddle and paddle-iluvatar-gpu versions

* Delete .github/workflows/ci_iluvatar.yml

* Rename workflow from ILUVATAR Test to ILUVATAR-CI

* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
MingkunZhang e48e306134 [Metax] update ci bash (#5760)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-25 11:47:38 +08:00
GoldPancake a0fed22ddb [Feature] Add entropy calculation script 2025-12-24 15:00:06 +08:00
Jiaxin Sui 0bef9b684f [Metax][CI] fix CI bug (#5698)
* Update run_ci_metax.sh

* Fix pull request branch reference in CI workflow
2025-12-23 14:56:34 +08:00
MingkunZhang 945a1bc4e2 [Metax] update ci name (#5679)
* [Metax] update ci name

* Update CI_METAX workflow for pull request handling

* Update ci_metax.yml

* Update CI_METAX workflow for pull request handling

* Remove commented-out code in run_ci_metax.sh

* Add environment to Jenkins trigger job

* Change trigger event from pull_request_target to pull_request

* Fix environment name casing in CI workflow

* Change environment name from Metax-ci to Metax_ci

* Modify CI_METAX workflow for PR targeting and concurrency

Updated workflow to use pull_request_target event and added concurrency settings.

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-23 14:00:48 +08:00
YuBaoku b57deb671d [CI] Update check_approval.sh 2025-12-22 15:52:04 +08:00
MingkunZhang 46d83be065 [Metax] update ci test (#5652) 2025-12-19 17:25:47 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
zhupengyang 8735cb5045 [XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
kesmeey d81341b9b3 [CI][Hackathon 9th Sprint No.14] Add unit tests for functional module fastdeploy/rl/rollout_model.py (#5552)
* Add rollout model unit tests

* test: update rl rollout_model tests

* test: fix cache_type_branches unsupported platform case

* test: fix rl rollout_model test indent

* Delete tests/spec_decode/test_mtp_proposer.py

* chore: format test_rollout_model

* chore: translate rollout test comments to English

* test: guard rollout_model import by disabling auto registry

* chore: reorder imports in rl rollout test

* test: isolate env for RL rollout tests

* style: format rollout RL tests with black

* update

* test: remove RL rollout unit tests causing collection issues

* test: add lightweight rollout_model RL unit tests

* fix(coverage): filter test file paths and handle collection failures

- Only extract real test file paths (tests/.../test_*.py) from pytest collect output

- Filter out ERROR/collecting prefixes to prevent garbage in failed_tests.log

- Add proper error handling for pytest collection failures

- Exit early if no test files can be extracted

- Preserve collection error output for debugging

* update

* style: fix code style issues in test_rollout_model.py

- Remove unused 'os' import

- Remove trailing blank lines

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-18 10:57:53 +08:00