Commit Graph

32 Commits

Author SHA1 Message Date
Jiaxin Sui fbc3aa93de [XPU][CI] Remove duplicate NICs from environment variables (#7244) 2026-04-08 19:14:15 +08:00
Jiaxin Sui c3ed7db28d [XPU] [CI] Fix xpu ci bug (#7014)
* fix xpu ci bug

* Remove unnecessary blank line in conftest.py

* Update upload-artifact action to version 6

* Update _xpu_8cards_case_test.yml

* fix ci bug

* Change exit code on test failure to 1

* fix ci bug

* fix ci bug

* fix ci bug

* fix ci bug

* Update conftest.py
2026-03-27 10:29:34 +08:00
yinwei 3f4441b4b7 [XPU]add mtp cudagraph support (#6831) 2026-03-13 19:46:53 +08:00
Jiaxin Sui a3d7979711 [XPU][CI]Rename test_ep4tp1_online.py to run_ep4tp1_online.py (#6805) 2026-03-12 16:16:20 +08:00
yinwei 7d31a728d1 Add PD+EP cudagraph Support 2026-03-12 13:20:59 +08:00
zccjjj a2072fe20c [XPU] support warmup with ep & remove apply_tp_fused_op (#6289) 2026-02-28 15:40:36 +08:00
zccjjj c34cb2a8c2 [XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337) 2026-02-27 09:55:41 +08:00
yinwei 256651e9de Add PD Cudagraph CI Case 2026-02-26 17:01:20 +08:00
ddchenhao66 6d33d5e370 [Models][BugFix] shared experts and dense mlp layer do not require TP split (#6180)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-28 18:58:19 +08:00
Jiaxin Sui f1cee7fd5e [XPU] [CI] XPU CI Update (#6211)
* Update log file path in test_pd_21b_ep4tp1.py

* Update log file path in test_pd_21b_ep4tp4.py

* Update log file path in test_pd_p_tp4ep4_d_tp1ep4
2026-01-27 11:45:53 +08:00
yinwei 56d01f7e49 [XPU][CI]Add Cuda Graph CI Case (#6229)
* add cuda graph ci case
2026-01-26 23:20:44 +08:00
Jiaxin Sui 20074d301f [XPU] [CI] add xpu logprobs case (#6187)
* add xpu case

* add xpu case
2026-01-23 19:40:55 +08:00
zccjjj 14a64e9b3b [XPU] change XPU EP interface from xDeepEP to paddle (#5706)
* add ENV VAR to control low latency buffer
2026-01-21 18:23:45 +08:00
cmcamdy 211dd81ca7 add pd+mtp ci (#6090) 2026-01-19 19:21:24 +08:00
Jiaxin Sui e0d15a2ded [XPU][CI] Xpu ci update (#6089)
* add xpu ci case

* add xpu ci case

* add xpu ci case

* Change runner from XPU-P800-8Card to XPU-P800

* Remove cache queue port from test_pd_03b_tp1.py

Removed cache queue port arguments from test cases.

* Remove cache queue port from test_pd_21b_tp2.py

Removed cache queue port arguments from test cases.

* Update README with PYTHONPATH setup instructions

Added instructions for setting PYTHONPATH in CI scripts.
2026-01-19 16:09:09 +08:00
Jiaxin Sui 70a962df53 [XPU][CI] XPU CI refactor (#6053)
* add xpu ci case

* add xpu ci case

* add xpu ci case

* Change runner from XPU-P800-8Card to XPU-P800
2026-01-16 20:57:58 +08:00
ddchenhao66 9373f373dc [XPU] fix multi-batch bug in VL model (#6015)
* [XPU] fix multi-batch bug in VL model

* Add command to kill additional port processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-01-14 19:44:58 +08:00
Jiaxin Sui 926a26074f [XPU][CI] Cache queue port bug fix (#6030)
* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Remove cache queue port from test_vl_model.py

Removed cache queue port argument from test configuration.

* Update test_w4a8.py

* Remove cache queue port from test_mtp.py

Removed cache queue port configuration from test.

* Remove cache queue port from test_logprobs_21b_tp4

Removed cache queue port configuration from test.

* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Update test_ep4tp4_online.py
2026-01-14 12:51:40 +08:00
ddchenhao66 fefc0b8382 [XPU]add ci test case for P_EP4TP4 D_EP4TP1 (#5988)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-12 16:30:15 +08:00
zhupengyang 9db48ecb34 [XPU] fix dp4 (#5946) 2026-01-09 20:36:53 +08:00
ddchenhao66 733014bf32 [XPU] Support EP4TP1 in pd disaggregation (#5860)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-06 15:25:36 +08:00
Jiaxin Sui 2785b820c8 [XPU][CI] Add XPU logprobs case (#5874)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command

* Remove max_tokens from model response configuration

Removed max_tokens parameter from the model response call.

* add xpu logprobs case

* Fix formatting and improve setup_logprobs_env

Add newline at end of file and update setup_logprobs_env function.

* Refactor test_logprobs_21b_tp4.py for clarity

* Change top_p value from 1.0 to 0

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
ddchenhao66 56a9ecccb2 [XPU] xpu support ep4tp4 (#5773)
* [XPU] xpu support ep4tp4

* Add commands to check multiprocessing and fastdeploy processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
Jiaxin Sui f16077a939 [XPU][CI] Xpu ci update (#5690)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command

* Remove max_tokens from model response configuration

Removed max_tokens parameter from the model response call.
2025-12-23 10:19:39 +08:00
ddchenhao66 a1535c7e7e [XPU][CI] xpu add ci test for pd + TP2 (#5653)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-22 19:27:10 +08:00
Jiaxin Sui d739af5e6e Revert "[XPU][CI] xpu add ci test for pd (#5610)" (#5645)
This reverts commit 80fb530ce2.
2025-12-18 19:59:09 +08:00
ddchenhao66 80fb530ce2 [XPU][CI] xpu add ci test for pd (#5610)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-17 16:07:44 +08:00
Jiaxin Sui 92119773c7 [CI][XPU] add mtp case (#5537)
* add mtp case

* Refactor test_mtp.py for clarity and efficiency

Removed duplicate import of json and simplified spec_config formatting.

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2025-12-12 19:14:40 +08:00
zccjjj 03819f30c3 [CI][XPU] ep+prefix cache+chunk prefill (#5489) 2025-12-10 19:39:49 +08:00
zccjjj 5b900667e3 [XPU] support ep4tp1+v1 loader (#5398) 2025-12-05 18:51:15 +08:00
zccjjj e927c65742 [XPU] [Optimization] [EP] EP communication optimization. (#5145)
2025-12-05 10:03:45 +08:00
Jiaxin Sui 8e0f4dfd0c [XPU] [CI] Xpu Ci Refactor (#5252)
* add xpu ci

* add case

* add case

* fix ci bug

* Update Docker image tag to 'latest' in CI workflow

* Fix set -e usage in run_xpu_ci_pytest.sh

* add pd case

* add case

* Configure pip to use Tsinghua mirror for dependencies

Set the global pip index URL to Tsinghua mirror.

* fix ci bug

* fix bug

* fix bug

---------

Co-authored-by: suijiaxin <suijiaxin@Suis-MacBook-Pro.local>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511964.gajl.baidu.com>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2025-12-02 17:15:51 +08:00