Commit Graph

4454 Commits

Author SHA1 Message Date
jackyYang6 00a6a73431 docs: fix pre-commit error of markdown (#6100) 2026-01-20 19:32:05 +08:00
ChowMingSing bf60e103b6 [CI]Fix test case (#6111) 2026-01-20 17:47:44 +08:00
Ryan dda27e50f5 [Graph Optimization] remove static_op_get_block_shape_and_split_kv_block from cudagraph (#6081)
* rm static_op_get_block_shape_and_split_kv_block from cudagraph

* update max_capture_shape

* fallback: zeros -> empty to avoid coverage check

* check graph_opt_config exists

* add max_capture_shape_dy2st && full_cuda_graph: false -> true in 28B vl test

* add use_cudagraph flag to control step_use_cudagraph
2026-01-20 14:05:18 +08:00
zhupengyang 45ebb2efb4 [XPU] support plugin model (#6092) 2026-01-20 13:00:09 +08:00
jackyYang6 988e0bc338 [Feature] Add PaddleFormers fallback backend (#5999)
* feat(paddleformers): add dense text model fallback backend

* docs(paddleformers): add user guide and fix code review issues

* add fallback unit test

* precommit format

* fix pre-commit

* fix: address code review feedback

* docs: add PaddleFormers backend documentation (EN) and simplify installation

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-19 21:50:50 +08:00
GoldPancake 879e45f6b3 fix compute logits problem (#6093) 2026-01-19 20:12:14 +08:00
xiegegege e22c4e29bb [CE]add paddleocr config yaml (#6097) 2026-01-19 20:07:42 +08:00
Jingfeng Wu 7d44009f39 [FDConfig] transfer metrics_port (#6056)
* transfer metrics_port

* transfer metrics_port
2026-01-19 19:58:57 +08:00
cmcamdy 211dd81ca7 add pd+mtp ci (#6090) 2026-01-19 19:21:24 +08:00
Jiaxin Sui e0d15a2ded [XPU][CI] Xpu ci update (#6089)
* add xpu ci case

* add xpu ci case

* add xpu ci case

* Change runner from XPU-P800-8Card to XPU-P800

* Remove cache queue port from test_pd_03b_tp1.py

Removed cache queue port arguments from test cases.

* Remove cache queue port from test_pd_21b_tp2.py

Removed cache queue port arguments from test cases.

* Update README with PYTHONPATH setup instructions

Added instructions for setting PYTHONPATH in CI scripts.
2026-01-19 16:09:09 +08:00
ChowMingSing 496cc23089 [CI]Fix test cases failing under Python 3.12 (#6059)
* Fix test case errors under Python 3.12

* Fix test case errors under Python 3.12

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-19 15:41:12 +08:00
sunxin a4144e0b8e [Optimization] Avoid unnecessary penalty computation (#6078) 2026-01-19 15:24:12 +08:00
GoldPancake 05fbd89a8e [Speculative Decoding][Bugfix] Fix MTP logprob issues caused by max_num_logprobs (#6084) 2026-01-19 14:55:36 +08:00
ddchenhao66 3685474799 [XPU] xpu support mm prefill batch (#6072)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-19 14:36:35 +08:00
sunxin 9dc1c74d36 fix opt qknorm (#6080) 2026-01-19 12:07:20 +08:00
YuBaoku ac6fa6d725 [CI] Add 4-GPU e2e test job (#6082) 2026-01-19 10:42:14 +08:00
kevin 0e0eaa1c57 [BugFix] fix mm revert bug (#6061)
* fix mm revert bug

* update code
2026-01-16 08:13:34 -08:00
Jiaxin Sui 70a962df53 [XPU][CI] XPU CI refactor (#6053)
* add xpu ci case

* add xpu ci case

* add xpu ci case

* Change runner from XPU-P800-8Card to XPU-P800
2026-01-16 20:57:58 +08:00
GoldPancake b917b56aca [Bugfix] Fix logprob issues caused by max_num_logprobs (#6067) 2026-01-16 04:40:18 -08:00
周周周 97f96e34ca only update self.exist_prefill_task_signal in v0 (#6064)
* commit

* commit

* commit

---------

Co-authored-by: xiaoluomi <1037819816@qq.com>
2026-01-16 20:11:55 +08:00
MingkunZhang 0d372e4fb2 [Metax][CI] update jenkins github action version (#6065) 2026-01-16 15:06:14 +08:00
GoldPancake bda38aa519 [Speculative Decoding] Support MTP for GLM-4.5-Air (#6047)
* glm mtp
* add spec neox partial rope
2026-01-16 14:35:24 +08:00
qwes5s5 b2a2e11551 [Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320)
* request disconnect

* request disconnect

* fix bug

* fix bug--amend

---------

Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-16 11:46:13 +08:00
周周周 8f035101ad initial commit (#6054)
Co-authored-by: xiaoluomi <1037819816@qq.com>
2026-01-16 10:49:38 +08:00
fxyfxy777 4c92035f2d [Feature] Unify fp8 block_wise quant ops (#5991)
* quant stash

* blockwise_quant

* precommit

* rm tensor.cut

* tp ok

* add swiglu

* rm outdated code

* fix activate ut

* change baseline

* fix baseline error
2026-01-15 05:50:37 -08:00
周周周 d38cd8b40b [UNITEST] add EP TP test_fused_moe CI (#5989) 2026-01-15 21:37:32 +08:00
guozhuangzhuang d2f1ec2b1b [XPU] fix(xpu_model_runner): reset seq_lens_encoder to 0 for decode role in PD splitwise mode (#6048)
* fix(xpu_model_runner): reset seq_lens_encoder to 0 for decode role in PD splitwise mode

- Set seq_lens_encoder to 0 when splitwise_role is 'decode' during prefill processing
- This ensures proper continuation of decoding after P generates the first token in the PD disaggregated architecture
- Fixes potential sequence length inconsistency in PD splitwise deployment scenarios

* format
2026-01-15 20:24:56 +08:00
freeliuzc 49617d9832 [Feature]Support tag phase token enforce generation (#6034)
* support tag phase token enforce generation

* optimize note and some feature

* fix sampler unit test

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-15 03:59:55 -08:00
freeliuzc 17866c028e add more cases for attention unit test (#5931) 2026-01-15 19:52:35 +08:00
cmcamdy 59d8ae0a25 [XPU] Speculate Decoding + PD, benchmark fix (#6036)
* fix mtp pd

* fix kernel

* fix code style

* fix kernel

* fix test / clear debug code

* fix test / clear debug code

* fix codestyle

* fix codestyle

* fix codestyle
2026-01-15 19:19:03 +08:00
lizexu123 6619298b50 [Optim] Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)
* update w4afp8

* build.sh ok

* support cuda_graph

* fix

* add test

* fix max_tokens_per_expert

* >=70

* fix

* compute_max_tokens_from_prefix_sum in w4afp8

* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00
Jiaxin Sui b0fc9cadb5 [XPU][CI] update paddle version (#6044)
* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Remove cache queue port from test_vl_model.py

Removed cache queue port argument from test configuration.

* Update test_w4a8.py

* Remove cache queue port from test_mtp.py

Removed cache queue port configuration from test.

* Remove cache queue port from test_logprobs_21b_tp4

Removed cache queue port configuration from test.

* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Update test_ep4tp4_online.py

* Update run_xpu_ci_pytest.sh to comment out installations

Comment out PaddlePaddle installation and XVLLM download steps.
2026-01-15 15:17:48 +08:00
Daci e10b51b8c6 [Feature] get_output_kv_signal blocking read mode & send_first_token (#5836)
* get_output_kv_signal blocking read mode

* send first token before recycle

* xpu get_output_kv_signal blocking read mode

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-15 14:11:03 +08:00
Cheng Yanfei fbcccaa750 [Intel HPU] enable MoE EP for hpu (#5855)
* enable HPU MoE EP

* MoE intermediate_scale stack

* enable loader_v1 esp for tensor_wise_fp8 TP or EP

* modify activation_scale name
2026-01-15 13:08:00 +08:00
ming1753 7c56041272 [BugFix] fix PaddleOCR-VL illegal memory (#6042) 2026-01-14 20:07:43 -08:00
zhupengyang 24ffa7c991 [XPU] fix moe num_expert (#6014) 2026-01-15 10:49:36 +08:00
RAM b3f59fd9b5 [RL][CI] Support Async R3 And Add Accuracy Test (#5937)
* add bs1 r3 test case

* async put

* r3 test case 1.0

* success run eb5

* refine test case

* pre-commit

* add eb45 & glm testcase

* format code

* add p2pstore requirements

* support only last turn

* R3 use worker log

* refine code &fix ci bug

* refine error mesg

* fix empty input bug

* Success set acc ci of eb45 and glm45

* refine code

* fix bug
2026-01-14 04:25:06 -08:00
ddchenhao66 9373f373dc [XPU] fix multi-batch bug in VL model (#6015)
* [XPU] fix multi-batch bug in VL model

* Add command to kill additional port processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-01-14 19:44:58 +08:00
xiaoxiaohehe001 6f72be7c3e [Optimize] Qwen2.5-VL vision model with merged linear layers and unif… (#6037)
* [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization

* [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization
2026-01-14 19:21:31 +08:00
luukunn 93b7675a64 [Feature]Report FD statistical information (#5646)
* add usage commit

* update envs and xpu

* add requirements

* fix quantization value

* add unit test

* add unit test

* fix unit test

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test

* fix FD_USAGE_STATS_SERVER

* fix

* fix

* add doc

* add doc

* add doc

* add doc

* add doc

* fix file name
2026-01-14 17:54:01 +08:00
MingkunZhang 273e79aa5b [Metax][Fix] fix self.share_inputs['preempted_idx']=[] incorrect use (#6038)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-14 17:06:00 +08:00
MingkunZhang 32fb04703b [Metax][Doc] update metax gpu 'get_started' doc (#6035)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-14 16:11:43 +08:00
YuBaoku 2c17acd767 [CI] Adapt vl_model baseline changes due to Paddle update_2 (#6033) 2026-01-14 15:22:26 +08:00
MingkunZhang f3587b592c [Metax][CI] remove 28B VL model test sampling randomness (#6032)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-14 14:00:41 +08:00
Jiaxin Sui 926a26074f [XPU][CI] Cache queue port bug fix (#6030)
* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Remove cache queue port from test_vl_model.py

Removed cache queue port argument from test configuration.

* Update test_w4a8.py

* Remove cache queue port from test_mtp.py

Removed cache queue port configuration from test.

* Remove cache queue port from test_logprobs_21b_tp4

Removed cache queue port configuration from test.

* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Update test_ep4tp4_online.py
2026-01-14 12:51:40 +08:00
chenjian 74d0f1c01f [Optim] Robust sync status when preempted happens (#5796)
* [Bug fix] Sync status for caching output cache

* fix

* fix

* fix bug

* fix

* fix

* support xpu

* fix

* fix

* fix

* fix

* fix

* fix ci

* fix ci

* fix xpu

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-14 12:07:33 +08:00
Ryan 0d1a5e70bc [Graph Optimization] Add full_cuda_graph to control subgraph split (#6027) 2026-01-14 11:43:59 +08:00
Yonghua Li 456637002d [BugFix] fix cache transfer manager updating/clearing (#5930)
* [fix] fix cache transfer manager updating/clearing

* [fix] fix code style

* [fix] fix config

* [fix] fix engine client

* [fix] let worker update kv cache status signal

* [fix] update worker process

* [fix] fix clear/update for case if comm group is shutdown

* [fix] update dynamic weight manager

* [fix] fix port

* [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting
2026-01-13 05:09:29 -08:00
chenjian 6da06abc17 [Feature] Enable output caching by default (#5987)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-13 19:34:21 +08:00
MingkunZhang 3772810b0a [Metax][CI] update test_ernie_28b_vl.py image result keywords (#6022)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-13 17:15:10 +08:00