zccjjj
14a64e9b3b
[XPU] change XPU EP interface from xDeepEP to paddle ( #5706 )
...
* add ENV VAR to control low latency buffer
2026-01-21 18:23:45 +08:00
K11OntheBoat
490a6551dc
rename params of normalization layer ( #6133 )
...
Co-authored-by: liuruian <liuruian@baidu.com >
2026-01-21 17:18:35 +08:00
lizexu123
1f96028bea
[BugFix] fix python3.12 v0_loader ( #6132 )
2026-01-21 16:12:11 +08:00
yzwu
837ddca273
[Iluvatar][CI] Fix the 'max_tokens_per_expert referenced before assignment' error ( #6083 )
2026-01-21 16:01:29 +08:00
yinwei
85d995100a
Update Dummy Run To Support Multi-Batch Execution ( #6123 )
2026-01-21 14:20:44 +08:00
Cheng Yanfei
9ee0156cc3
add HPU tensorwise_fp8 readme ( #6091 )
2026-01-21 11:48:22 +08:00
MingkunZhang
7e04067663
[Metax][CI] restore 'moe_expert_dispatch' outputs ( #6130 )
2026-01-21 10:33:09 +08:00
YuBaoku
c991fda54c
[CI] Enable 4-GPU e2e test in nightly and fix docker_tag_build ( #6128 )
2026-01-20 22:44:29 +08:00
lizexu123
f4902fe42d
[BugFix] fix wint2 ( #6109 )
...
* fix
* fix
* fix
2026-01-20 21:46:21 +08:00
yinwei
5385d51808
[XPU] XPU FD Release/2.4 Note
2026-01-20 20:38:34 +08:00
Copilot
dcb20c1a2a
[WIP] Add directory guide to mkdocs configuration ( #6121 )
...
* Initial plan
* Add PaddleFormers Backend documentation to mkdocs.yml navigation
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-01-20 19:51:27 +08:00
luukunn
56e22a7ddc
[Docs] fix doc ( #6119 )
...
* fix doc
* fix doc
2026-01-20 19:46:05 +08:00
yinwei
51a8a2ed57
[XPU] Support CudaGraph (add block attn cuda_graph support) ( #6116 )
...
* add block attn cuda_graph support
2026-01-20 19:33:11 +08:00
jackyYang6
00a6a73431
docs: fix pre-commit errors in markdown ( #6100 )
2026-01-20 19:32:05 +08:00
ChowMingSing
bf60e103b6
[CI]Fix test case ( #6111 )
2026-01-20 17:47:44 +08:00
Ryan
dda27e50f5
[Graph Optimization] remove static_op_get_block_shape_and_split_kv_block from cudagraph ( #6081 )
...
* rm static_op_get_block_shape_and_split_kv_block from cudagraph
* update max_capture_shape
* fallback: zeros -> empty to avoid coverage check
* check graph_opt_config exists
* add max_capture_shape_dy2st && full_cuda_graph: false -> true in 28B vl test
* add use_cudagraph flag to control step_use_cudagraph
2026-01-20 14:05:18 +08:00
zhupengyang
45ebb2efb4
[XPU] support plugin model ( #6092 )
2026-01-20 13:00:09 +08:00
jackyYang6
988e0bc338
[Feature] Add PaddleFormers fallback backend ( #5999 )
...
* feat(paddleformers): add dense text model fallback backend
* docs(paddleformers): add user guide and fix code review issues
* add fallback unit test
* precommit format
* fix pre-commit
* fix: address code review feedback
* docs: add PaddleFormers backend documentation (EN) and simplify installation
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2026-01-19 21:50:50 +08:00
GoldPancake
879e45f6b3
fix compute logits problem ( #6093 )
2026-01-19 20:12:14 +08:00
xiegegege
e22c4e29bb
[CE] add paddleocr config yaml ( #6097 )
2026-01-19 20:07:42 +08:00
Jingfeng Wu
7d44009f39
[FDConfig] transfer metrics_port ( #6056 )
...
* transfer metrics_port
* transfer metrics_port
2026-01-19 19:58:57 +08:00
cmcamdy
211dd81ca7
add pd+mtp ci ( #6090 )
2026-01-19 19:21:24 +08:00
Jiaxin Sui
e0d15a2ded
[XPU][CI] Xpu ci update ( #6089 )
...
* add xpu ci case
* add xpu ci case
* add xpu ci case
* Change runner from XPU-P800-8Card to XPU-P800
* Remove cache queue port from test_pd_03b_tp1.py
Removed cache queue port arguments from test cases.
* Remove cache queue port from test_pd_21b_tp2.py
Removed cache queue port arguments from test cases.
* Update README with PYTHONPATH setup instructions
Added instructions for setting PYTHONPATH in CI scripts.
2026-01-19 16:09:09 +08:00
ChowMingSing
496cc23089
[CI]Fix test cases failing under Python 3.12 ( #6059 )
...
* Fix test case errors under Python 3.12
* Fix test case errors under Python 3.12
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2026-01-19 15:41:12 +08:00
sunxin
a4144e0b8e
[Optimization] Avoid unnecessary penalty computation ( #6078 )
2026-01-19 15:24:12 +08:00
GoldPancake
05fbd89a8e
[Speculative Decoding][Bugfix] Fix MTP logprob issues caused by max_num_logprobs ( #6084 )
2026-01-19 14:55:36 +08:00
ddchenhao66
3685474799
[XPU] xpu support mm prefill batch ( #6072 )
...
Co-authored-by: ddchenhao66 <dhaochen@163.com>
2026-01-19 14:36:35 +08:00
sunxin
9dc1c74d36
fix opt qknorm ( #6080 )
2026-01-19 12:07:20 +08:00
YuBaoku
ac6fa6d725
[CI] Add 4-GPU e2e test job ( #6082 )
2026-01-19 10:42:14 +08:00
kevin
0e0eaa1c57
[BugFix] fix mm revert bug ( #6061 )
...
* fix mm revert bug
* update code
2026-01-16 08:13:34 -08:00
Jiaxin Sui
70a962df53
[XPU][CI] XPU CI refactor ( #6053 )
...
* add xpu ci case
* add xpu ci case
* add xpu ci case
* Change runner from XPU-P800-8Card to XPU-P800
2026-01-16 20:57:58 +08:00
GoldPancake
b917b56aca
[Bugfix] Fix logprob issues caused by max_num_logprobs ( #6067 )
2026-01-16 04:40:18 -08:00
周周周
97f96e34ca
only update self.exist_prefill_task_signal in v0 ( #6064 )
...
* commit
* commit
* commit
---------
Co-authored-by: xiaoluomi <1037819816@qq.com >
2026-01-16 20:11:55 +08:00
MingkunZhang
0d372e4fb2
[Metax][CI] update jenkins github action version ( #6065 )
2026-01-16 15:06:14 +08:00
GoldPancake
bda38aa519
[Speculative Decoding] Support MTP for GLM-4.5-Air ( #6047 )
...
* glm mtp
* add spec neox partial rope
2026-01-16 14:35:24 +08:00
qwes5s5
b2a2e11551
[Feature] Support stopping inference for the corresponding request in the online service after the client disconnects. ( #5320 )
...
* request disconnect
* request disconnect
* fix bug
* fix bug--amend
---------
Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com >
2026-01-16 11:46:13 +08:00
周周周
8f035101ad
initial commit ( #6054 )
...
Co-authored-by: xiaoluomi <1037819816@qq.com >
2026-01-16 10:49:38 +08:00
fxyfxy777
4c92035f2d
[Feature] Unify fp8 block_wise quant ops ( #5991 )
...
* quant stash
* blockwise_quant
* precommit
* rm tensor.cut
* tp ok
* add swiglu
* rm outdated code
* fix activate ut
* change baseline
* fix baseline error
2026-01-15 05:50:37 -08:00
周周周
d38cd8b40b
[UNITEST] add EP TP test_fused_moe CI ( #5989 )
2026-01-15 21:37:32 +08:00
guozhuangzhuang
d2f1ec2b1b
[XPU] fix(xpu_model_runner): reset seq_lens_encoder to 0 for decode role in PD splitwise mode ( #6048 )
...
* fix(xpu_model_runner): reset seq_lens_encoder to 0 for decode role in PD splitwise mode
- Set seq_lens_encoder to 0 when splitwise_role is 'decode' during prefill processing
- This ensures proper continuation of decoding after P generates the first token in the PD disaggregated architecture
- Fixes potential sequence length inconsistency in PD splitwise deployment scenarios
* format
2026-01-15 20:24:56 +08:00
freeliuzc
49617d9832
[Feature] Support tag phase token enforce generation ( #6034 )
...
* support tag phase token enforce generation
* optimize notes and some features
* fix sampler unit test
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2026-01-15 03:59:55 -08:00
freeliuzc
17866c028e
add more cases for attention unit test ( #5931 )
2026-01-15 19:52:35 +08:00
cmcamdy
59d8ae0a25
[XPU] Speculative Decoding + PD, benchmark fix ( #6036 )
...
* fix mtp pd
* fix kernel
* fix code style
* fix kernel
* fix test / clear debug code
* fix test / clear debug code
* fix codestyle
* fix codestyle
* fix codestyle
2026-01-15 19:19:03 +08:00
lizexu123
6619298b50
[Optim] Optimize grid dimensions using max_tokens_per_expert for MoE models ( #6007 )
...
* update w4afp8
* build.sh ok
* support cuda_graph
* fix
* add test
* fix max_tokens_per_expert
* >=70
* fix
* compute_max_tokens_from_prefix_sum in w4afp8
* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00
Jiaxin Sui
b0fc9cadb5
[XPU][CI] update paddle version ( #6044 )
...
* Remove cache queue port from test configuration
Removed cache queue port configuration from test.
* Remove cache queue port from test_vl_model.py
Removed cache queue port argument from test configuration.
* Update test_w4a8.py
* Remove cache queue port from test_mtp.py
Removed cache queue port configuration from test.
* Remove cache queue port from test_logprobs_21b_tp4
Removed cache queue port configuration from test.
* Remove cache queue port from test configuration
Removed cache queue port configuration from test.
* Update test_ep4tp4_online.py
* Update run_xpu_ci_pytest.sh to comment out installations
Comment out PaddlePaddle installation and XVLLM download steps.
2026-01-15 15:17:48 +08:00
Daci
e10b51b8c6
[Feature] get_output_kv_signal blocking read mode & send_first_token ( #5836 )
...
* get_output_kv_signal blocking read mode
* send first token before recycle
* xpu get_output_kv_signal blocking read mode
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-01-15 14:11:03 +08:00
Cheng Yanfei
fbcccaa750
[Intel HPU] enable MoE EP for hpu ( #5855 )
...
* enable HPU MoE EP
* MoE intermediate_scale stack
* enable loader_v1 esp for tensor_wise_fp8 TP or EP
* modify activation_scale name
2026-01-15 13:08:00 +08:00
ming1753
7c56041272
[BugFix] fix PaddleOCR-VL illegal memory access ( #6042 )
2026-01-14 20:07:43 -08:00
zhupengyang
24ffa7c991
[XPU] fix moe num_expert ( #6014 )
2026-01-15 10:49:36 +08:00
RAM
b3f59fd9b5
[RL][CI] Support Async R3 And Add Accuracy Test ( #5937 )
...
* add bs1 r3 test case
* async put
* r3 test case 1.0
* success run eb5
* refine test case
* pre-commit
* add eb45 & glm testcase
* format code
* add p2pstore requirements
* support only last turn
* R3 use worker log
* refine code &fix ci bug
* refine error msg
* fix empty input bug
* Successfully set acc CI of eb45 and glm45
* refine code
* fix bug
2026-01-14 04:25:06 -08:00