chen
a8ffcaa068
fix fa4 test ( #6408 )
2026-02-10 10:57:21 +08:00
kevin
3ce842b55b
[BugFix] add reset shared inputs when update weight dummy run ( #6331 )
...
* fix dummy run input bug
* update code
* update code
* update code
* update code
2026-02-10 10:29:03 +08:00
CSWYF3634076
335ab70b1c
[Feature] console print metrics add env ( #6413 )
2026-02-10 09:37:11 +08:00
YuBaoku
b84056fdaa
[CI] Fix stable_test and add cherry-pick automation ( #6415 )
2026-02-09 23:10:12 +08:00
Lucas
32bd40a192
[XPU] change base XPU docker image ( #6411 )
2026-02-09 22:53:12 +08:00
Jiang-Jia-Jun
4e06df520e
[Feature] Unify the request-completion log format and add more statistics ( #6405 )
...
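Merge the two previously separate log lines into one and show more statistics.
Main changes:
- Merge the existing "Request finished" and "token ratio" log lines into a single-line format
- Add InputToken: number of input tokens
- Add CachedDetail: cache details (CachedToken/GPU/CPU)
- Add OutputToken: number of output tokens
- Add TTFT: time to first token (seconds)
- Add E2E: end-to-end latency for the full response (seconds)
- Keep the IsPrefill and RecoveryStop flags
Example of the new log format: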
Request=chatcmpl-xxx, InputToken=18, CachedDetail={"CachedToken": 0, "GPU": 0, "CPU": 0}, OutputToken=247, TokenRatio=315.77, TTFT=0.02, E2E=0.78, IsPrefill=False, RecoveryStop=False
Co-authored-by: Ducc <ducc@baidu.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
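The single-line format described in this commit message can be sketched as follows. This is a minimal illustration only: `format_request_finished` and its parameters are hypothetical names, not FastDeploy's actual logging code, and the token ratio is passed in rather than computed because the message does not say how it is derived.

```python
import json


def format_request_finished(request_id, input_tokens, cached_detail, output_tokens,
                            token_ratio, ttft, e2e, is_prefill, recovery_stop):
    """Build one log line in the field order shown in the commit message.

    Hypothetical helper for illustration; not FastDeploy's actual function.
    """
    fields = [
        ("Request", request_id),
        ("InputToken", input_tokens),
        # CachedDetail is rendered as a JSON object, as in the example line.
        ("CachedDetail", json.dumps(cached_detail)),
        ("OutputToken", output_tokens),
        ("TokenRatio", f"{token_ratio:.2f}"),
        ("TTFT", f"{ttft:.2f}"),  # seconds
        ("E2E", f"{e2e:.2f}"),    # seconds
        ("IsPrefill", is_prefill),
        ("RecoveryStop", recovery_stop),
    ]
    return ", ".join(f"{key}={value}" for key, value in fields)


line = format_request_finished(
    request_id="chatcmpl-xxx",
    input_tokens=18,
    cached_detail={"CachedToken": 0, "GPU": 0, "CPU": 0},
    output_tokens=247,
    token_ratio=315.77,
    ttft=0.02,
    e2e=0.78,
    is_prefill=False,
    recovery_stop=False,
)
print(line)
```

With the values from the example above, the sketch reproduces the sample log line from the commit message.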
2026-02-09 21:06:55 +08:00
bukejiyu
5bfc0938e2
[BugFix] PD reorder fix and add ut ( #6375 )
2026-02-09 04:42:48 -08:00
CSWYF3634076
ec128068b7
[Others] Exit to ensure no residual processes (cpu cache & dp) ( #6377 )
...
* [Others] good exit single dp
* [Others] good exit cpu cache dp>1
* [Others] good exit cpu cache dp>1 unittest
2026-02-09 20:38:38 +08:00
Mattheliu
d75b1b8df1
[Fix] Use paddle.device.get_device_properties for multi-platform compatibility ( #6400 )
...
Replace paddle.device.cuda.get_device_properties() with paddle.device.get_device_properties()
to support all hardware platforms (NVIDIA, ILUVATAR, HPU, etc.)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 19:15:41 +08:00
chenjian
35c24f3f71
Revert "[Optimize] Optimize ttft for ep ( #6098 )" ( #6402 )
...
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
kevin
d60daca4a8
[Feature] consider multimodal model when dummy run ( #6045 )
...
* add mm do profile
* update code
* update code
* update code
* update code
* update test case
* update code
* update code
* fix xpu bug
* update code
* add mm do profile
* update test case
* update code
2026-02-09 17:49:55 +08:00
sunxin
783d56e28a
[Optimization] Support logprob async copy ( #6362 )
...
* support logprob async copy
* fix prompt logprob
* fix xpu
2026-02-09 17:32:12 +08:00
MingkunZhang
268276e287
[Metax][CI] e2e ci tests enable cuda graph ( #6401 )
2026-02-09 16:25:23 +08:00
luukunn
fd56d85346
add environment_variables ( #6385 )
2026-02-09 15:29:49 +08:00
CSWYF3634076
eb8d639fe3
[Engine] apiserver&engine exit when work failed to start ( #6322 )
2026-02-09 15:07:40 +08:00
bukejiyu
dc5917289d
[loader] support wint2 backend ( #6139 )
...
* support wint2
* update
2026-02-08 22:42:36 -08:00
chen
f18f3b99ed
fix zmq hang when sampled_token_id=0 ( #6398 )
2026-02-09 14:13:18 +08:00
chen
29a270bb38
[Docs] Add doc for online quantization ( #6399 )
...
* add doc for dynamic quant
* check
2026-02-08 22:09:18 -08:00
0Ayachi0
8bb83b2239
[CI] 【Hackathon 10th Spring No.25】 Add unit tests for module fastdeploy/inter_communicator/zmq_server.py ( #6210 )
...
* [CI] 【Hackathon 10th Spring No.24】 Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py
* [CI] 【Hackathon 10th Spring No.25】 Add unit tests for module fastdeploy/inter_communicator/zmq_server.py
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-09 14:00:48 +08:00
Mattheliu
c776d483e4
[BugFix] handle 4 return values from noaux_tc_redundant op ( #6384 )
...
* fix: handle 4 return values from noaux_tc_redundant op
The noaux_tc_redundant CUDA op is defined with 4 outputs in PD_BUILD_STATIC_OP:
- output_tensor (scores)
- topk_values
- topk_indices
- tokens_per_expert_stats_list_out (inplace updated)
The Python code was only unpacking 3 values, causing:
ValueError: too many values to unpack (expected 3)
This fix correctly unpacks all 4 return values, ignoring the inplace
updated tensor which is the same as the input tokens_per_expert_stats_list.
Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>
* fix: make noaux_tc_redundant return 4 values to match OP definition
The PD_BUILD_STATIC_OP defines 4 outputs but the function only returned 3,
causing inconsistent behavior across different Paddle framework versions.
This fix explicitly returns 4 values:
- scores (inplace modified)
- topk_values
- topk_indices
- tokens_per_expert_stats_list (inplace modified via atomicAdd)
Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>
---------
Co-authored-by: Claude (Claude Opus 4.5) <noreply@anthropic.com>
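The unpacking mismatch this commit describes can be reproduced with a small pure-Python sketch. `fake_noaux_tc_redundant` is a hypothetical stand-in that only mirrors the four-output shape of the op; it is not the CUDA implementation, and its top-k selection and counter increment are simplifications assumed for illustration.

```python
def fake_noaux_tc_redundant(scores, tokens_per_expert_stats_list, top_k=2):
    """Hypothetical stand-in that returns four values, like the op definition."""
    topk_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    topk_values = [scores[i] for i in topk_indices]
    for i in topk_indices:
        # Simplified imitation of the in-place statistics update.
        tokens_per_expert_stats_list[i] += 1
    return scores, topk_values, topk_indices, tokens_per_expert_stats_list


scores = [0.1, 0.7, 0.2]
stats = [0, 0, 0]

# Unpacking into three names fails because four values are returned.
try:
    a, b, c = fake_noaux_tc_redundant(scores, list(stats))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Unpacking all four values works; the last one is the same object as the
# input statistics list, so it can be ignored or reused.
_, topk_values, topk_indices, updated_stats = fake_noaux_tc_redundant(scores, stats)
print(error_message)
print(topk_values, topk_indices, updated_stats)
```

The fourth return value aliases the input list, which is why the caller can safely discard it once the unpacking count matches.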
2026-02-09 13:17:47 +08:00
JYChen
9bcd863902
[Others] support import deepgemm/deepep from fleet ops ( #6351 )
...
* update paddleformers to v1.0
* only change import fleetpath
2026-02-09 11:53:13 +08:00
xjkmfa
74762b0fb2
[ci case]Prompt logprobs precision ( #6381 )
...
* Add ci case for min token and max token
* 【CI case】include total_tokens in the last packet of completion interface stream output
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
---------
Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
周周周
2b4748de4f
[MTP] refactor MTP pre_process ( #6358 )
2026-02-09 10:47:15 +08:00
Jiang-Jia-Jun
18e79dd660
[Metrics] Support cpu-cache-block-num ( #6390 )
...
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
MingkunZhang
15e01c6f61
[Metax][CI] add paddleocr ci test ( #6379 )
2026-02-09 10:11:28 +08:00
Yonghua Li
5ac5ecd0b0
[BugFix] fix cache transfer tasks failure after cache cleared ( #6202 )
...
* [fix] fix cache transfer tasks failure after cache cleared
* [fix] fix submit_task
* [fix] fix cache manager hang when clearing prefix cache
* [fix] fix list_proxy has no clear method
* [fix] fix barrier
* [fix] add barrier0
* [fix] add cache_task_is_paused_signal
* [fix] fix condition
* [fix] fix cache transfer sync and delay prefix cache tree clearing
* [fix] fix typo
* [chore] polish code
* [fix] revert only rank0 write kv_cache_status_signal
* [fix] fix thread pool and prefix cache manager hang
* [fix] add timeout for task_swapping_event
* [fix] tolerate prefix cache manager error while prefix tree is cleared
* [chore] add more log
* [fix] fix test_prefix_cache_manager
* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
jc
d6b3c722c1
[KVCache] Storage cache supports c8 model ( #6298 )
...
* Refine cache transfer manager
* Storage cache supports c8 model
2026-02-06 12:01:17 +08:00
chen
72fe94cb13
[Feature] support glm tp+dp+ep ( #6317 )
2026-02-05 21:47:01 +08:00
CSWYF3634076
1c0a2b055f
[Feature] console print statistical metrics ( #6339 )
...
* [Feature] console print statistical data
* [Feature] console print statistical data v2 dp_rank
* [Feature] console print statistical data v2 unittest
* [Feature] console print statistical data v3 unittest
2026-02-05 19:20:36 +08:00
MingkunZhang
de02a909c8
[Metax][CI] restore 21b/28b ci test file ( #6368 )
...
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-05 18:38:59 +08:00
YuBaoku
5c9bc13a59
[CI] Fix check-bypass.yml
2026-02-05 18:06:39 +08:00
MingkunZhang
6e28b5ef4f
[Metax][CI] update metax ci files ( #6364 )
2026-02-05 17:16:31 +08:00
周周周
e3fb8796b4
Remove useless MTP rebuild_padding code ( #6336 )
2026-02-05 16:28:44 +08:00
YuBaoku
2d3fb81d29
[CI] Update check-bypass.yml ( #6360 )
2026-02-05 15:52:30 +08:00
K11OntheBoat
116e2aea7a
Support Norm before Rope ( #6332 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2026-02-05 15:28:52 +08:00
chen
29a313a402
[Optimization] Support FA2/FA3/FA4 with attn_mask_q ( #6354 )
...
* support FA4 sm100
* flash attn backend support mask
* flash attn backend run flashmask correct
* add test for flash_attn_backend and flash_attn_func
* check
* add test for fa4
* requirements.txt add fa4 whl
* check test on sm100
* fix CI conflict
* add enable_torch_proxy for flash_mask
* lazy import fa4
* check
* fix tests import
* check test_load_mpt import
2026-02-05 14:39:00 +08:00
lizan1999
72edd394d9
[XPU] support noaux_tc ( #6326 )
2026-02-05 12:04:16 +08:00
YuBaoku
cae2709eff
[CI] Update stable test workflow and run.sh script ( #6352 )
2026-02-05 11:01:15 +08:00
GoldPancake
183b8d325a
[RL] Support GLM MTP RL Model ( #6267 )
2026-02-04 20:14:35 +08:00
luukunn
765df94e6c
[Optimization]update prompt & prompt_token_ids ( #6334 )
...
* fix prompt
* add unit test
* add unit test
* fix
2026-02-04 20:08:01 +08:00
JYChen
bf78a48eb3
[Others] add mock unittest for sm100 FP8 inference ( #6273 )
...
* add unittest
* use new file
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
sunxin
ef47e6eb46
[Others]skip to_tensor ( #6342 )
2026-02-04 17:25:19 +08:00
Zhang Yulong
26ba019e66
Update README.md ( #6343 )
2026-02-04 15:57:18 +08:00
MingkunZhang
43e3886ef9
[Metax][CI] fix run_ci_metax.sh error ( #6341 )
2026-02-04 15:43:48 +08:00
MingkunZhang
e109fb9a0e
[Metax][Fix] fix issues based on #6259 ( #6338 )
2026-02-03 23:21:35 -08:00
chenjian
90db0bdd0d
[Optimize] Optimize ttft for ep ( #6098 )
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
2026-02-04 15:03:29 +08:00
mouxin
6e96bd0bd2
[Feature] Fix counter release logic & update go-router download URL ( #6280 )
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Fix counter release logic
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update token counter logic and docs
* [Feature] Update token counter logic and docs
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-04 15:02:38 +08:00
fxyfxy777
36547cfdb3
[Feature] FD_USE_PHI_FP8_QUANT ( #6320 )
...
* add ut
* add use_fd_quant env
* rm mask_per_token_quant
* add make ops list
* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, default is true
* modify comments
* use bool type
* Add function declaration
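A boolean environment switch that defaults to true, as the bullets above describe for FD_USE_PHI_FP8_QUANT, might be read along these lines. This is a sketch under assumptions: `read_bool_env` is a hypothetical helper, and the accepted spellings ("1", "true", "yes", "on") are chosen for illustration rather than taken from FastDeploy's own parsing.

```python
import os


def read_bool_env(name, default, environ=None):
    """Return a bool for an environment variable, falling back to `default`.

    Hypothetical helper; the set of truthy spellings is an assumption.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


# Unset: the default (true) applies.
print(read_bool_env("FD_USE_PHI_FP8_QUANT", True, environ={}))
# Explicitly disabled.
print(read_bool_env("FD_USE_PHI_FP8_QUANT", True, environ={"FD_USE_PHI_FP8_QUANT": "0"}))
```

Passing `environ` explicitly keeps the helper easy to test without mutating the process environment.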
2026-02-03 22:33:03 -08:00
MingkunZhang
2ffcb3d9ed
[Metax][CI] update ci test files ( #6340 )
2026-02-04 13:58:07 +08:00
sunxin
9b0a82cfa9
[Model Runner] Support overlap schedule ( #6259 )
2026-02-04 10:49:44 +08:00