4608 Commits

Author SHA1 Message Date
chen a8ffcaa068 fix fa4 test (#6408) 2026-02-10 10:57:21 +08:00
kevin 3ce842b55b [BugFix] add reset shared inputs when update weight dummy run (#6331)
* fix dummy run input bug

* update code

* update code

* update code

* update code
2026-02-10 10:29:03 +08:00
CSWYF3634076 335ab70b1c [Feature] console print metrics add env (#6413) 2026-02-10 09:37:11 +08:00
YuBaoku b84056fdaa [CI] Fix stable_test and add cherry-pick automation (#6415) 2026-02-09 23:10:12 +08:00
Lucas 32bd40a192 [XPU] change base XPU docker image (#6411) 2026-02-09 22:53:12 +08:00
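Jiang-Jia-Jun 4e06df520e [Feature] Unify the request-finished log format and add more statistics (#6405)
Merge the two previously separate log lines into a single line and show more statistics.

Main changes:
- Merge the existing "Request finished" and "token ratio" log lines into a single-line format
- Add InputToken: number of input tokens
- Add CachedDetail: cache details (including CachedToken/GPU/CPU)
- Add OutputToken: number of output tokens
- Add TTFT: time to first token (seconds)
- Add E2E: end-to-end latency for the full response (seconds)
- Keep the IsPrefill and RecoveryStop flags

Example of the new log format: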
Request=chatcmpl-xxx, InputToken=18, CachedDetail={"CachedToken": 0, "GPU": 0, "CPU": 0}, OutputToken=247, TokenRatio=315.77, TTFT=0.02, E2E=0.78, IsPrefill=False, RecoveryStop=False

Co-authored-by: Ducc <ducc@baidu.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 21:06:55 +08:00
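
As an illustration of the single-line format introduced in 4e06df520e, the following is a minimal sketch of how such a line might be split back into typed fields. The function name `parse_request_log` and the parsing rules are assumptions made for this example; they are not part of the commit, and the sketch only handles a flat (non-nested) JSON object such as the CachedDetail value shown in the sample line.

```python
import json
import re

# A value is either a flat {...} JSON object or a run of characters without commas.
_FIELD = re.compile(r"(\w+)=(\{.*?\}|[^,]+)")


def parse_request_log(line: str) -> dict:
    """Hypothetical helper: turn a 'Key=Value, Key=Value' log line into a dict."""
    fields = {}
    for key, raw in _FIELD.findall(line):
        raw = raw.strip()
        if raw.startswith("{"):
            # CachedDetail is logged as a JSON object.
            fields[key] = json.loads(raw)
        elif raw in ("True", "False"):
            # IsPrefill and RecoveryStop are logged as Python booleans.
            fields[key] = raw == "True"
        else:
            # Try int, then float, and fall back to the raw string (e.g. the request id).
            try:
                fields[key] = int(raw)
            except ValueError:
                try:
                    fields[key] = float(raw)
                except ValueError:
                    fields[key] = raw
    return fields
```

Applied to the sample line from the commit message, this returns nine fields, with InputToken as an integer, TTFT as a float, and CachedDetail as a nested dict.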
bukejiyu 5bfc0938e2 [BugFix] PD reorder fix and add ut (#6375) 2026-02-09 04:42:48 -08:00
CSWYF3634076 ec128068b7 [Others] Exit to ensure no residual processes (cpu cache & dp) (#6377)
* [Others] good exit single dp

* [Others] good exit cpu cache dp>1

* [Others] good exit cpu cache dp>1 unittest
2026-02-09 20:38:38 +08:00
Mattheliu d75b1b8df1 [Fix] Use paddle.device.get_device_properties for multi-platform compatibility (#6400)
Replace paddle.device.cuda.get_device_properties() with paddle.device.get_device_properties()
to support all hardware platforms (NVIDIA, ILUVATAR, HPU, etc.)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 19:15:41 +08:00
chenjian 35c24f3f71 Revert "[Optimize] Optimize ttft for ep (#6098)" (#6402)
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
kevin d60daca4a8 [Feature] consider multimodal model when dummy run (#6045)
* add mm do profile

* update code

* update code

* update code

* update code

* update test case

* update code

* update code

* fix xpu bug

* update code

* add mm do profile

* update test case

* update code
2026-02-09 17:49:55 +08:00
sunxin 783d56e28a [Optimization] Support logprob async copy (#6362)
* support logprob async copy

* fix prompt logprob

* fix xpu
2026-02-09 17:32:12 +08:00
MingkunZhang 268276e287 [Metax][CI] e2e ci tests enable cuda graph (#6401) 2026-02-09 16:25:23 +08:00
luukunn fd56d85346 add environment_variables (#6385) 2026-02-09 15:29:49 +08:00
CSWYF3634076 eb8d639fe3 [Engine] apiserver&engine exit when work failed to start (#6322) 2026-02-09 15:07:40 +08:00
bukejiyu dc5917289d [loader] support wint2 backend (#6139)
* support wint2

* update
2026-02-08 22:42:36 -08:00
chen f18f3b99ed fix zmq hung when sampled_token_id=0 (#6398) 2026-02-09 14:13:18 +08:00
chen 29a270bb38 [Docs] Add Doc for Online quantization (#6399)
* add doc for dynamic quant

* check
2026-02-08 22:09:18 -08:00
0Ayachi0 8bb83b2239 [CI] 【Hackathon 10th Spring No.25】 Add unit tests for module fastdeploy/inter_communicator/zmq_server.py (#6210)
* [CI] 【Hackathon 10th Spring No.24】 Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py

* [CI] 【Hackathon 10th Spring No.25】 Add unit tests for module fastdeploy/inter_communicator/zmq_server.py

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-09 14:00:48 +08:00
Mattheliu c776d483e4 [BugFix]fix handle 4 return values from noaux_tc_redundant op (#6384)
* fix: handle 4 return values from noaux_tc_redundant op

The noaux_tc_redundant CUDA op is defined with 4 outputs in PD_BUILD_STATIC_OP:
- output_tensor (scores)
- topk_values
- topk_indices
- tokens_per_expert_stats_list_out (inplace updated)

The Python code was only unpacking 3 values, causing:
  ValueError: too many values to unpack (expected 3)

This fix correctly unpacks all 4 return values, ignoring the inplace
updated tensor which is the same as the input tokens_per_expert_stats_list.

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

* fix: make noaux_tc_redundant return 4 values to match OP definition

The PD_BUILD_STATIC_OP defines 4 outputs but the function only returned 3,
causing inconsistent behavior across different Paddle framework versions.

This fix explicitly returns 4 values:
- scores (inplace modified)
- topk_values
- topk_indices
- tokens_per_expert_stats_list (inplace modified via atomicAdd)

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

---------

Co-authored-by: Claude (Claude Opus 4.5) <noreply@anthropic.com>
2026-02-09 13:17:47 +08:00
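
The unpacking mismatch described in c776d483e4 can be reproduced without the CUDA op. The sketch below uses a pure-Python stand-in, `noaux_tc_redundant_stub`, which is hypothetical and only mimics the four-output shape named in the commit message (scores, topk_values, topk_indices, and the in-place updated stats list). The real op's arguments and behaviour are not shown in this log.

```python
def noaux_tc_redundant_stub(scores, tokens_per_expert_stats, k=2):
    """Hypothetical stand-in for the real op: returns four values."""
    # Indices of the k largest scores, highest first.
    topk_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    topk_values = [scores[i] for i in topk_indices]
    for i in topk_indices:
        # In-place update, standing in for the stats tensor the real op modifies.
        tokens_per_expert_stats[i] += 1
    return scores, topk_values, topk_indices, tokens_per_expert_stats


def route(scores, stats):
    # Unpack all four outputs. The fourth is the same object as `stats`,
    # so it is discarded here rather than rebound.
    _scores, topk_values, topk_indices, _ = noaux_tc_redundant_stub(scores, stats)
    return topk_values, topk_indices
```

Unpacking the stub's result into only three names raises `ValueError: too many values to unpack (expected 3)`, which is the failure the commit message reports.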
JYChen 9bcd863902 [Others] support import deepgemm/deepep from fleet ops (#6351)
* update paddleformers to v1.0

* only change import fleetpath
2026-02-09 11:53:13 +08:00
xjkmfa 74762b0fb2 [ci case]Prompt logprobs precision (#6381)
* Add ci case for min token and max token

* 【CI case】include total_tokens in the last packet of completion interface stream output

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
周周周 2b4748de4f [MTP] refactor MTP pre_process (#6358) 2026-02-09 10:47:15 +08:00
Jiang-Jia-Jun 18e79dd660 [Metrics] Support cpu-cache-block-num (#6390)
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
MingkunZhang 15e01c6f61 [Metax][CI] add paddleocr ci test (#6379) 2026-02-09 10:11:28 +08:00
Yonghua Li 5ac5ecd0b0 [BugFix] fix cache transfer tasks failure after cache cleared (#6202)
* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task

* [fix] fix cache manager hang when clearing prefix cache

* [fix] fix list_proxy has no clear method

* [fix] fix barrier

* [fix] add barrier0

* [fix] add cache_task_is_paused_signal

* [fix] fix condition

* [fix] fix cache transfer sync and delay prefix cache tree clearing

* [fix] fix typo

* [chore] polish code

* [fix] revert only rank0 write kv_cache_status_signal

* [fix] fix thread pool and prefix cache manager hang

* [fix] add timeout for task_swapping_event

* [fix] tolerate prefix cache manager error while prefix tree is cleared

* [chore] add more log

* [fix] fix test_prefix_cache_manager

* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
jc d6b3c722c1 [KVCache] Storage cache supports c8 model (#6298)
* Refine cache transfer manager
* Storage cache supports c8 model
2026-02-06 12:01:17 +08:00
chen 72fe94cb13 [Feature] support glm tp+dp+ep (#6317) 2026-02-05 21:47:01 +08:00
CSWYF3634076 1c0a2b055f [Feature] console print statistical metrics (#6339)
* [Feature] console print statistical data

* [Feature] console print statistical data v2 dp_rank

* [Feature] console print statistical data v2 unittest

* [Feature] console print statistical data v3 unittest
2026-02-05 19:20:36 +08:00
MingkunZhang de02a909c8 [Metax][CI] restore 21b/28b ci test file (#6368)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-05 18:38:59 +08:00
YuBaoku 5c9bc13a59 [CI] Fix check-bypass.yml 2026-02-05 18:06:39 +08:00
MingkunZhang 6e28b5ef4f [Metax][CI] update metax ci files (#6364) 2026-02-05 17:16:31 +08:00
周周周 e3fb8796b4 Remove useless MTP rebuild_padding code (#6336) 2026-02-05 16:28:44 +08:00
YuBaoku 2d3fb81d29 [CI] Update check-bypass.yml (#6360) 2026-02-05 15:52:30 +08:00
K11OntheBoat 116e2aea7a Support Norm before Rope (#6332)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2026-02-05 15:28:52 +08:00
chen 29a313a402 [Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100

* flash attn backend support mask

* flash attn backend run flashmask correct

* add test for flash_attn_backend and flash_attn_func

* check

* add test for fa4

* requirements.txt add fa4 whl

* check test on sm100

* fix CI conflict

* add enable_torch_proxy for flash_mask

* lazy import fa4

* check

* fix tests import

* check test_load_mpt import
2026-02-05 14:39:00 +08:00
lizan1999 72edd394d9 [XPU] support noaux_tc (#6326) 2026-02-05 12:04:16 +08:00
YuBaoku cae2709eff [CI] Update stable test workflow and run.sh script (#6352) 2026-02-05 11:01:15 +08:00
GoldPancake 183b8d325a [RL] Support GLM MTP RL Model (#6267) 2026-02-04 20:14:35 +08:00
luukunn 765df94e6c [Optimization]update prompt & prompt_token_ids (#6334)
* fix prompt

* add unit test

* add unit test

* fix
2026-02-04 20:08:01 +08:00
JYChen bf78a48eb3 [Others] add mock unittest for sm100 FP8 inference (#6273)
* add unittest

* use new file

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
sunxin ef47e6eb46 [Others]skip to_tensor (#6342) 2026-02-04 17:25:19 +08:00
Zhang Yulong 26ba019e66 Update README.md (#6343) 2026-02-04 15:57:18 +08:00
MingkunZhang 43e3886ef9 [Metax][CI] fix run_ci_metax.sh error (#6341) 2026-02-04 15:43:48 +08:00
MingkunZhang e109fb9a0e [Metax][Fix] fix issues based #6259 (#6338) 2026-02-03 23:21:35 -08:00
chenjian 90db0bdd0d [Optimize] Optimize ttft for ep (#6098)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix
2026-02-04 15:03:29 +08:00
mouxin 6e96bd0bd2 [Feature] Fix counter release logic & update go-router download URL (#6280)
* [Doc] Update prerequisites in the documentation

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Fix counter release logic

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update token counter logic and docs

* [Feature] Update token counter logic and docs

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-04 15:02:38 +08:00
fxyfxy777 36547cfdb3 [Feature] FD_USE_PHI_FP8_QUANT (#6320)
* add ut

* add use_fd_quant env

* rm mask_per_token_quant

* add make ops list

* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, defaults to true

* modify comments

* use bool type

* Add function declaration
2026-02-03 22:33:03 -08:00
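
Commit 36547cfdb3 notes that FD_USE_PHI_FP8_QUANT is a boolean flag that defaults to true. A minimal sketch of reading such a flag from the environment is shown below. The helper `env_flag` and the set of accepted truthy strings are assumptions for illustration; the log does not show how the project actually parses the variable.

```python
import os


def env_flag(name: str, default: bool = True) -> bool:
    """Hypothetical helper: read a boolean environment variable with a default."""
    raw = os.getenv(name)
    if raw is None:
        # Variable is unset, so fall back to the default.
        return default
    # Assumed convention: these strings mean "enabled"; anything else means "disabled".
    return raw.strip().lower() in ("1", "true", "yes", "on")


use_phi_fp8_quant = env_flag("FD_USE_PHI_FP8_QUANT", default=True)
```

With this convention, leaving the variable unset keeps the default, and setting it to `0` disables the feature.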
MingkunZhang 2ffcb3d9ed [Metax][CI] update ci test files (#6340) 2026-02-04 13:58:07 +08:00
sunxin 9b0a82cfa9 [Model Runner] Support overlap schedule (#6259) 2026-02-04 10:49:44 +08:00