Commit Graph

1765 Commits

Author SHA1 Message Date
YuBaoku 54f7d9f621 [CI] Sync mm_batch_invariant with paddle.mm update (#6557) 2026-02-28 14:56:42 +08:00
Jiang-Jia-Jun 39a5ea66c8 [BugFix] Enable control socket disable option in API server (#6545)
* [BugFix] Enable control socket disable option in API server

* Update requirements.txt

* Update requirements.txt
2026-02-28 10:35:35 +08:00
Weiguo Zhu 8fb24122b8 fix reshard error (#6536) 2026-02-27 22:22:37 +08:00
cmcamdy 13447279aa [XPU] Fix PD + MTP (#6495)
* fix pd + mtp

* fix code style

* fix PD + MTP, D gets P's first token

* add annotation for gpu (speculate_update)

* update draft insert v1

* fix wrapper & kernel

* fix wrapper

* fix code style
2026-02-27 19:07:35 +08:00
JYChen c6d8fbe526 [BugFix] fix log with paddlefleet.ops (#6528) 2026-02-27 14:34:29 +08:00
周周周 1503443871 add dsv3 mixed deploy as EP16 TP8 (#6525) 2026-02-27 14:08:25 +08:00
luukunn 16de778343 update FD_USAGE_STATS_SERVER (#6524) 2026-02-27 13:28:57 +08:00
sunxin 53aaac69da [Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)
* gate bf16

* add gate-fp32

* fix

* update baseline

* update

* update

* fix
2026-02-26 21:08:46 -08:00
gongweibao edd31e8849 [Feature] Add Deterministic Inference Support (#6476)
* add

* [tests] Add Paddle attention determinism tests and refactor resource manager

Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* add

* add

* add

* add

* add more

* add more

* fix some

* fix some

* fix bugs

* fix bugs

* only in gpu

* add docs

* fix comments

* fix some

* fix some

* fix comments

* add more

* fix potential problem

* remove not need

* remove not need

* remove no need

* fix bug

* fix bugs

* fix comments

* fix comments

* Update tests/ce/deterministic/test_determinism_verification.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/inter_communicator/test_ipc_signal.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/engine/test_sampling_params_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism_standalone.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix comments

* fix import error

* fix a bug

* fix bugs

* fix bugs

* fix coverage

* refine codes

* refine code

* fix comments

* fix comments

* fix comments

* rm not need

* fix allreduce large tensor bug

* mv log files

* mv log files

* add files

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-26 19:31:51 -08:00
zccjjj c34cb2a8c2 [XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337) 2026-02-27 09:55:41 +08:00
jc 7b1d787b4b [BugFix] Fix storage_backend_type comparison bug in cache_transfer_manager.py (#6514)
Co-authored-by: root <root@tjzj-inf-sci-k8s-hzz1-h62ni7-2178.tjzj.baidu.com>
2026-02-26 19:32:24 +08:00
MingkunZhang c369f7139f [Metax][Fix] fix error based pr #6493 (#6521) 2026-02-26 18:41:35 +08:00
chen 2d1531f3cb dev: open-source model support for fa4/flashmasV2/V3 (#6518) 2026-02-26 17:46:05 +08:00
GoldPancake 2178f2829b [Speculative Decoding] Support suffix decoding (#6403)
* support suffix decoding
2026-02-26 11:42:05 +08:00
Yuanle Liu 6d3fede240 [OP][Feature] Unify the limit_thinking_content_length CUDA op, supporting response-length limits and injected sequences (#6493)
* Initial plan

* Migrate PRs #6311, #6129, #6305 to develop and merge unit tests

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix

* update

* fix

* fix ci

* fix ci

* Initial plan

* test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add disable-thinking case to test_chat_with_response_max_tokens

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add both reasoning_max_tokens and response_max_tokens case

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix ci

* fix ci

* fix ci

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2026-02-25 21:36:50 +08:00
zhupengyang a303eacf62 [XPU] support norm before rope (#6475) 2026-02-25 18:43:44 +08:00
Wanglongzhi2001 14ea7243e1 [Feature] support mm_processor_kwargs for flexible model 2026-02-25 14:34:33 +08:00
jackyYang6 a29ee57e15 [Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367)
* feat: add thinking budget logits processor

* add unittest

* fix pre-commit

* add unittest

* docs: clarify operator-level vs logits processor usage and conflict guidance

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-25 14:17:09 +08:00
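A thinking-budget logits processor of the kind this commit describes can be sketched in a few lines: once the number of generated thinking tokens reaches the budget, every logit except the end-of-thinking token is masked out. The token id and function name below are illustrative assumptions, not FastDeploy's actual API.

```python
# Hypothetical sketch of a thinking-budget logits processor. Token id and
# function name are illustrative, not FastDeploy's real interface.
import math

END_THINK_TOKEN_ID = 2  # hypothetical id of the end-of-thinking token


def thinking_budget_processor(logits, num_thinking_tokens, budget):
    """Force END_THINK_TOKEN_ID once the thinking budget is spent."""
    if num_thinking_tokens < budget:
        return logits  # budget not exhausted: leave the distribution untouched
    forced = [-math.inf] * len(logits)
    forced[END_THINK_TOKEN_ID] = 0.0  # only the end-of-thinking token survives
    return forced


# With the budget spent, argmax must pick the end-of-thinking token.
out = thinking_budget_processor([1.5, 3.0, 0.2, 0.9], num_thinking_tokens=8, budget=8)
print(out.index(max(out)))  # -> 2
```

The commit's docs note clarifies when to prefer this processor over the operator-level limit; the masking idea is the same in both.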
Longzhi Wang 22566168c3 [Feature] support qkv&gate linear fusion (#6455)
* [Feature] support qkv&gate linear fusion

* add test
2026-02-24 15:20:29 +08:00
jackyYang6 38c3e02470 fix paddleformers fallback (#6465) 2026-02-23 15:29:13 +08:00
Yonghua Li e2332a1112 [BugFix] fix num_cpu_blocks computation (#6438)
* [BugFix] fix num_cpu_blocks computation

* [fix] fix syntax and log

* [fix] pre-commit

* [fix] use getattr

* [fix] ci test
2026-02-13 11:05:14 +08:00
kevin 52edf5e9b3 fix mtp acceptance rate decline (#6470) 2026-02-12 19:56:10 +08:00
AIbin 0eb87467f8 [BugFix] fix RL bug about blockwisefp8 (#6466)
* fix RL bug about blockwisefp8

* fix the same bug for moe

* fix RL FP8 bug
2026-02-12 09:15:29 +08:00
Divano ba3b142ff7 [Others] add objgraph to test out of memory (#6456) 2026-02-11 20:17:20 +08:00
JYChen 40c952e7b5 fix deepgemm import (#6451)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-02-11 20:10:01 +08:00
zhupengyang 4a8c54926b [XPU] topk_method=noaux_tc (#6355) 2026-02-11 16:12:20 +08:00
CSWYF3634076 7380bfb476 [BugFix] fix console log metrics waiting queue count (#6432)
* [BugFix] fix console log metrics waiting queue count

* [BugFix] fix console log metrics waiting queue count unittest
2026-02-11 10:51:49 +08:00
yzwu 60e75ea8e8 [Iluvatar][CI] Fix cannot import get_stop (#6165) 2026-02-10 16:57:23 +08:00
chen d937d6ebfd check (#6424) 2026-02-10 15:55:17 +08:00
Dangweichong 62ac1e543f [BugFix] Compatibility fix for download feature links (#6276)
* [BugFix] Compatibility fix for download feature links

* add download time log

* remove paddle tensor case
2026-02-10 14:21:08 +08:00
yuxuan 83b4b082ab [BugFix] Fix model loading error for 300B FP8 EP parallel test case (#6382)
* fix fp8 bug

* fix

* fix comment, cn to en

* fix ci

* del else in utils

* fix review
2026-02-10 11:32:57 +08:00
chen a8ffcaa068 fix fa4 test (#6408) 2026-02-10 10:57:21 +08:00
kevin 3ce842b55b [BugFix] add reset shared inputs when update weight dummy run (#6331)
* fix dummy run input bug

* update code

* update code

* update code

* update code
2026-02-10 10:29:03 +08:00
CSWYF3634076 335ab70b1c [Feature] console print metrics add env (#6413) 2026-02-10 09:37:11 +08:00
Jiang-Jia-Jun 4e06df520e [Feature] Unify the request-completion log format and enrich its statistics (#6405)
Merge the previously separate two log lines into one, while surfacing more statistics.

Main changes:
- Consolidate the original "Request finished" and "token ratio" lines into a single-line format
- Add InputToken: number of input tokens
- Add CachedDetail: cache details (CachedToken/GPU/CPU)
- Add OutputToken: number of output tokens
- Add TTFT: time to first token (seconds)
- Add E2E: end-to-end latency (seconds)
- Keep the IsPrefill and RecoveryStop flags

Example of the new log format:
Request=chatcmpl-xxx, InputToken=18, CachedDetail={"CachedToken": 0, "GPU": 0, "CPU": 0}, OutputToken=247, TokenRatio=315.77, TTFT=0.02, E2E=0.78, IsPrefill=False, RecoveryStop=False

Co-authored-by: Ducc <ducc@baidu.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 21:06:55 +08:00
bukejiyu 5bfc0938e2 [BugFix] PD reorder fix and add ut (#6375) 2026-02-09 04:42:48 -08:00
CSWYF3634076 ec128068b7 [Others] Exit to ensure no residual processes (cpu cache & dp) (#6377)
* [Others] good exit single dp

* [Others] good exit cpu cache dp>1

* [Others] good exit cpu cache dp>1 unittest
2026-02-09 20:38:38 +08:00
Mattheliu d75b1b8df1 [Fix] Use paddle.device.get_device_properties for multi-platform compatibility (#6400)
Replace paddle.device.cuda.get_device_properties() with paddle.device.get_device_properties()
to support all hardware platforms (NVIDIA, ILUVATAR, HPU, etc.)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 19:15:41 +08:00
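The compatibility pattern behind this fix is: prefer the platform-neutral accessor and fall back to the CUDA-specific one on builds that lack it. The sketch below stands a fake namespace in for paddle so it runs anywhere; per the commit message, the real calls are `paddle.device.get_device_properties()` and `paddle.device.cuda.get_device_properties()`.

```python
# Compatibility shim sketch: try the platform-neutral accessor first, fall back
# to the CUDA-specific path. SimpleNamespace stands in for the paddle module.
from types import SimpleNamespace


def get_device_properties(framework, device=None):
    """Use framework.device.get_device_properties if present, else the CUDA path."""
    generic = getattr(framework.device, "get_device_properties", None)
    if generic is not None:
        return generic(device)
    return framework.device.cuda.get_device_properties(device)


# Older build: only the CUDA-specific accessor exists.
old = SimpleNamespace(
    device=SimpleNamespace(cuda=SimpleNamespace(get_device_properties=lambda d: "cuda-props"))
)
# Newer build: the platform-neutral accessor exists (works on ILUVATAR, HPU, ...).
new = SimpleNamespace(
    device=SimpleNamespace(
        get_device_properties=lambda d: "generic-props",
        cuda=SimpleNamespace(get_device_properties=lambda d: "cuda-props"),
    )
)
print(get_device_properties(old), get_device_properties(new))  # -> cuda-props generic-props
```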
chenjian 35c24f3f71 Revert "[Optimize] Optimize ttft for ep (#6098)" (#6402)
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
kevin d60daca4a8 [Feature] consider multimodal model when dummy run (#6045)
* add mm do profile

* update code

* update code

* update code

* update code

* update test case

* update code

* update code

* fix xpu bug

* update code

* add mm do profile

* update test case

* update code
2026-02-09 17:49:55 +08:00
sunxin 783d56e28a [Optimization] Support logprob async copy (#6362)
* support logprob async copy

* fix prompt logprob

* fix xpu
2026-02-09 17:32:12 +08:00
MingkunZhang 268276e287 [Metax][CI] e2e ci tests enable cuda graph (#6401) 2026-02-09 16:25:23 +08:00
CSWYF3634076 eb8d639fe3 [Engine] apiserver&engine exit when work failed to start (#6322) 2026-02-09 15:07:40 +08:00
bukejiyu dc5917289d [loader] support wint2 backend (#6139)
* support wint2

* update
2026-02-08 22:42:36 -08:00
chen f18f3b99ed fix zmq hung when sampled_token_id=0 (#6398) 2026-02-09 14:13:18 +08:00
Mattheliu c776d483e4 [BugFix] fix handling of 4 return values from noaux_tc_redundant op (#6384)
* fix: handle 4 return values from noaux_tc_redundant op

The noaux_tc_redundant CUDA op is defined with 4 outputs in PD_BUILD_STATIC_OP:
- output_tensor (scores)
- topk_values
- topk_indices
- tokens_per_expert_stats_list_out (inplace updated)

The Python code was only unpacking 3 values, causing:
  ValueError: too many values to unpack (expected 3)

This fix correctly unpacks all 4 return values, ignoring the inplace
updated tensor which is the same as the input tokens_per_expert_stats_list.

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

* fix: make noaux_tc_redundant return 4 values to match OP definition

The PD_BUILD_STATIC_OP defines 4 outputs but the function only returned 3,
causing inconsistent behavior across different Paddle framework versions.

This fix explicitly returns 4 values:
- scores (inplace modified)
- topk_values
- topk_indices
- tokens_per_expert_stats_list (inplace modified via atomicAdd)

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

---------

Co-authored-by: Claude (Claude Opus 4.5) <noreply@anthropic.com>
2026-02-09 13:17:47 +08:00
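The unpacking bug described in this commit is easy to reproduce: an op defined with four outputs must be bound to four names on the Python side, or the unpack raises ValueError. The stub below mimics the `noaux_tc_redundant` output signature from the commit message; it is not the real CUDA op.

```python
# Stub mimicking noaux_tc_redundant's four outputs, per the commit message.
def noaux_tc_redundant_stub():
    scores = [0.9, 0.1]
    topk_values = [0.9]
    topk_indices = [0]
    tokens_per_expert_stats_list = [1, 0]  # updated in place by the real op
    return scores, topk_values, topk_indices, tokens_per_expert_stats_list


# Buggy: three targets for four values raises
# ValueError: too many values to unpack (expected 3)
try:
    scores, topk_values, topk_indices = noaux_tc_redundant_stub()
except ValueError as err:
    print(err)

# Fixed: unpack all four, discarding the in-place-updated stats tensor, which
# is the same object as the input tokens_per_expert_stats_list.
scores, topk_values, topk_indices, _ = noaux_tc_redundant_stub()
print(topk_indices)  # -> [0]
```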
JYChen 9bcd863902 [Others] support import deepgemm/deepep from fleet ops (#6351)
* update paddleformers to v1.0

* only change import fleetpath
2026-02-09 11:53:13 +08:00
周周周 2b4748de4f [MTP] refactor MTP pre_process (#6358) 2026-02-09 10:47:15 +08:00
Jiang-Jia-Jun 18e79dd660 [Metrics] Support cpu-cache-block-num (#6390)
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
Yonghua Li 5ac5ecd0b0 [BugFix] fix cache transfer tasks failure after cache cleared (#6202)
* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task

* [fix] fix cache manager hang when clearing prefix cache

* [fix] fix list_proxy has no clear method

* [fix] fix barrier

* [fix] add barrier0

* [fix] add cache_task_is_paused_signal

* [fix] fix condition

* [fix] fix cache transfer sync and delay prefix cache tree clearing

* [fix] fix typo

* [chore] polish code

* [fix] revert only rank0 write kv_cache_status_signal

* [fix] fix thread pool and prefix cache manager hang

* [fix] add timeout for task_swapping_event

* [fix] tolerate prefix cache manager error while prefix tree is cleared

* [chore] add more log

* [fix] fix test_prefix_cache_manager

* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00