Commit Graph

4590 Commits

Author SHA1 Message Date
0Ayachi0 8bb83b2239 [CI] [Hackathon 10th Spring No.25] Add unit tests for module fastdeploy/inter_communicator/zmq_server.py (#6210)
* [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py

* [CI] [Hackathon 10th Spring No.25] Add unit tests for module fastdeploy/inter_communicator/zmq_server.py

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-09 14:00:48 +08:00
Mattheliu c776d483e4 [BugFix] fix handling of 4 return values from noaux_tc_redundant op (#6384)
* fix: handle 4 return values from noaux_tc_redundant op

The noaux_tc_redundant CUDA op is defined with 4 outputs in PD_BUILD_STATIC_OP:
- output_tensor (scores)
- topk_values
- topk_indices
- tokens_per_expert_stats_list_out (inplace updated)

The Python code was only unpacking 3 values, causing:
  ValueError: too many values to unpack (expected 3)

This fix correctly unpacks all 4 return values, ignoring the inplace
updated tensor which is the same as the input tokens_per_expert_stats_list.
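The fix described above (unpack all four outputs and discard the inplace-updated stats tensor) can be sketched with a plain-Python stub. The stub name, shapes, and top-k of 2 below are illustrative assumptions, not the actual FastDeploy call site:

```python
# Hypothetical stand-in for the noaux_tc_redundant CUDA op, which is
# registered via PD_BUILD_STATIC_OP with 4 outputs. Names and shapes
# here are assumptions for illustration only.
def noaux_tc_redundant_stub(scores, tokens_per_expert_stats_list):
    # The real op returns (scores, topk_values, topk_indices, stats),
    # where stats is the inplace-updated input tensor.
    topk_values = sorted(scores, reverse=True)[:2]
    topk_indices = sorted(range(len(scores)), key=lambda i: -scores[i])[:2]
    return scores, topk_values, topk_indices, tokens_per_expert_stats_list

scores = [0.1, 0.7, 0.2]
stats = [0, 0, 0]

# Before the fix: 3-way unpacking of a 4-tuple raises
#   ValueError: too many values to unpack (expected 3)
# After the fix: unpack all 4 values and discard the inplace-updated
# stats tensor, which aliases the input.
out_scores, topk_values, topk_indices, _ = noaux_tc_redundant_stub(scores, stats)
```

Discarding the fourth value with `_` keeps the Python signature stable while matching the op's registered output count.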

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

* fix: make noaux_tc_redundant return 4 values to match OP definition

The PD_BUILD_STATIC_OP defines 4 outputs but the function only returned 3,
causing inconsistent behavior across different Paddle framework versions.

This fix explicitly returns 4 values:
- scores (inplace modified)
- topk_values
- topk_indices
- tokens_per_expert_stats_list (inplace modified via atomicAdd)

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

---------

Co-authored-by: Claude (Claude Opus 4.5) <noreply@anthropic.com>
2026-02-09 13:17:47 +08:00
JYChen 9bcd863902 [Others] support import deepgemm/deepep from fleet ops (#6351)
* update paddleformers to v1.0

* only change import fleetpath
2026-02-09 11:53:13 +08:00
xjkmfa 74762b0fb2 [ci case]Prompt logprobs precision (#6381)
* Add ci case for min token and max token

* [CI case] include total_tokens in the last packet of the completion interface's streaming output

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
周周周 2b4748de4f [MTP] refactor MTP pre_process (#6358) 2026-02-09 10:47:15 +08:00
Jiang-Jia-Jun 18e79dd660 [Metrics] Support cpu-cache-block-num (#6390)
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
MingkunZhang 15e01c6f61 [Metax][CI] add paddleocr ci test (#6379) 2026-02-09 10:11:28 +08:00
Yonghua Li 5ac5ecd0b0 [BugFix] fix cache transfer tasks failure after cache cleared (#6202)
* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task

* [fix] fix cache manager hang when clearing prefix cache

* [fix] fix list_proxy has no clear method

* [fix] fix barrier

* [fix] add barrier0

* [fix] add cache_task_is_paused_signal

* [fix] fix condition

* [fix] fix cache transfer sync and delay prefix cache tree clearing

* [fix] fix typo

* [chore] polish code

* [fix] revert only rank0 write kv_cache_status_signal

* [fix] fix thread pool and prefix cache manager hang

* [fix] add timeout for task_swapping_event

* [fix] tolerate prefix cache manager error while prefix tree is cleared

* [chore] add more log

* [fix] fix test_prefix_cache_manager

* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
jc d6b3c722c1 [KVCache] Storage cache supports c8 model (#6298)
* Refine cache transfer manager
* Storage cache supports c8 model
2026-02-06 12:01:17 +08:00
chen 72fe94cb13 [Feature] support glm tp+dp+ep (#6317) 2026-02-05 21:47:01 +08:00
CSWYF3634076 1c0a2b055f [Feature] console print statistical metrics (#6339)
* [Feature] console print statistical data

* [Feature] console print statistical data v2 dp_rank

* [Feature] console print statistical data v2 unittest

* [Feature] console print statistical data v3 unittest
2026-02-05 19:20:36 +08:00
MingkunZhang de02a909c8 [Metax][CI] restore 21b/28b ci test file (#6368)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-05 18:38:59 +08:00
YuBaoku 5c9bc13a59 [CI] Fix check-bypass.yml 2026-02-05 18:06:39 +08:00
MingkunZhang 6e28b5ef4f [Metax][CI] update metax ci files (#6364) 2026-02-05 17:16:31 +08:00
周周周 e3fb8796b4 Remove useless MTP rebuild_padding code (#6336) 2026-02-05 16:28:44 +08:00
YuBaoku 2d3fb81d29 [CI] Update check-bypass.yml (#6360) 2026-02-05 15:52:30 +08:00
K11OntheBoat 116e2aea7a Support Norm before Rope (#6332)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2026-02-05 15:28:52 +08:00
chen 29a313a402 [Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100

* flash attn backend support mask

* flash attn backend run flashmask correct

* add test for flash_attn_backend and flash_attn_func

* check

* add test for fa4

* requirements.txt add fa4 whl

* check test on sm100

* fix CI conflict

* add enable_torch_proxy for flash_mask

* lazy import fa4

* check

* fix tests import

* check test_load_mpt import
2026-02-05 14:39:00 +08:00
lizan1999 72edd394d9 [XPU] support noaux_tc (#6326) 2026-02-05 12:04:16 +08:00
YuBaoku cae2709eff [CI] Update stable test workflow and run.sh script (#6352) 2026-02-05 11:01:15 +08:00
GoldPancake 183b8d325a [RL] Support GLM MTP RL Model (#6267) 2026-02-04 20:14:35 +08:00
luukunn 765df94e6c [Optimization] update prompt & prompt_token_ids (#6334)
* fix prompt

* add unit test

* add unit test

* fix
2026-02-04 20:08:01 +08:00
JYChen bf78a48eb3 [Others] add mock unittest for sm100 FP8 inference (#6273)
* add unittest

* use new file

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
sunxin ef47e6eb46 [Others] skip to_tensor (#6342) 2026-02-04 17:25:19 +08:00
Zhang Yulong 26ba019e66 Update README.md (#6343) 2026-02-04 15:57:18 +08:00
MingkunZhang 43e3886ef9 [Metax][CI] fix run_ci_metax.sh error (#6341) 2026-02-04 15:43:48 +08:00
MingkunZhang e109fb9a0e [Metax][Fix] fix issues based #6259 (#6338) 2026-02-03 23:21:35 -08:00
chenjian 90db0bdd0d [Optimize] Optimize ttft for ep (#6098)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix
2026-02-04 15:03:29 +08:00
mouxin 6e96bd0bd2 [Feature] Fix counter release logic & update go-router download URL (#6280)
* [Doc] Update prerequisites in the documentation

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Fix counter release logic

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update token counter logic and docs

* [Feature] Update token counter logic and docs

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-04 15:02:38 +08:00
fxyfxy777 36547cfdb3 [Feature] FD_USE_PHI_FP8_QUANT (#6320)
* add ut

* add use_fd_quant env

* rm mask_per_token_quant

* add make ops list

* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT (defaults to true)

* modify comments

* use bool type

* Add function declaration
2026-02-03 22:33:03 -08:00
MingkunZhang 2ffcb3d9ed [Metax][CI] update ci test files (#6340) 2026-02-04 13:58:07 +08:00
sunxin 9b0a82cfa9 [Model Runner] Support overlap schedule (#6259) 2026-02-04 10:49:44 +08:00
周周周 6225439778 add PADDLE_ENFORCE (#6321) 2026-02-04 10:47:19 +08:00
xunyoyo 8225e694c9 [CI] [Hackathon 10th Spring No.37] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_wint2_backend.py (#6286)
* Add wint2 MoE backend tests

* Align wint2 test dtypes for cutlass apply

* Use bfloat16 input in wint2 test

* Stub moe_expert_reduce in wint2 test

* Use 2 experts in wint2 test

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-04 10:46:26 +08:00
Zhang Yulong 16d03c3127 update (#6335) 2026-02-03 21:59:32 +08:00
Jiang-Jia-Jun 793dac0f9d Modify Nightly Build installation commands for fastdeploy
Update the installation instructions for the Nightly Build of fastdeploy to use the cu126 index for both SM86/89 and SM80/90 architectures.
2026-02-03 20:24:27 +08:00
Jiang-Jia-Jun 829139a5e5 Fix Nightly build installation URLs for fastdeploy-gpu
Updated installation instructions for the latest Nightly build of fastdeploy-gpu to use the correct URLs for CUDA 12.6.
2026-02-03 20:24:19 +08:00
RAM 5b22e5dfe7 [RL] R3 Support Fused Put the Routing of All Layers (#6099)
* fused put routing

* fix bug

* [draft commit]dynamic dtype

* fix async put & numpy bug

* fix uint8 test case
2026-02-03 04:13:16 -08:00
CSWYF3634076 722ca87db6 [Others] lazy write log when writing (#6323) 2026-02-03 20:11:13 +08:00
xiegegege 51c6fa8afc [CE]add 21b cpu cache ,glm mtp,glm for rl config (#6328) 2026-02-03 20:10:47 +08:00
ddchenhao66 faade7d0ab [BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled (#6309) 2026-02-03 19:49:01 +08:00
JYChen c745a22420 [Feature] Support Ernie FP8 on sm100 (the fixed version) (#6304) 2026-02-03 17:47:38 +08:00
kesmeey 73952a3b67 add tests (#6243)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-03 17:02:36 +08:00
bukejiyu 12d4b4cb87 [Feature] Support reorder ids to split prefill and decodes (#5779)
* support reorder ids

* perfect code

* fix

* fix unittest

* delete code

* fix

* add python api

* delete custom op

* update algorithm

* fix swap

* support condense

* support condense

* support mtp

* delete code

* update

* update

* update

* update

* update for other platforms

* update

* fix

* fix mtp

* fix ut

* update

* fix ut

* update ut

* fix

* fix encoder_cache

* fix ci

* fix

* fix vl

* Fix performance regression

* fix

* fix

* fix mtp

* fix index->req_id mapping

* fix ut

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-03 00:28:02 -08:00
周周周 cbdb2462ea cp 1131 tbo to develop (#6281) 2026-02-03 15:23:23 +08:00
周周周 8277b95fa6 remove speculate_get_padding_offset op (#6308) 2026-02-03 15:18:12 +08:00
Moonchild1227 39dc4b0c2e [Feature] [KVCache] support file_store kv cache backend (#6188)
* fix(examples): comment out stop.sh to avoid error when script is missing

* feat: add file_store support for cache manager

* [fix] fix multi gpu transfer

* [fix] fix global kvcache transfer

* [Feature] [KVCache] support file_store kv cache backend

* chore: update FileStore according to PR comments

* fix: remove comments

* fix: add swap_cache_layout for file store

* fix: remove rank key

* fix: Switch KV cache storage to pure file mode

* Temporarily disable support for Tensor types

* fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR

* fix: Simplify cache_transfer_manager.py

* fix: fix syntax bug

* fix: Simplify file_store.py

* fix: Use the key directly as the filename

* fix: Simplify set()

* fix: Simplify cache_transfer_manager.py & file_store.py

* fix: Only support load to cpu buffer

* feat: add FileStore backend for cache transfer

* fix: guard zmq import
2026-02-03 14:37:58 +08:00
zccjjj ee77ff9ebe [config] fix assert message (#6310) 2026-02-03 14:37:46 +08:00
Jingfeng Wu 4760835789 Fix heartbeat signal's sleeptime error (#6241) 2026-02-03 14:28:51 +08:00
xjkmfa e27a7cc5b0 [Benchmark] CE qwen3-vl (#6288)
* [CE] qwen3-vl
2026-02-03 14:17:28 +08:00