0Ayachi0
8bb83b2239
[CI] 【Hackathon 10th Spring No.25】Add unit tests for functional module fastdeploy/inter_communicator/zmq_server.py ( #6210 )
...
* [CI] 【Hackathon 10th Spring No.24】Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py
* [CI] 【Hackathon 10th Spring No.25】Add unit tests for functional module fastdeploy/inter_communicator/zmq_server.py
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-09 14:00:48 +08:00
Mattheliu
c776d483e4
[BugFix] handle 4 return values from noaux_tc_redundant op ( #6384 )
...
* fix: handle 4 return values from noaux_tc_redundant op
The noaux_tc_redundant CUDA op is defined with 4 outputs in PD_BUILD_STATIC_OP:
- output_tensor (scores)
- topk_values
- topk_indices
- tokens_per_expert_stats_list_out (inplace updated)
The Python code was only unpacking 3 values, causing:
ValueError: too many values to unpack (expected 3)
This fix correctly unpacks all 4 return values, ignoring the inplace
updated tensor which is the same as the input tokens_per_expert_stats_list.
Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>
* fix: make noaux_tc_redundant return 4 values to match OP definition
The PD_BUILD_STATIC_OP defines 4 outputs but the function only returned 3,
causing inconsistent behavior across different Paddle framework versions.
This fix explicitly returns 4 values:
- scores (inplace modified)
- topk_values
- topk_indices
- tokens_per_expert_stats_list (inplace modified via atomicAdd)
Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>
---------
Co-authored-by: Claude (Claude Opus 4.5) <noreply@anthropic.com>
2026-02-09 13:17:47 +08:00
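The unpacking failure described in #6384 can be reproduced in plain Python. The sketch below is illustrative only: `noaux_tc_redundant_stub` is a hypothetical stand-in written for this example, not the real CUDA op, and its top-k logic is a toy. It only shows why unpacking three names from a four-value return raises `ValueError`, and how discarding the fourth (in-place updated) value avoids it.

```python
def noaux_tc_redundant_stub(scores, tokens_per_expert_stats_list):
    """Hypothetical stand-in for the op: returns four outputs, like the OP definition."""
    # Toy top-2 selection; the real op's routing logic is not reproduced here.
    topk_values = sorted(scores, reverse=True)[:2]
    topk_indices = [scores.index(v) for v in topk_values]
    # The fourth output is the same object as the stats input (updated in place).
    return scores, topk_values, topk_indices, tokens_per_expert_stats_list


scores = [0.1, 0.7, 0.2]
stats = [0, 0, 0]

# Old call site: only three names on the left-hand side.
try:
    s, v, i = noaux_tc_redundant_stub(scores, stats)
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

# Fixed call site: unpack all four, ignoring the in-place updated stats tensor.
out_scores, topk_values, topk_indices, _ = noaux_tc_redundant_stub(scores, stats)
```

Using `_` for the fourth value signals that the caller already holds a reference to the stats list and does not need the returned alias.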
JYChen
9bcd863902
[Others] support import deepgemm/deepep from fleet ops ( #6351 )
...
* update paddleformers to v1.0
* only change import fleetpath
2026-02-09 11:53:13 +08:00
xjkmfa
74762b0fb2
[ci case]Prompt logprobs precision ( #6381 )
...
* Add ci case for min token and max token
* 【CI case】include total_tokens in the last packet of completion interface stream output
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
---------
Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
周周周
2b4748de4f
[MTP] refactor MTP pre_process ( #6358 )
2026-02-09 10:47:15 +08:00
Jiang-Jia-Jun
18e79dd660
[Metrics] Support cpu-cache-block-num ( #6390 )
...
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
MingkunZhang
15e01c6f61
[Metax][CI] add paddleocr ci test ( #6379 )
2026-02-09 10:11:28 +08:00
Yonghua Li
5ac5ecd0b0
[BugFix] fix cache transfer tasks failure after cache cleared ( #6202 )
...
* [fix] fix cache transfer tasks failure after cache cleared
* [fix] fix submit_task
* [fix] fix cache manager hang when clearing prefix cache
* [fix] fix list_proxy has no clear method
* [fix] fix barrier
* [fix] add barrier0
* [fix] add cache_task_is_paused_signal
* [fix] fix condition
* [fix] fix cache transfer sync and delay prefix cache tree clearing
* [fix] fix typo
* [chore] polish code
* [fix] revert only rank0 write kv_cache_status_signal
* [fix] fix thread pool and prefix cache manager hang
* [fix] add timeout for task_swapping_event
* [fix] tolerate prefix cache manager error while prefix tree is cleared
* [chore] add more log
* [fix] fix test_prefix_cache_manager
* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
jc
d6b3c722c1
[KVCache] Storage cache supports c8 model ( #6298 )
...
* Refine cache transfer manager
* Storage cache supports c8 model
2026-02-06 12:01:17 +08:00
chen
72fe94cb13
[Feature] support glm tp+dp+ep ( #6317 )
2026-02-05 21:47:01 +08:00
CSWYF3634076
1c0a2b055f
[Feature] console print statistical metrics ( #6339 )
...
* [Feature] console print statistical data
* [Feature] console print statistical data v2 dp_rank
* [Feature] console print statistical data v2 unittest
* [Feature] console print statistical data v3 unittest
2026-02-05 19:20:36 +08:00
MingkunZhang
de02a909c8
[Metax][CI] restore 21b/28b ci test file ( #6368 )
...
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-05 18:38:59 +08:00
YuBaoku
5c9bc13a59
[CI] Fix check-bypass.yml
2026-02-05 18:06:39 +08:00
MingkunZhang
6e28b5ef4f
[Metax][CI] update metax ci files ( #6364 )
2026-02-05 17:16:31 +08:00
周周周
e3fb8796b4
Remove useless MTP rebuild_padding code ( #6336 )
2026-02-05 16:28:44 +08:00
YuBaoku
2d3fb81d29
[CI] Update check-bypass.yml ( #6360 )
2026-02-05 15:52:30 +08:00
K11OntheBoat
116e2aea7a
Support Norm before Rope ( #6332 )
...
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2026-02-05 15:28:52 +08:00
chen
29a313a402
[Optimization] Support FA2/FA3/FA4 with attn_mask_q ( #6354 )
...
* support FA4 sm100
* flash attn backend support mask
* flash attn backend run flashmask correct
* add test for flash_attn_backend and flash_attn_func
* check
* add test for fa4
* requirements.txt add fa4 whl
* check test on sm100
* fix CI conflict
* add enable_torch_proxy for flash_mask
* lazy import fa4
* check
* fix tests import
* check test_load_mpt import
2026-02-05 14:39:00 +08:00
lizan1999
72edd394d9
[XPU] support noaux_tc ( #6326 )
2026-02-05 12:04:16 +08:00
YuBaoku
cae2709eff
[CI] Update stable test workflow and run.sh script ( #6352 )
2026-02-05 11:01:15 +08:00
GoldPancake
183b8d325a
[RL] Support GLM MTP RL Model ( #6267 )
2026-02-04 20:14:35 +08:00
luukunn
765df94e6c
[Optimization]update prompt & prompt_token_ids ( #6334 )
...
* fix prompt
* add unit test
* add unit test
* fix
2026-02-04 20:08:01 +08:00
JYChen
bf78a48eb3
[Others] add mock unittest for sm100 FP8 inference ( #6273 )
...
* add unittest
* use new file
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
sunxin
ef47e6eb46
[Others]skip to_tensor ( #6342 )
2026-02-04 17:25:19 +08:00
Zhang Yulong
26ba019e66
Update README.md ( #6343 )
2026-02-04 15:57:18 +08:00
MingkunZhang
43e3886ef9
[Metax][CI] fix run_ci_metax.sh error ( #6341 )
2026-02-04 15:43:48 +08:00
MingkunZhang
e109fb9a0e
[Metax][Fix] fix issues based on #6259 ( #6338 )
2026-02-03 23:21:35 -08:00
chenjian
90db0bdd0d
[Optimize] Optimize ttft for ep ( #6098 )
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
2026-02-04 15:03:29 +08:00
mouxin
6e96bd0bd2
[Feature] Fix counter release logic & update go-router download URL ( #6280 )
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Fix counter release logic
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update token counter logic and docs
* [Feature] Update token counter logic and docs
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-04 15:02:38 +08:00
fxyfxy777
36547cfdb3
[Feature] FD_USE_PHI_FP8_QUANT ( #6320 )
...
* add ut
* add use_fd_quant env
* rm mask_per_token_quant
* add make ops list
* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, default is true
* modify comments
* use bool type
* Add function declaration
2026-02-03 22:33:03 -08:00
MingkunZhang
2ffcb3d9ed
[Metax][CI] update ci test files ( #6340 )
2026-02-04 13:58:07 +08:00
sunxin
9b0a82cfa9
[Model Runner] Support overlap schedule ( #6259 )
2026-02-04 10:49:44 +08:00
周周周
6225439778
add PADDLE_ENFORCE ( #6321 )
2026-02-04 10:47:19 +08:00
xunyoyo
8225e694c9
[CI]【Hackathon 10th Spring No.37】Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_wint2_backend.py ( #6286 )
...
* Add wint2 MoE backend tests
* Align wint2 test dtypes for cutlass apply
* Use bfloat16 input in wint2 test
* Stub moe_expert_reduce in wint2 test
* Use 2 experts in wint2 test
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-04 10:46:26 +08:00
Zhang Yulong
16d03c3127
update ( #6335 )
2026-02-03 21:59:32 +08:00
Jiang-Jia-Jun
793dac0f9d
Modify Nightly Build installation commands for fastdeploy
...
Update the installation instructions for the Nightly Build of fastdeploy to use the cu126 index for both SM86/89 and SM80/90 architectures.
2026-02-03 20:24:27 +08:00
Jiang-Jia-Jun
829139a5e5
Fix Nightly build installation URLs for fastdeploy-gpu
...
Updated installation instructions for the latest Nightly build of fastdeploy-gpu to use the correct URLs for CUDA 12.6.
2026-02-03 20:24:19 +08:00
RAM
5b22e5dfe7
[RL] R3 Support Fused Put the Routing of All Layers ( #6099 )
...
* fused put routing
* fix bug
* [draft commit]dynamic dtype
* fix async put & numpy bug
* fix uint8 test case
2026-02-03 04:13:16 -08:00
CSWYF3634076
722ca87db6
[Others] lazy write log when writing ( #6323 )
2026-02-03 20:11:13 +08:00
xiegegege
51c6fa8afc
[CE] add 21b cpu cache, glm mtp, glm for rl config ( #6328 )
2026-02-03 20:10:47 +08:00
ddchenhao66
faade7d0ab
[BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled ( #6309 )
2026-02-03 19:49:01 +08:00
JYChen
c745a22420
[Feature] Support Ernie FP8 on sm100 (the fixed version) ( #6304 )
2026-02-03 17:47:38 +08:00
kesmeey
73952a3b67
add tests ( #6243 )
...
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-03 17:02:36 +08:00
bukejiyu
12d4b4cb87
[Feature]Support reorder ids to split prefill and decodes ( #5779 )
...
* support reorder ids
* perfect code
* fix
* fix unittest
* delete code
* fix
* add python api
* delete custom op
* update algorithm
* fix swap
* support condense
* support condense
* support mtp
* delete code
* update
* update
* update
* update
* update for other platforms
* update
* fix
* fix mtp
* fix ut
* update
* fix ut
* update ut
* fix
* fix encoder_cache
* fix ci
* fix
* fix vl
* Fix performance regression
* fix
* fix
* fix mtp
* fix index->req_id mapping
* fix ut
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-03 00:28:02 -08:00
周周周
cbdb2462ea
cp 1131 tbo to develop ( #6281 )
2026-02-03 15:23:23 +08:00
周周周
8277b95fa6
remove speculate_get_padding_offset op ( #6308 )
2026-02-03 15:18:12 +08:00
Moonchild1227
39dc4b0c2e
[Feature] [KVCache] support file_store kv cache backend ( #6188 )
...
* fix(examples): comment out stop.sh to avoid error when script is missing
* feat: add file_store support for cache manager
* [fix] fix multi gpu transfer
* [fix] fix global kvcache transfer
* [Feature] [KVCache] support file_store kv cache backend
* chore: update FileStore according to PR comments
* fix: remove comments
* fix: add swap_cache_layout for file store
* fix: remove rank key
* fix: Switch KV cache storage to pure file mode
* Temporarily disable support for Tensor types
* fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR
* fix: Simplify cache_transfer_manager.py
* fix: fix syntax bug
* fix: Simplify file_store.py
* fix: Use the key directly as the filename
* fix: Simplify set()
* fix: Simplify cache_transfer_manager.py & file_store.py
* fix: Only support load to cpu buffer
* feat: add FileStore backend for cache transfer
* fix: guard zmq import
2026-02-03 14:37:58 +08:00
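The file_store entries in #6188 describe a pure file mode in which the key is used directly as the filename and the storage directory comes from the FILE_BACKEND_STORAGE_DIR environment variable. A minimal sketch of that idea follows. It is not the FastDeploy implementation: the class name `FileStoreSketch`, its method names, the temporary-directory fallback, and the write-then-rename step are all assumptions made for illustration.

```python
import os
import tempfile


class FileStoreSketch:
    """Hypothetical file-backed key/value store; one file per key."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # The key is used directly as the filename (keys are assumed to be safe names).
        return os.path.join(self.root, key)

    def set(self, key, data):
        # Write to a temporary file, then rename, so readers never see a partial file.
        # This atomic-rename step is an addition of this sketch.
        tmp_path = self._path(key) + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(data)
        os.replace(tmp_path, self._path(key))

    def get(self, key):
        # Return the stored bytes, or None when the key has never been written.
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def exists(self, key):
        return os.path.isfile(self._path(key))


# FILE_BACKEND_STORAGE_DIR is the variable named in the commit; falling back to a
# fresh temporary directory is an assumption so the sketch runs anywhere.
storage_dir = os.environ.get("FILE_BACKEND_STORAGE_DIR") or tempfile.mkdtemp()
store = FileStoreSketch(storage_dir)
store.set("block_0001", b"\x00\x01\x02")
loaded = store.get("block_0001")
missing = store.get("block_9999_never_written")
```

Reading back into host memory as raw bytes mirrors the "Only support load to cpu buffer" entry in spirit; how the real backend lays out or swaps cache blocks is not shown here.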
zccjjj
ee77ff9ebe
[config] fix assert message ( #6310 )
2026-02-03 14:37:46 +08:00
Jingfeng Wu
4760835789
Fix heartbeat signal's sleeptime error ( #6241 )
2026-02-03 14:28:51 +08:00
xjkmfa
e27a7cc5b0
[Benchmark] Ce qwen3 vl ( #6288 )
...
* [CE]qwen3-vl
2026-02-03 14:17:28 +08:00