kesmeey
e4e3a71e7b
[CI] [Hackathon 10th Spring No.22] Add unit tests for fastdeploy/cache_manager/cache_transfer_manager.py ( #6157 )
...
* Add comprehensive test coverage for cache_transfer_manager.py
* Fix code style: add newline at end of file
* fix: update cache transfer manager tests for branch 22 interface changes
* fix: resolve test errors for cache transfer manager
* fix: update cache transfer manager tests for branch 22 interface changes
* style: apply pre-commit formatting to tests/cache_manager/test_cache_transfer_manager.py
* Run codestyle: format tests/cache_manager/test_cache_transfer_manager.py and related fixes
* Update test_cache_transfer_manager.py
* Format cache transfer manager tests
* Update cache transfer manager tests
* Update unit test coverage workflow
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-11 11:23:57 +08:00
CSWYF3634076
7380bfb476
[BugFix] fix console log metrics waiting queue count ( #6432 )
...
* [BugFix] fix console log metrics waiting queue count
* [BugFix] fix console log metrics waiting queue count unittest
2026-02-11 10:51:49 +08:00
AIbin
983be007f5
[Feature] support SWA & sink based on appendattn ( #6410 )
...
* support SWA & sink based on appendattn
2026-02-10 18:28:03 +08:00
chen
a8ffcaa068
fix fa4 test ( #6408 )
2026-02-10 10:57:21 +08:00
CSWYF3634076
335ab70b1c
[Feature] console print metrics add env ( #6413 )
2026-02-10 09:37:11 +08:00
YuBaoku
b84056fdaa
[CI] Fix stable_test and add cherry-pick automation ( #6415 )
2026-02-09 23:10:12 +08:00
bukejiyu
5bfc0938e2
[BugFix] PD reorder fix and add ut ( #6375 )
2026-02-09 04:42:48 -08:00
CSWYF3634076
ec128068b7
[Others] Exit to ensure no residual processes (cpu cache & dp) ( #6377 )
...
* [Others] graceful exit for single dp
* [Others] graceful exit for cpu cache, dp>1
* [Others] graceful exit for cpu cache, dp>1 unittest
2026-02-09 20:38:38 +08:00
chenjian
35c24f3f71
Revert "[Optimize] Optimize ttft for ep ( #6098 )" ( #6402 )
...
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
kevin
d60daca4a8
[Feature] consider multimodal model when dummy run ( #6045 )
...
* add mm do profile
* update code
* update code
* update code
* update code
* update test case
* update code
* update code
* fix xpu bug
* update code
* add mm do profile
* update test case
* update code
2026-02-09 17:49:55 +08:00
MingkunZhang
268276e287
[Metax][CI] e2e ci tests enable cuda graph ( #6401 )
2026-02-09 16:25:23 +08:00
bukejiyu
dc5917289d
[loader] support wint2 backend ( #6139 )
...
* support wint2
* update
2026-02-08 22:42:36 -08:00
0Ayachi0
8bb83b2239
[CI] [Hackathon 10th Spring No.25] Add unit tests for fastdeploy/inter_communicator/zmq_server.py ( #6210 )
...
* [CI] [Hackathon 10th Spring No.24] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py
* [CI] [Hackathon 10th Spring No.25] Add unit tests for fastdeploy/inter_communicator/zmq_server.py
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-09 14:00:48 +08:00
xjkmfa
74762b0fb2
[ci case]Prompt logprobs precision ( #6381 )
...
* Add ci case for min token and max token
* [CI case] include total_tokens in the last packet of completion interface stream output
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
* [ci] prompt_logprobs precision case
---------
Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
周周周
2b4748de4f
[MTP] refactor MTP pre_process ( #6358 )
2026-02-09 10:47:15 +08:00
MingkunZhang
15e01c6f61
[Metax][CI] add paddleocr ci test ( #6379 )
2026-02-09 10:11:28 +08:00
Yonghua Li
5ac5ecd0b0
[BugFix] fix cache transfer tasks failure after cache cleared ( #6202 )
...
* [fix] fix cache transfer tasks failure after cache cleared
* [fix] fix submit_task
* [fix] fix cache manager hang when clearing prefix cache
* [fix] fix list_proxy has no clear method
* [fix] fix barrier
* [fix] add barrier0
* [fix] add cache_task_is_paused_signal
* [fix] fix condition
* [fix] fix cache transfer sync and delay prefix cache tree clearing
* [fix] fix typo
* [chore] polish code
* [fix] revert only rank0 write kv_cache_status_signal
* [fix] fix thread pool and prefix cache manager hang
* [fix] add timeout for task_swapping_event
* [fix] tolerate prefix cache manager error while prefix tree is cleared
* [chore] add more log
* [fix] fix test_prefix_cache_manager
* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
chen
72fe94cb13
[Feature] support glm tp+dp+ep ( #6317 )
2026-02-05 21:47:01 +08:00
CSWYF3634076
1c0a2b055f
[Feature] console print statistical metrics ( #6339 )
...
* [Feature] console print statistical data
* [Feature] console print statistical data v2 dp_rank
* [Feature] console print statistical data v2 unittest
* [Feature] console print statistical data v3 unittest
2026-02-05 19:20:36 +08:00
MingkunZhang
de02a909c8
[Metax][CI] restore 21b/28b ci test file ( #6368 )
...
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-05 18:38:59 +08:00
MingkunZhang
6e28b5ef4f
[Metax][CI] update metax ci files ( #6364 )
2026-02-05 17:16:31 +08:00
chen
29a313a402
[Optimization] Support FA2/FA3/FA4 with attn_mask_q ( #6354 )
...
* support FA4 sm100
* flash attn backend support mask
* flash attn backend run flashmask correct
* add test for flash_attn_backend and flash_attn_func
* check
* add test for fa4
* requirements.txt add fa4 whl
* check test on sm100
* fix CI conflict
* add enable_torch_proxy for flash_mask
* lazy import fa4
* check
* fix tests import
* check test_load_mpt import
2026-02-05 14:39:00 +08:00
YuBaoku
cae2709eff
[CI] Update stable test workflow and run.sh script ( #6352 )
2026-02-05 11:01:15 +08:00
GoldPancake
183b8d325a
[RL] Support GLM MTP RL Model ( #6267 )
2026-02-04 20:14:35 +08:00
luukunn
765df94e6c
[Optimization] update prompt & prompt_token_ids ( #6334 )
...
* fix prompt
* add unit test
* add unit test
* fix
2026-02-04 20:08:01 +08:00
JYChen
bf78a48eb3
[Others] add mock unittest for sm100 FP8 inference ( #6273 )
...
* add unittest
* use new file
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
chenjian
90db0bdd0d
[Optimize] Optimize ttft for ep ( #6098 )
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
2026-02-04 15:03:29 +08:00
fxyfxy777
36547cfdb3
[Feature] FD_USE_PHI_FP8_QUANT ( #6320 )
...
* add ut
* add use_fd_quant env
* rm mask_per_token_quant
* add make ops list
* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT (defaults to true)
* modify comments
* use bool type
* Add function declaration
2026-02-03 22:33:03 -08:00
MingkunZhang
2ffcb3d9ed
[Metax][CI] update ci test files ( #6340 )
2026-02-04 13:58:07 +08:00
周周周
6225439778
add PADDLE_ENFORCE ( #6321 )
2026-02-04 10:47:19 +08:00
xunyoyo
8225e694c9
[CI] [Hackathon 10th Spring No.37] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_wint2_backend.py ( #6286 )
...
* Add wint2 MoE backend tests
* Align wint2 test dtypes for cutlass apply
* Use bfloat16 input in wint2 test
* Stub moe_expert_reduce in wint2 test
* Use 2 experts in wint2 test
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-04 10:46:26 +08:00
RAM
5b22e5dfe7
[RL] R3 Support Fused Put of the Routing of All Layers ( #6099 )
...
* fused put routing
* fix bug
* [draft commit]dynamic dtype
* fix async put & numpy bug
* fix uint8 test case
2026-02-03 04:13:16 -08:00
ddchenhao66
faade7d0ab
[BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled ( #6309 )
2026-02-03 19:49:01 +08:00
kesmeey
73952a3b67
add tests ( #6243 )
...
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-03 17:02:36 +08:00
bukejiyu
12d4b4cb87
[Feature]Support reorder ids to split prefill and decodes ( #5779 )
...
* support reorder ids
* perfect code
* fix
* fix unittest
* delete code
* fix
* add python api
* delete custom op
* update algorithm
* fix swap
* support condense
* support condense
* support mtp
* delete code
* update
* update
* update
* update
* update for other platforms
* update
* fix
* fix mtp
* fix ut
* update
* fix ut
* update ut
* fix
* fix encoder_cache
* fix ci
* fix
* fix vl
* Fix performance regression
* fix
* fix
* fix mtp
* fix index->req_id mapping
* fix ut
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-03 00:28:02 -08:00
周周周
8277b95fa6
remove speculate_get_padding_offset op ( #6308 )
2026-02-03 15:18:12 +08:00
ApplEOFDiscord
6563b8307c
[Bug Fix] fix tokenizer oom ( #6287 )
...
* fix tokenizer oom
* fix unit test
2026-02-03 11:27:11 +08:00
GoldPancake
fb374238e1
Revert "[RL] Support GLM MTP RL Model ( #6223 )" ( #6301 )
...
This reverts commit af6c84d48d.
2026-02-02 14:08:13 +08:00
fxyfxy777
2ada119a38
[Optimize] optimize mask_quant & swiglu ( #6222 )
...
* optimize mask_quant op: ~1.5x speedup
* fix calculate sequence
* add fused
* rm log
* push kernel code
* add ut
* accuracy ok
* add ue8m0
* add ut
* add merge develop
* rm ut of mask_per_token_quant
2026-02-02 13:52:38 +08:00
xunyoyo
25656455ee
[CI] [Hackathon 10th Spring No.38] Add unit tests for fastdeploy/entrypoints/openai/serving_completion.py ( #6227 )
...
* Add serving completion tests
* test: tighten serving completion coverage
2026-02-02 12:53:04 +08:00
kesmeey
afee0b9c5e
[CI] [Hackathon 10th Spring No.30] Add unit tests for fastdeploy/inter_communicator/engine_worker_queue.py ( #6102 )
...
* test: add comprehensive tests for EngineWorkerQueue to improve code coverage
* style: format tests/inter_communicator/test_e2w_queue.py with black
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-30 21:37:29 +08:00
xunyoyo
18ebce9dec
[CI] [Hackathon 10th Spring No.41] Add unit tests for fastdeploy/entrypoints/llm.py ( #6108 )
...
* Add LLM entrypoint tests for coverage
* test: streamline llm entrypoint coverage
* test: format llm tests
2026-01-30 12:58:10 +08:00
JYChen
6c685c9474
Revert "[Feature] Support Ernie FP8 on sm100 ( #5593 )" ( #6275 )
...
This reverts commit eb80724b71.
2026-01-30 11:22:01 +08:00
chenjian
292bab7e6d
[BugFix] Fix bug for enable output caching ( #6226 )
...
* [BugFix] Fix bug for enable output caching
* fix
* Fix
* fix
* fix ci
2026-01-30 10:55:36 +08:00
周周周
e237313797
[BugFix] allow return code 250 in tests/distributed/test_fusedmoe_ep_entry.py ( #6269 )
2026-01-29 16:00:03 +08:00
yuxuan
44b52701f6
[Feature] Support NVFP4 MoE on SM100 ( #6003 )
...
* fp4 dense
* [WIP] support nvfp4, dense part
* [wip] developing loading qwen model
* loading
* update
* dense fp4 OK, cudagraph error
* [WIP] moe forward part
* with flashinfer-backend
* qwen3_moe_fp4
* update
* support flashinfer-cutlass moe, qwen3-moe-fp4 OK
* support ernie4.5-fp4
* fix load error
* add some ut
* add docs
* fix CLA, test
* fix the apply() in ModelOptNvFp4FusedMoE
* fix CodeStyle
* del the PADDLE_COMPATIBLE_API
* fix broken url: nvidia_gpu.md
* fix docs
* fix token_ids
* fix CI in Hopper
* move flashinfer imports inside the function
* fix model_runner
Removed the logic for generating random padding IDs.
* Remove skip condition for CUDA version in nvfp4 test
* add test for nvfp4
* fix according to review
* Add Chinese translation link to NVFP4 documentation
* del flashinfer.py
* fix unittest
---------
Co-authored-by: zoooo0820 <zoooo0820@qq.com>
Co-authored-by: bukejiyu <395822456@qq.com>
2026-01-29 14:16:07 +08:00
JYChen
eb80724b71
[Feature] Support Ernie FP8 on sm100 ( #5593 )
...
* provisionally working DeepGEMM version
* dense part: e8m0 OK
* version with the EB model running with E8M0
* code check
* support 21b-tp2, dev_paddle
* single-node 4.5T ep working version
* restore mistakenly deleted code; single-node 4.5T ep (non-cudagraph)
* eb tp
* Support SM100 block-wise FP8 inference
* refine codes, support deepgemm on sm100
* add thirdparty PFCC/DeepGEMM
* fix ep decode
* use deepep ue8m0 to fix the accuracy issue
* fix FP8 TP accuracy
* adapt DeepGEMM upgrade to the Hopper logic
* add ue8m0 kernel
* add ue8m0 kernel
* fix custom_ops/gpu_ops/cpp_extensions.cc
* eb output is normal
* eb5 text is right
* accuracy looks consistent on inspection
* self-tested accuracy aligned
* replace masked_per_token_quant; ep accuracy OK
* ~30% performance improvement
* ep runs for now but still has issues
* self-test consistent
* rm test fun
* fix ep event
* update DeepGEMM in graph-optimization ops
* fix build
* temporarily work around the deepgemm CI build issue
* select deepgemm version by SM
* remove useless code
---------
Co-authored-by: ckl117 <ckl117@163.com>
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: fxyfxy777 <fxyfxy777@163.com>
2026-01-29 13:49:54 +08:00
GoldPancake
af6c84d48d
[RL] Support GLM MTP RL Model ( #6223 )
...
* support glm mtp rl model
* fix
* fix
* fix ut
* update baseline
2026-01-28 08:28:03 -08:00
jc
7da5f54fb3
[CI] Add unit test for swap_layout && remove unit test of splitwise_scheduler ( #6250 )
...
* Add unit test for swap_layout
* remove splitwise_scheduler test
2026-01-28 19:20:20 +08:00
ddchenhao66
6d33d5e370
[Models][BugFix] shared experts and dense mlp layer do not require TP split ( #6180 )
...
Co-authored-by: ddchenhao66 <dhaochen@163.com>
2026-01-28 18:58:19 +08:00