Commit Graph

870 Commits

Author SHA1 Message Date
kesmeey e4e3a71e7b [CI] [Hackathon 10th Spring No.22] Add unit tests for fastdeploy/cache_manager/cache_transfer_manager.py (#6157)
* Add comprehensive test coverage for cache_transfer_manager.py

* Fix code style: add newline at end of file

* fix: update cache transfer manager tests for branch 22 interface changes

* fix: resolve test errors for cache transfer manager

* fix: update cache transfer manager tests for branch 22 interface changes

* style: apply pre-commit formatting to tests/cache_manager/test_cache_transfer_manager.py

* Run codestyle: format tests/cache_manager/test_cache_transfer_manager.py and related fixes

* Update test_cache_transfer_manager.py

* Format cache transfer manager tests

* Update cache transfer manager tests

* Update unit test coverage workflow

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-11 11:23:57 +08:00
CSWYF3634076 7380bfb476 [BugFix] fix console log metrics waiting queue count (#6432)
* [BugFix] fix console log metrics waiting queue count

* [BugFix] fix console log metrics waiting queue count unittest
2026-02-11 10:51:49 +08:00
AIbin 983be007f5 [Feature] support swa & sink based on appendattn (#6410)
* support swa & sink based on appendattn
2026-02-10 18:28:03 +08:00
chen a8ffcaa068 fix fa4 test (#6408) 2026-02-10 10:57:21 +08:00
CSWYF3634076 335ab70b1c [Feature] console print metrics add env (#6413) 2026-02-10 09:37:11 +08:00
YuBaoku b84056fdaa [CI] Fix stable_test and add cherry-pick automation (#6415) 2026-02-09 23:10:12 +08:00
bukejiyu 5bfc0938e2 [BugFix] PD reorder fix and add ut (#6375) 2026-02-09 04:42:48 -08:00
CSWYF3634076 ec128068b7 [Others] Exit to ensure no residual processes (cpu cache & dp) (#6377)
* [Others] good exit single dp

* [Others] good exit cpu cache dp>1

* [Others] good exit cpu cache dp>1 unittest
2026-02-09 20:38:38 +08:00
chenjian 35c24f3f71 Revert "[Optimize] Optimize ttft for ep (#6098)" (#6402)
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
kevin d60daca4a8 [Feature] consider multimodal model when dummy run (#6045)
* add mm do profile

* update code

* update code

* update code

* update code

* update test case

* update code

* update code

* fix xpu bug

* update code

* add mm do profile

* update test case

* update code
2026-02-09 17:49:55 +08:00
MingkunZhang 268276e287 [Metax][CI] e2e ci tests enable cuda graph (#6401) 2026-02-09 16:25:23 +08:00
bukejiyu dc5917289d [loader] support wint2 backend (#6139)
* support wint2

* update
2026-02-08 22:42:36 -08:00
0Ayachi0 8bb83b2239 [CI] [Hackathon 10th Spring No.25] Add unit tests for fastdeploy/inter_communicator/zmq_server.py (#6210)
* [CI] [Hackathon 10th Spring No.24] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py

* [CI] [Hackathon 10th Spring No.25] Add unit tests for fastdeploy/inter_communicator/zmq_server.py

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-09 14:00:48 +08:00
xjkmfa 74762b0fb2 [ci case] Prompt logprobs precision (#6381)
* Add ci case for min token and max token

* [CI case] include total_tokens in the last packet of completion interface stream output

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
周周周 2b4748de4f [MTP] refactor MTP pre_process (#6358) 2026-02-09 10:47:15 +08:00
MingkunZhang 15e01c6f61 [Metax][CI] add paddleocr ci test (#6379) 2026-02-09 10:11:28 +08:00
Yonghua Li 5ac5ecd0b0 [BugFix] fix cache transfer tasks failure after cache cleared (#6202)
* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task

* [fix] fix cache manager hang when clearing prefix cache

* [fix] fix list_proxy has no clear method

* [fix] fix barrier

* [fix] add barrier0

* [fix] add cache_task_is_paused_signal

* [fix] fix condition

* [fix] fix cache transfer sync and delay prefix cache tree clearing

* [fix] fix typo

* [chore] polish code

* [fix] revert only rank0 write kv_cache_status_signal

* [fix] fix thread pool and prefix cache manager hang

* [fix] add timeout for task_swapping_event

* [fix] tolerate prefix cache manager error while prefix tree is cleared

* [chore] add more log

* [fix] fix test_prefix_cache_manager

* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
chen 72fe94cb13 [Feature] support glm tp+dp+ep (#6317) 2026-02-05 21:47:01 +08:00
CSWYF3634076 1c0a2b055f [Feature] console print statistical metrics (#6339)
* [Feature] console print statistical data

* [Feature] console print statistical data v2 dp_rank

* [Feature] console print statistical data v2 unittest

* [Feature] console print statistical data v3 unittest
2026-02-05 19:20:36 +08:00
MingkunZhang de02a909c8 [Metax][CI] restore 21b/28b ci test file (#6368)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-05 18:38:59 +08:00
MingkunZhang 6e28b5ef4f [Metax][CI] update metax ci files (#6364) 2026-02-05 17:16:31 +08:00
chen 29a313a402 [Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100

* flash attn backend support mask

* flash attn backend run flashmask correct

* add test for flash_attn_backend and flash_attn_func

* check

* add test for fa4

* requirements.txt add fa4 whl

* check test on sm100

* fix CI conflict

* add enable_torch_proxy for flash_mask

* lazy import fa4

* check

* fix tests import

* check test_load_mpt import
2026-02-05 14:39:00 +08:00
YuBaoku cae2709eff [CI] Update stable test workflow and run.sh script (#6352) 2026-02-05 11:01:15 +08:00
GoldPancake 183b8d325a [RL] Support GLM MTP RL Model (#6267) 2026-02-04 20:14:35 +08:00
luukunn 765df94e6c [Optimization]update prompt & prompt_token_ids (#6334)
* fix prompt

* add unit test

* add unit test

* fix
2026-02-04 20:08:01 +08:00
JYChen bf78a48eb3 [Others] add mock unittest for sm100 FP8 inference (#6273)
* add unittest

* use new file

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
chenjian 90db0bdd0d [Optimize] Optimize ttft for ep (#6098)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix
2026-02-04 15:03:29 +08:00
fxyfxy777 36547cfdb3 [Feature] FD_USE_PHI_FP8_QUANT (#6320)
* add ut

* add use_fd_quant env

* rm mask_per_token_quant

* add make ops list

* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, defaults to true

* modify comments

* use bool type

* Add function declaration
2026-02-03 22:33:03 -08:00
MingkunZhang 2ffcb3d9ed [Metax][CI] update ci test files (#6340) 2026-02-04 13:58:07 +08:00
周周周 6225439778 add PADDLE_ENFORCE (#6321) 2026-02-04 10:47:19 +08:00
xunyoyo 8225e694c9 [CI] [Hackathon 10th Spring No.37] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_wint2_backend.py (#6286)
* Add wint2 MoE backend tests

* Align wint2 test dtypes for cutlass apply

* Use bfloat16 input in wint2 test

* Stub moe_expert_reduce in wint2 test

* Use 2 experts in wint2 test

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-04 10:46:26 +08:00
RAM 5b22e5dfe7 [RL] R3 Support Fused Put the Routing of All Layers (#6099)
* fused put routing

* fix bug

* [draft commit]dynamic dtype

* fix async put & numpy bug

* fix uint8 test case
2026-02-03 04:13:16 -08:00
ddchenhao66 faade7d0ab [BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled (#6309) 2026-02-03 19:49:01 +08:00
kesmeey 73952a3b67 add tests (#6243)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-03 17:02:36 +08:00
bukejiyu 12d4b4cb87 [Feature] Support reorder ids to split prefill and decodes (#5779)
* support reorder ids

* perfect code

* fix

* fix unittest

* delete code

* fix

* add python api

* delete custom op

* update algorithm

* fix swap

* support condense

* support condense

* support mtp

* delete code

* update

* update

* update

* update

* update for other platforms

* update

* fix

* fix mtp

* fix ut

* update

* fix ut

* update ut

* fix

* fix encoder_cache

* fix ci

* fix

* fix vl

* Fix performance regression

* fix

* fix

* fix mtp

* fix index->req_id mapping

* fix ut

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-03 00:28:02 -08:00
周周周 8277b95fa6 remove speculate_get_padding_offset op (#6308) 2026-02-03 15:18:12 +08:00
ApplEOFDiscord 6563b8307c [Bug Fix] fix tokenizer oom (#6287)
* fix tokenizer oom

* fix unit test
2026-02-03 11:27:11 +08:00
GoldPancake fb374238e1 Revert "[RL] Support GLM MTP RL Model (#6223)" (#6301)
This reverts commit af6c84d48d.
2026-02-02 14:08:13 +08:00
fxyfxy777 2ada119a38 [Optimize] optimize mask_quant & swiglu (#6222)
* optimize mask_quant op speed up 1.5

* fix calculate sequence

* add fused

* rm log

* push kernel code

* add ut

* accuracy ok

* add ue8m0

* add ut

* add merge develop

* rm ut of mask_per_token_quant
2026-02-02 13:52:38 +08:00
xunyoyo 25656455ee [CI] [Hackathon 10th Spring No.38] Add unit tests for fastdeploy/entrypoints/openai/serving_completion.py (#6227)
* Add serving completion tests

* test: tighten serving completion coverage
2026-02-02 12:53:04 +08:00
kesmeey afee0b9c5e [CI] [Hackathon 10th Spring No.30] Add unit tests for fastdeploy/inter_communicator/engine_worker_queue.py (#6102)
* test: add comprehensive tests for EngineWorkerQueue to improve code coverage

* style: format tests/inter_communicator/test_e2w_queue.py with black

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-30 21:37:29 +08:00
xunyoyo 18ebce9dec [CI] [Hackathon 10th Spring No.41] Add unit tests for fastdeploy/entrypoints/llm.py (#6108)
* Add LLM entrypoint tests for coverage

* test: streamline llm entrypoint coverage

* test: format llm tests
2026-01-30 12:58:10 +08:00
JYChen 6c685c9474 Revert "[Feature] Support Ernie FP8 on sm100 (#5593)" (#6275)
This reverts commit eb80724b71.
2026-01-30 11:22:01 +08:00
chenjian 292bab7e6d [BugFix] Fix bug for enable output caching (#6226)
* [BugFix] Fix bug for enable output caching

* fix

* Fix

* fix

* fix ci
2026-01-30 10:55:36 +08:00
周周周 e237313797 [BugFix] allow return code 250 in tests/distributed/test_fusedmoe_ep_entry.py (#6269) 2026-01-29 16:00:03 +08:00
yuxuan 44b52701f6 [Feature] Support NVFP4 MoE on SM100 (#6003)
* fp4 dense

* [WIP] support nvfp4, dense part

* [wip] developing loading qwen model

* loading

* update

* dense fp4 OK, cudagraph error

* [WIP] moe forward part

* with flashinfer-backend

* qwen3_moe_fp4

* update

* support flashinfer-cutlass moe, qwen3-moe-fp4 OK

* support ernie4.5-fp4

* fix load error

* add some ut

* add docs

* fix CLA, test

* fix the apply() in ModelOptNvFp4FusedMoE

* fix CodeStyle

* del the PADDLE_COMPATIBLE_API

* fix broken url: nvidia_gpu.md

* fix docs

* fix token_ids

* fix CI in Hopper

* move flashinfer imports inside the function

* fix model_runner

Removed the logic for generating random padding IDs.

* Remove skip condition for CUDA version in nvfp4 test

* add test for nvfp4

* fix according to review

* Add Chinese translation link to NVFP4 documentation

* del flashinfer.py

* fix unittest

---------

Co-authored-by: zoooo0820 <zoooo0820@qq.com>
Co-authored-by: bukejiyu <395822456@qq.com>
2026-01-29 14:16:07 +08:00
JYChen eb80724b71 [Feature] Support Ernie FP8 on sm100 (#5593)
* Deepgemm provisionally working version

* dense part e8m0 ok

* version with the EB model running E8M0 end-to-end

* code check

* support 21b-tp2, dev_paddle

* single-node 4.5T ep OK version

* restore deleted code, single-node 4.5T ep (non-cudagraph)

* eb tp

* Support SM100 block-wise FP8 inference

* refine codes, support deepgemm on sm100

* add thirdparty PFCC/DeepGEMM

* fix ep decode

* use deepep ue8m0 to fix the precision issue

* fix FP8 TP precision

* upgrade Deepgemm to adapt the Hopper logic

* add ue8m0 kernel

* add ue8m0 kernel

* fix custom_ops/gpu_ops/cpp_extensions.cc

* eb output is normal

* eb5 text is right

* precision looks consistent by inspection

* self-tested precision aligned

* replace masked_per_token_quant, ep precision OK

* ~30% performance improvement

* ep runs for now but still has issues

* self-test consistent

* rm test fun

* fix ep event

* update Deepgemm in the graph-optimization ops

* fix build

* temporarily work around the deepgemm CI build issue

* select deepgemm version by SM

* remove useless code

---------

Co-authored-by: ckl117 <ckl117@163.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: fxyfxy777 <fxyfxy777@163.com>
2026-01-29 13:49:54 +08:00
GoldPancake af6c84d48d [RL] Support GLM MTP RL Model (#6223)
* support glm mtp rl model

* fix

* fix

* fix ut

* update baseline
2026-01-28 08:28:03 -08:00
jc 7da5f54fb3 [CI] Add unit test for swap_layout && remove unit test of splitwise_scheduler (#6250)
* Add unit test for swap_layout

* remove splitwise_scheduler test
2026-01-28 19:20:20 +08:00
ddchenhao66 6d33d5e370 [Models][BugFix] shared experts and dense mlp layer do not require TP split (#6180)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-28 18:58:19 +08:00