Commit Graph

870 Commits

Author SHA1 Message Date
kesmeey e4e3a71e7b [CI] [Hackathon 10th Spring No.22] Add unit tests for fastdeploy/cache_manager/cache_transfer_manager.py (#6157)
* Add comprehensive test coverage for cache_transfer_manager.py

* Fix code style: add newline at end of file

* fix: update cache transfer manager tests for branch 22 interface changes

* fix: resolve test errors for cache transfer manager

* fix: update cache transfer manager tests for branch 22 interface changes

* style: apply pre-commit formatting to tests/cache_manager/test_cache_transfer_manager.py

* Run codestyle: format tests/cache_manager/test_cache_transfer_manager.py and related fixes

* Update test_cache_transfer_manager.py

* Format cache transfer manager tests

* Update cache transfer manager tests

* Update unit test coverage workflow

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-11 11:23:57 +08:00
CSWYF3634076 7380bfb476 [BugFix] fix console log metrics waiting queue count (#6432)
* [BugFix] fix console log metrics waiting queue count

* [BugFix] fix console log metrics waiting queue count unittest
2026-02-11 10:51:49 +08:00
AIbin 983be007f5 [Feature] support swa & sink based on appendattn (#6410)
* support swa & sink based on appendattn
2026-02-10 18:28:03 +08:00
chen a8ffcaa068 fix fa4 test (#6408) 2026-02-10 10:57:21 +08:00
CSWYF3634076 335ab70b1c [Feature] console print metrics add env (#6413) 2026-02-10 09:37:11 +08:00
YuBaoku b84056fdaa [CI] Fix stable_test and add cherry-pick automation (#6415) 2026-02-09 23:10:12 +08:00
bukejiyu 5bfc0938e2 [BugFix] PD reorder fix and add ut (#6375) 2026-02-09 04:42:48 -08:00
CSWYF3634076 ec128068b7 [Others] Exit to ensure no residual processes (cpu cache & dp) (#6377)
* [Others] good exit single dp

* [Others] good exit cpu cache dp>1

* [Others] good exit cpu cache dp>1 unittest
2026-02-09 20:38:38 +08:00
chenjian 35c24f3f71 Revert "[Optimize] Optimize ttft for ep (#6098)" (#6402)
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
kevin d60daca4a8 [Feature] consider multimodal model when dummy run (#6045)
* add mm do profile

* update code

* update code

* update code

* update code

* update test case

* update code

* update code

* fix xpu bug

* update code

* add mm do profile

* update test case

* update code
2026-02-09 17:49:55 +08:00
MingkunZhang 268276e287 [Metax][CI] e2e ci tests enable cuda graph (#6401) 2026-02-09 16:25:23 +08:00
bukejiyu dc5917289d [loader] support wint2 backend (#6139)
* support wint2

* update
2026-02-08 22:42:36 -08:00
0Ayachi0 8bb83b2239 [CI] [Hackathon 10th Spring No.25] Add unit tests for fastdeploy/inter_communicator/zmq_server.py (#6210)
* [CI] [Hackathon 10th Spring No.24] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py

* [CI] [Hackathon 10th Spring No.25] Add unit tests for fastdeploy/inter_communicator/zmq_server.py

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-09 14:00:48 +08:00
xjkmfa 74762b0fb2 [ci case] Prompt logprobs precision (#6381)
* Add ci case for min token and max token

* [CI case] include total_tokens in the last packet of completion interface stream output

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
周周周 2b4748de4f [MTP] refactor MTP pre_process (#6358) 2026-02-09 10:47:15 +08:00
MingkunZhang 15e01c6f61 [Metax][CI] add paddleocr ci test (#6379) 2026-02-09 10:11:28 +08:00
Yonghua Li 5ac5ecd0b0 [BugFix] fix cache transfer tasks failure after cache cleared (#6202)
* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task

* [fix] fix cache manager hang when clearing prefix cache

* [fix] fix list_proxy has no clear method

* [fix] fix barrier

* [fix] add barrier0

* [fix] add cache_task_is_paused_signal

* [fix] fix condition

* [fix] fix cache transfer sync and delay prefix cache tree clearing

* [fix] fix typo

* [chore] polish code

* [fix] revert only rank0 write kv_cache_status_signal

* [fix] fix thread pool and prefix cache manager hang

* [fix] add timeout for task_swapping_event

* [fix] tolerate prefix cache manager error while prefix tree is cleared

* [chore] add more log

* [fix] fix test_prefix_cache_manager

* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
chen 72fe94cb13 [Feature] support glm tp+dp+ep (#6317) 2026-02-05 21:47:01 +08:00
CSWYF3634076 1c0a2b055f [Feature] console print statistical metrics (#6339)
* [Feature] console print statistical data

* [Feature] console print statistical data v2 dp_rank

* [Feature] console print statistical data v2 unittest

* [Feature] console print statistical data v3 unittest
2026-02-05 19:20:36 +08:00
MingkunZhang de02a909c8 [Metax][CI] restore 21b/28b ci test file (#6368)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-05 18:38:59 +08:00
MingkunZhang 6e28b5ef4f [Metax][CI] update metax ci files (#6364) 2026-02-05 17:16:31 +08:00
chen 29a313a402 [Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100

* flash attn backend support mask

* flash attn backend run flashmask correct

* add test for flash_attn_backend and flash_attn_func

* check

* add test for fa4

* requirements.txt add fa4 whl

* check test on sm100

* fix CI conflict

* add enable_torch_proxy for flash_mask

* lazy import fa4

* check

* fix tests import

* check test_load_mpt import
2026-02-05 14:39:00 +08:00
YuBaoku cae2709eff [CI] Update stable test workflow and run.sh script (#6352) 2026-02-05 11:01:15 +08:00
GoldPancake 183b8d325a [RL] Support GLM MTP RL Model (#6267) 2026-02-04 20:14:35 +08:00
luukunn 765df94e6c [Optimization]update prompt & prompt_token_ids (#6334)
* fix prompt

* add unit test

* add unit test

* fix
2026-02-04 20:08:01 +08:00
JYChen bf78a48eb3 [Others] add mock unittest for sm100 FP8 inference (#6273)
* add unittest

* use new file

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
chenjian 90db0bdd0d [Optimize] Optimize ttft for ep (#6098)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix
2026-02-04 15:03:29 +08:00
fxyfxy777 36547cfdb3 [Feature] FD_USE_PHI_FP8_QUANT (#6320)
* add ut

* add use_fd_quant env

* rm mask_per_token_quant

* add make ops list

* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, defaults to true

* modify comments

* use bool type

* Add function declaration
2026-02-03 22:33:03 -08:00
MingkunZhang 2ffcb3d9ed [Metax][CI] update ci test files (#6340) 2026-02-04 13:58:07 +08:00
周周周 6225439778 add PADDLE_ENFORCE (#6321) 2026-02-04 10:47:19 +08:00
xunyoyo 8225e694c9 [CI] [Hackathon 10th Spring No.37] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_wint2_backend.py (#6286)
* Add wint2 MoE backend tests

* Align wint2 test dtypes for cutlass apply

* Use bfloat16 input in wint2 test

* Stub moe_expert_reduce in wint2 test

* Use 2 experts in wint2 test

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-04 10:46:26 +08:00
RAM 5b22e5dfe7 [RL] R3 Support Fused Put the Routing of All Layers (#6099)
* fused put routing

* fix bug

* [draft commit]dynamic dtype

* fix async put & numpy bug

* fix uint8 test case
2026-02-03 04:13:16 -08:00
ddchenhao66 faade7d0ab [BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled (#6309) 2026-02-03 19:49:01 +08:00
kesmeey 73952a3b67 add tests (#6243)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-03 17:02:36 +08:00
bukejiyu 12d4b4cb87 [Feature] Support reorder ids to split prefill and decodes (#5779)
* support reorder ids

* perfect code

* fix

* fix unittest

* delete code

* fix

* add python api

* delete custom op

* update algorithm

* fix swap

* support condense

* support condense

* support mtp

* delete code

* update

* update

* update

* update

* update for other platforms

* update

* fix

* fix mtp

* fix ut

* update

* fix ut

* update ut

* fix

* fix encoder_cache

* fix ci

* fix

* fix vl

* Fix performance regression

* fix

* fix

* fix mtp

* fix index->req_id mapping

* fix ut

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-03 00:28:02 -08:00
周周周 8277b95fa6 remove speculate_get_padding_offset op (#6308) 2026-02-03 15:18:12 +08:00
ApplEOFDiscord 6563b8307c [Bug Fix] fix tokenizer oom (#6287)
* fix tokenizer oom

* fix unit test
2026-02-03 11:27:11 +08:00
GoldPancake fb374238e1 Revert "[RL] Support GLM MTP RL Model (#6223)" (#6301)
This reverts commit af6c84d48d.
2026-02-02 14:08:13 +08:00
fxyfxy777 2ada119a38 [Optimize] optimize mask_quant & swiglu (#6222)
* optimize mask_quant op speed up 1.5

* fix calculate sequence

* add fused

* rm log

* push kernel code

* add ut

* accuracy ok

* add ue8m0

* add ut

* add merge develop

* rm ut of mask_per_token_quant
2026-02-02 13:52:38 +08:00
xunyoyo 25656455ee [CI] [Hackathon 10th Spring No.38] Add unit tests for fastdeploy/entrypoints/openai/serving_completion.py (#6227)
* Add serving completion tests

* test: tighten serving completion coverage
2026-02-02 12:53:04 +08:00
kesmeey afee0b9c5e [CI] [Hackathon 10th Spring No.30] Add unit tests for fastdeploy/inter_communicator/engine_worker_queue.py (#6102)
* test: add comprehensive tests for EngineWorkerQueue to improve code coverage

* style: format tests/inter_communicator/test_e2w_queue.py with black

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-30 21:37:29 +08:00
xunyoyo 18ebce9dec [CI] [Hackathon 10th Spring No.41] Add unit tests for fastdeploy/entrypoints/llm.py (#6108)
* Add LLM entrypoint tests for coverage

* test: streamline llm entrypoint coverage

* test: format llm tests
2026-01-30 12:58:10 +08:00
JYChen 6c685c9474 Revert "[Feature] Support Ernie FP8 on sm100 (#5593)" (#6275)
This reverts commit eb80724b71.
2026-01-30 11:22:01 +08:00
chenjian 292bab7e6d [BugFix] Fix bug for enable output caching (#6226)
* [BugFix] Fix bug for enable output caching

* fix

* Fix

* fix

* fix ci
2026-01-30 10:55:36 +08:00
周周周 e237313797 [BugFix] allow return code 250 in tests/distributed/test_fusedmoe_ep_entry.py (#6269) 2026-01-29 16:00:03 +08:00
yuxuan 44b52701f6 [Feature] Support NVFP4 MoE on SM100 (#6003)
* fp4 dense

* [WIP] support nvfp4, dense part

* [wip] developing loading qwen model

* loading

* update

* dense fp4 OK, cudagraph error

* [WIP] moe forward part

* with flashinfer-backend

* qwen3_moe_fp4

* update

* support flashinfer-cutlass moe, qwen3-moe-fp4 OK

* support ernie4.5-fp4

* fix load error

* add some ut

* add docs

* fix CLA, test

* fix the apply() in ModelOptNvFp4FusedMoE

* fix CodeStyle

* del the PADDLE_COMPATIBLE_API

* fix broken url: nvidia_gpu.md

* fix docs

* fix token_ids

* fix CI in Hopper

* move flashinfer imports inside the function

* fix model_runner

Removed the logic for generating random padding IDs.

* Remove skip condition for CUDA version in nvfp4 test

* add test for nvfp4

* fix according to review

* Add Chinese translation link to NVFP4 documentation

* del flashinfer.py

* fix unittest

---------

Co-authored-by: zoooo0820 <zoooo0820@qq.com>
Co-authored-by: bukejiyu <395822456@qq.com>
2026-01-29 14:16:07 +08:00
JYChen eb80724b71 [Feature] Support Ernie FP8 on sm100 (#5593)
* Deepgemm provisionally working version

* dense part e8m0 ok

* version with the EB model running E8M0 end-to-end

* code check

* support 21b-tp2, dev_paddle

* single-node 4.5T ep OK version

* restore deleted code, single-node 4.5T ep (non-cudagraph)

* eb tp

* Support SM100 block-wise FP8 inference

* refine codes, support deepgemm on sm100

* add thirdparty PFCC/DeepGEMM

* fix ep decode

* use deepep ue8m0 to fix the precision issue

* fix FP8 TP precision

* upgrade Deepgemm to adapt the Hopper logic

* add ue8m0 kernel

* add ue8m0 kernel

* fix custom_ops/gpu_ops/cpp_extensions.cc

* eb output is normal

* eb5 text is right

* precision looks consistent by inspection

* self-tested precision aligned

* replace masked_per_token_quant, ep precision OK

* ~30% performance improvement

* ep runs for now but still has issues

* self-test consistent

* rm test fun

* fix ep event

* update Deepgemm in the graph-optimization ops

* fix build

* temporarily work around the deepgemm CI build issue

* select deepgemm version by SM

* remove useless code

---------

Co-authored-by: ckl117 <ckl117@163.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: fxyfxy777 <fxyfxy777@163.com>
2026-01-29 13:49:54 +08:00
GoldPancake af6c84d48d [RL] Support GLM MTP RL Model (#6223)
* support glm mtp rl model

* fix

* fix

* fix ut

* update baseline
2026-01-28 08:28:03 -08:00
jc 7da5f54fb3 [CI] Add unit test for swap_layout && remove unit test of splitwise_scheduler (#6250)
* Add unit test for swap_layout

* remove splitwise_scheduler test
2026-01-28 19:20:20 +08:00
ddchenhao66 6d33d5e370 [Models][BugFix] shared experts and dense mlp layer do not require TP split (#6180)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-28 18:58:19 +08:00