Commit Graph

4590 Commits

Author SHA1 Message Date
0Ayachi0 8bb83b2239 [CI] [Hackathon 10th Spring No.25] Add unit tests for module fastdeploy/inter_communicator/zmq_server.py (#6210)
* [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py

* [CI] [Hackathon 10th Spring No.25] Add unit tests for module fastdeploy/inter_communicator/zmq_server.py

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-09 14:00:48 +08:00
Mattheliu c776d483e4 [BugFix] fix handling of 4 return values from noaux_tc_redundant op (#6384)
* fix: handle 4 return values from noaux_tc_redundant op

The noaux_tc_redundant CUDA op is defined with 4 outputs in PD_BUILD_STATIC_OP:
- output_tensor (scores)
- topk_values
- topk_indices
- tokens_per_expert_stats_list_out (inplace updated)

The Python code was only unpacking 3 values, causing:
  ValueError: too many values to unpack (expected 3)

This fix correctly unpacks all 4 return values, ignoring the inplace
updated tensor which is the same as the input tokens_per_expert_stats_list.
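The fix described above (unpack all four outputs and discard the inplace-updated stats tensor) can be sketched with a plain-Python stub. The stub name, shapes, and top-k of 2 below are illustrative assumptions, not the actual FastDeploy call site:

```python
# Hypothetical stand-in for the noaux_tc_redundant CUDA op, which is
# registered via PD_BUILD_STATIC_OP with 4 outputs. Names and shapes
# here are assumptions for illustration only.
def noaux_tc_redundant_stub(scores, tokens_per_expert_stats_list):
    # The real op returns (scores, topk_values, topk_indices, stats),
    # where stats is the inplace-updated input tensor.
    topk_values = sorted(scores, reverse=True)[:2]
    topk_indices = sorted(range(len(scores)), key=lambda i: -scores[i])[:2]
    return scores, topk_values, topk_indices, tokens_per_expert_stats_list

scores = [0.1, 0.7, 0.2]
stats = [0, 0, 0]

# Before the fix: 3-way unpacking of a 4-tuple raises
#   ValueError: too many values to unpack (expected 3)
# After the fix: unpack all 4 values and discard the inplace-updated
# stats tensor, which aliases the input.
out_scores, topk_values, topk_indices, _ = noaux_tc_redundant_stub(scores, stats)
```

Discarding the fourth value with `_` keeps the Python signature stable while matching the op's registered output count.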

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

* fix: make noaux_tc_redundant return 4 values to match OP definition

The PD_BUILD_STATIC_OP defines 4 outputs but the function only returned 3,
causing inconsistent behavior across different Paddle framework versions.

This fix explicitly returns 4 values:
- scores (inplace modified)
- topk_values
- topk_indices
- tokens_per_expert_stats_list (inplace modified via atomicAdd)

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

---------

Co-authored-by: Claude (Claude Opus 4.5) <noreply@anthropic.com>
2026-02-09 13:17:47 +08:00
JYChen 9bcd863902 [Others] support import deepgemm/deepep from fleet ops (#6351)
* update paddleformers to v1.0

* only change import fleetpath
2026-02-09 11:53:13 +08:00
xjkmfa 74762b0fb2 [ci case]Prompt logprobs precision (#6381)
* Add ci case for min token and max token

* [CI case] include total_tokens in the last packet of the completion interface's streaming output

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

* [ci] prompt_logprobs precision case

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2026-02-09 11:42:36 +08:00
周周周 2b4748de4f [MTP] refactor MTP pre_process (#6358) 2026-02-09 10:47:15 +08:00
Jiang-Jia-Jun 18e79dd660 [Metrics] Support cpu-cache-block-num (#6390)
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
MingkunZhang 15e01c6f61 [Metax][CI] add paddleocr ci test (#6379) 2026-02-09 10:11:28 +08:00
Yonghua Li 5ac5ecd0b0 [BugFix] fix cache transfer tasks failure after cache cleared (#6202)
* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task

* [fix] fix cache manager hang when clearing prefix cache

* [fix] fix list_proxy has no clear method

* [fix] fix barrier

* [fix] add barrier0

* [fix] add cache_task_is_paused_signal

* [fix] fix condition

* [fix] fix cache transfer sync and delay prefix cache tree clearing

* [fix] fix typo

* [chore] polish code

* [fix] revert only rank0 write kv_cache_status_signal

* [fix] fix thread pool and prefix cache manager hang

* [fix] add timeout for task_swapping_event

* [fix] tolerate prefix cache manager error while prefix tree is cleared

* [chore] add more log

* [fix] fix test_prefix_cache_manager

* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
jc d6b3c722c1 [KVCache] Storage cache supports c8 model (#6298)
* Refine cache transfer manager
* Storage cache supports c8 model
2026-02-06 12:01:17 +08:00
chen 72fe94cb13 [Feature] support glm tp+dp+ep (#6317) 2026-02-05 21:47:01 +08:00
CSWYF3634076 1c0a2b055f [Feature] console print statistical metrics (#6339)
* [Feature] console print statistical data

* [Feature] console print statistical data v2 dp_rank

* [Feature] console print statistical data v2 unittest

* [Feature] console print statistical data v3 unittest
2026-02-05 19:20:36 +08:00
MingkunZhang de02a909c8 [Metax][CI] restore 21b/28b ci test file (#6368)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-05 18:38:59 +08:00
YuBaoku 5c9bc13a59 [CI] Fix check-bypass.yml 2026-02-05 18:06:39 +08:00
MingkunZhang 6e28b5ef4f [Metax][CI] update metax ci files (#6364) 2026-02-05 17:16:31 +08:00
周周周 e3fb8796b4 Remove useless MTP rebuild_padding code (#6336) 2026-02-05 16:28:44 +08:00
YuBaoku 2d3fb81d29 [CI] Update check-bypass.yml (#6360) 2026-02-05 15:52:30 +08:00
K11OntheBoat 116e2aea7a Support Norm before Rope (#6332)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2026-02-05 15:28:52 +08:00
chen 29a313a402 [Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100

* flash attn backend support mask

* flash attn backend run flashmask correct

* add test for flash_attn_backend and flash_attn_func

* check

* add test for fa4

* requirements.txt add fa4 whl

* check test on sm100

* fix CI conflict

* add enable_torch_proxy for flash_mask

* lazy import fa4

* check

* fix tests import

* check test_load_mpt import
2026-02-05 14:39:00 +08:00
lizan1999 72edd394d9 [XPU] support noaux_tc (#6326) 2026-02-05 12:04:16 +08:00
YuBaoku cae2709eff [CI] Update stable test workflow and run.sh script (#6352) 2026-02-05 11:01:15 +08:00
GoldPancake 183b8d325a [RL] Support GLM MTP RL Model (#6267) 2026-02-04 20:14:35 +08:00
luukunn 765df94e6c [Optimization] update prompt & prompt_token_ids (#6334)
* fix prompt

* add unit test

* add unit test

* fix
2026-02-04 20:08:01 +08:00
JYChen bf78a48eb3 [Others] add mock unittest for sm100 FP8 inference (#6273)
* add unittest

* use new file

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-04 17:39:15 +08:00
sunxin ef47e6eb46 [Others] skip to_tensor (#6342) 2026-02-04 17:25:19 +08:00
Zhang Yulong 26ba019e66 Update README.md (#6343) 2026-02-04 15:57:18 +08:00
MingkunZhang 43e3886ef9 [Metax][CI] fix run_ci_metax.sh error (#6341) 2026-02-04 15:43:48 +08:00
MingkunZhang e109fb9a0e [Metax][Fix] fix issues based #6259 (#6338) 2026-02-03 23:21:35 -08:00
chenjian 90db0bdd0d [Optimize] Optimize ttft for ep (#6098)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix
2026-02-04 15:03:29 +08:00
mouxin 6e96bd0bd2 [Feature] Fix counter release logic & update go-router download URL (#6280)
* [Doc] Update prerequisites in the documentation

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info

* [Feature] Fix counter release logic

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update go-router download URL

* [Feature] Update token counter logic and docs

* [Feature] Update token counter logic and docs

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-04 15:02:38 +08:00
fxyfxy777 36547cfdb3 [Feature] FD_USE_PHI_FP8_QUANT (#6320)
* add ut

* add use_fd_quant env

* rm mask_per_token_quant

* add make ops list

* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT (defaults to true)

* modify comments

* use bool type

* Add function declaration
2026-02-03 22:33:03 -08:00
MingkunZhang 2ffcb3d9ed [Metax][CI] update ci test files (#6340) 2026-02-04 13:58:07 +08:00
sunxin 9b0a82cfa9 [Model Runner] Support overlap schedule (#6259) 2026-02-04 10:49:44 +08:00
周周周 6225439778 add PADDLE_ENFORCE (#6321) 2026-02-04 10:47:19 +08:00
xunyoyo 8225e694c9 [CI] [Hackathon 10th Spring No.37] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_wint2_backend.py (#6286)
* Add wint2 MoE backend tests

* Align wint2 test dtypes for cutlass apply

* Use bfloat16 input in wint2 test

* Stub moe_expert_reduce in wint2 test

* Use 2 experts in wint2 test

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-04 10:46:26 +08:00
Zhang Yulong 16d03c3127 update (#6335) 2026-02-03 21:59:32 +08:00
Jiang-Jia-Jun 793dac0f9d Modify Nightly Build installation commands for fastdeploy
Update the installation instructions for the Nightly Build of fastdeploy to use the cu126 index for both SM86/89 and SM80/90 architectures.
2026-02-03 20:24:27 +08:00
Jiang-Jia-Jun 829139a5e5 Fix Nightly build installation URLs for fastdeploy-gpu
Updated installation instructions for the latest Nightly build of fastdeploy-gpu to use the correct URLs for CUDA 12.6.
2026-02-03 20:24:19 +08:00
RAM 5b22e5dfe7 [RL] R3 Support Fused Put the Routing of All Layers (#6099)
* fused put routing

* fix bug

* [draft commit]dynamic dtype

* fix async put & numpy bug

* fix uint8 test case
2026-02-03 04:13:16 -08:00
CSWYF3634076 722ca87db6 [Others] lazy write log when writing (#6323) 2026-02-03 20:11:13 +08:00
xiegegege 51c6fa8afc [CE]add 21b cpu cache ,glm mtp,glm for rl config (#6328) 2026-02-03 20:10:47 +08:00
ddchenhao66 faade7d0ab [BugFix] Fix port-related errors in mix mode when FD_ENABLE_INTERNAL_ADAPTER is enabled (#6309) 2026-02-03 19:49:01 +08:00
JYChen c745a22420 [Feature] Support Ernie FP8 on sm100 (the fixed version) (#6304) 2026-02-03 17:47:38 +08:00
kesmeey 73952a3b67 add tests (#6243)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-03 17:02:36 +08:00
bukejiyu 12d4b4cb87 [Feature] Support reorder ids to split prefill and decodes (#5779)
* support reorder ids

* perfect code

* fix

* fix unittest

* delete code

* fix

* add python api

* delete custom op

* update algorithm

* fix swap

* support condense

* support condense

* support mtp

* delete code

* update

* update

* update

* update

* update for other platforms

* update

* fix

* fix mtp

* fix ut

* update

* fix ut

* update ut

* fix

* fix encoder_cache

* fix ci

* fix

* fix vl

* Fix performance regression

* fix

* fix

* fix mtp

* fix index->req_id mapping

* fix ut

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-03 00:28:02 -08:00
周周周 cbdb2462ea cp 1131 tbo to develop (#6281) 2026-02-03 15:23:23 +08:00
周周周 8277b95fa6 remove speculate_get_padding_offset op (#6308) 2026-02-03 15:18:12 +08:00
Moonchild1227 39dc4b0c2e [Feature] [KVCache] support file_store kv cache backend (#6188)
* fix(examples): comment out stop.sh to avoid error when script is missing

* feat: add file_store support for cache manager

* [fix] fix multi gpu transfer

* [fix] fix global kvcache transfer

* [Feature] [KVCache] support file_store kv cache backend

* chore: update FileStore according to PR comments

* fix: remove comments

* fix: add swap_cache_layout for file store

* fix: remove rank key

* fix: Switch KV cache storage to pure file mode

* Temporarily disable support for Tensor types

* fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR

* fix: Simplify cache_transfer_manager.py

* fix: fix syntax bug

* fix: Simplify file_store.py

* fix: Use the key directly as the filename

* fix: Simplify set()

* fix: Simplify cache_transfer_manager.py & file_store.py

* fix: Only support load to cpu buffer

* feat: add FileStore backend for cache transfer

* fix: guard zmq import
2026-02-03 14:37:58 +08:00
zccjjj ee77ff9ebe [config] fix assert message (#6310) 2026-02-03 14:37:46 +08:00
Jingfeng Wu 4760835789 Fix heartbeat signal's sleeptime error (#6241) 2026-02-03 14:28:51 +08:00
xjkmfa e27a7cc5b0 [Benchmark] CE qwen3-vl (#6288)
* [CE] qwen3-vl
2026-02-03 14:17:28 +08:00