FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-06 23:49:39 +08:00

Author	SHA1	Message	Date
CSWYF3634076	08c411518f	[Loader] support dummy load weight (#6169 ) * [Loader] support dummy load weight * [Loader] support dummy load weight v2 * [Loader] support dummy load weight unittest * [Loader] support dummy load weight unittest v2 * [Loader] support dummy load weight v3 docs and fp8	2026-01-26 13:58:53 +08:00
sunxin	adc69c15d0	[Model Runner] Prepare token count and move FA3 initialization into the graph (#6170 ) * prepare for token num and put FA3 init in graph	2026-01-26 12:16:57 +08:00
Yuanle Liu	253c5cc16c	Improve deep_ep import handling with logging (#6207 ) * Improve deep_ep import handling with logging Refactor deep_ep import logic to handle PaddleFleet and PFCCLab imports with error logging. * Add traceback import to ep.py	2026-01-24 22:41:42 -08:00
fxyfxy777	79f42209bf	add scale_wrapper for per_block_cast_to_fp8 (#6183 )	2026-01-23 00:37:20 -08:00
GoldPancake	646aced1eb	[UT] Add GLM E2E tests for non-MTP and MTP (#6163 ) * add glm ut	2026-01-23 10:34:29 +08:00
Haonan Luo	82057cb71f	Support MXFP4 for GPT-OSS (#5435 ) * support mxfp4 in gpt-oss * support mxfp4 in gpt-oss * add scope for flashinfer * remove torch code * update envs.FD_MXFP4_BACKEND * update process_weights_after_loading * update env name * support tp in gpt-oss, add e2e test * add flashinfer-python-paddle in requirements * fix import error * add test * add test * add test * add test	2026-01-22 14:21:01 +08:00
fxyfxy777	9c4db0ac3f	[BugFix] fix weight quant op (#6137 ) * fix weight quant * fix weight quant * bit equal * code style	2026-01-22 09:50:57 +08:00
zccjjj	14a64e9b3b	[XPU] change XPU EP interface from xDeepEP to paddle (#5706 ) * add ENV VAR to controll low lantency buffer	2026-01-21 18:23:45 +08:00
K11OntheBoat	490a6551dc	rename params of normalization layer (#6133 ) Co-authored-by: “liuruian” <liuruian@baidu.com>	2026-01-21 17:18:35 +08:00
yzwu	837ddca273	[Iluvartar][CI] Fix the error max_tokens_per_expert referenced before assignment (#6083 )	2026-01-21 16:01:29 +08:00
lizexu123	f4902fe42d	[BugFix] fix wint2 (#6109 ) * fix * fix * fix	2026-01-20 21:46:21 +08:00
yinwei	51a8a2ed57	[XPU] Support CudaGraph(add block attn cuda_graph support) (#6116 ) * add block attn cuda_graph support	2026-01-20 19:33:11 +08:00
Ryan	dda27e50f5	[Graph Optimization] remove static_op_get_block_shape_and_split_kv_block from cudagraph (#6081 ) * rm static_op_get_block_shape_and_split_kv_block from cudagraph * update max_capture_shape * fallback: zeros -> empty to avoid coverage check * check graph_opt_config exists * add max_capture_shape_dy2st && full_cuda_graph: false -> true in 28B vl test * add use_cudagraph flag to control step_use_cudagraph	2026-01-20 14:05:18 +08:00
zhupengyang	45ebb2efb4	[XPU] support plugin model (#6092 )	2026-01-20 13:00:09 +08:00
jackyYang6	988e0bc338	[Feature] Add PaddleFormers fallback backend (#5999 ) * feat(paddleformers): add dense text model fallback backend * docs(paddleformers): add user guide and fix code review issues * add fallback unit test * precommit format * fix pre-commit * fix: address code review feedback * docs: add PaddleFormers backend documentation (EN) and simplify installation --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-19 21:50:50 +08:00
sunxin	9dc1c74d36	fix opt qknorm (#6080 )	2026-01-19 12:07:20 +08:00
fxyfxy777	4c92035f2d	[Feature] Unify fp8 block_wise quant ops (#5991 ) * quant stash * blockwise_quant * precommit * rm tensor.cut * tp ok * add swiglu * rm outdate code * fix activate ut * change baseline * fix baseline error	2026-01-15 05:50:37 -08:00
freeliuzc	49617d9832	[Feature]Support tag phase token enforce generation (#6034 ) * support tag phase token enforce generation * optimize note and some feature * fix sampler unit test --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-15 03:59:55 -08:00
lizexu123	6619298b50	【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007 ) * update w4afp8 * build.sh ok * support cuda_graph * fix * add test * fix max_tokens_per_expert * >=70 * fix * compute_max_tokens_from_prefix_sum in w4afp8 * compute_max_tokens use cub	2026-01-15 19:18:42 +08:00
Cheng Yanfei	fbcccaa750	[Intel HPU] enable MoE EP for hpu (#5855 ) * enable HPU MoE EP * MoE intermediate_scale stack * enable loader_v1 esp for tensor_wise_fp8 TP or EP * modify activation_scale name	2026-01-15 13:08:00 +08:00
zhupengyang	24ffa7c991	[XPU] fix moe num_expert (#6014 )	2026-01-15 10:49:36 +08:00
RAM	b3f59fd9b5	[RL][CI] Support Async R3 And Add Accuracy Test (#5937 ) * add bs1 r3 test case * async put * r3 test case 1.0 * success run eb5 * refine test case * pre-commit * add eb45 & glm testcase * format code * add p2pstore requirements * support only last turn * R3 use worker log * refine code &fix ci bug * refine error mesg * fix empty input bug * Success set acc ci of eb45 and glm45 * refine code * fix bug	2026-01-14 04:25:06 -08:00
Ryan	0d1a5e70bc	[Graph Optimization] Add `full_cuda_graph` to control subgraph split (#6027 )	2026-01-14 11:43:59 +08:00
sunxin	2533836dbb	[Optimization] Accelerate Qwen3 QK RMSNorm via Fused Triton Kernel (#5880 ) * qk rmsnorm fused * inplace * glm * fix * add qknorm layer * fix * update * fix qwen3 vl * update rl baseline * fix qwen3 vl moe * test * fix qwen vl moe rl * fix	2026-01-12 05:10:21 -08:00
lzy	223b2f5d86	Support setting communication groups in custom_allreduce and the all-to-all\transpose fused operator during the decoding phase. (#5917 )	2026-01-12 14:09:39 +08:00
周周周	b8d9daa785	MLA clean code (#5979 )	2026-01-10 21:05:00 +08:00
xiaoxiaohehe001	00a01ae024	[Feature] Support redundant expert for eplb (#5918 ) * [BugFix] support redundant expert for eplb * support redundant expert for eplb * support redundant expert for eplb * update * fix ci eplb	2026-01-09 17:13:24 +08:00
zccjjj	20de04e249	[XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu (#5878 )	2026-01-09 16:34:57 +08:00
Yuanle Liu	d4a386dfc4	Revert "Revert "[TSP] last_norm allgather move to model.py (#5924 )" (#5961 )" (#5972 ) This reverts commit `8c3513a410`.	2026-01-09 15:58:22 +08:00
Yuanle Liu	8c3513a410	Revert "[TSP] last_norm allgather move to model.py (#5924 )" (#5961 ) This reverts commit `2bb838fed9`.	2026-01-09 15:20:40 +08:00
xiaoluomi	2bb838fed9	[TSP] last_norm allgather move to model.py (#5924 ) * support_lastnorm_gather_split_dev * support_lastnorm_gather_split_dev1 * support_lastnorm_gather_split_dev3 * support_lastnorm_gather_split_dev4 * support_lastnorm_gather_split_dev5	2026-01-07 23:36:33 -08:00
GoldPancake	a1fc4e249e	[Bugfix] Fix mtp logprob hang problem when include stop_seq (#5927 ) * fix mtp logprob hang when include stop_seq	2026-01-08 14:21:24 +08:00
lizhenyun01	2be8656c29	[BugFix] fix mtp split kv attetion (#5920 ) * [BugFix] fix mtp split kv attetion * clean code * clean code	2026-01-07 04:07:31 -08:00
Ryan	3e74bacc5e	add m_grouped_gemm_fp8_fp8_bf16_nt_contiguous_custom_python_op (#5847 )	2026-01-07 16:17:55 +08:00
fmiao2372	1ee285c2d6	[Intel HPU] enable chunked prefill (#5903 ) * [Intel HPU] enable chunked prefill * fix bug by copilot comments	2026-01-06 21:01:50 +08:00
lizexu123	acdf0cd1d9	fix hadamard_block_size (#5888 )	2026-01-06 14:12:14 +08:00
Neil Zhu	272a371635	[Metax] optimize flash attention backend (#5876 )	2026-01-06 09:52:09 +08:00
lizexu123	1d3ae7c024	[BugFix] fix w4afp8 tp=8 (#5868 ) * fix w4afp8 tp=8 * fix	2026-01-05 18:59:02 +08:00
ming1753	f50e1bcc16	[Others] enable use PFCC deep_ep (#5822 ) * upstream deep_ep * fix bug * fix bug * modify env name	2026-01-05 02:07:01 -08:00
周周周	dc13344ab8	[Optimization] add del to decrease peak memory in MoE prefill (#5863 )	2026-01-05 14:01:48 +08:00
chen	193886e745	only cuda run triton op (#5846 )	2025-12-31 14:17:31 +08:00
GoldPancake	4e10ae5d99	[Speculative Decoding] Optimize draft logprob (#5842 ) * optimize draft logprob * fix ut	2025-12-31 13:35:56 +08:00
chen	0bcf924e10	[Optimization] Optimization for gather_logprob by 10GB (#5817 ) * opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k	2025-12-30 15:33:34 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
CSWYF3634076	9286403570	[Models] Add Qwen3-VL Model Support (#5763 ) * support v1 loader * remove useless code * remove useless * [Model] support Qwen3VL images success * [Model] support Qwen3VL rope_3d * [Model] support Qwen3VL remove log * [Model] support Qwen3VL RL * [Model] support Qwen3VL tp * [Model] support Qwen3VL video * [Model] support Qwen3VL fix ernievl * [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds * [Model] support Qwen3VL fix multi card * [Model] support Qwen3VL file close * [Model] support Qwen3VL fix ce * [Model] support Qwen3VL fix unittest * [Model] support Qwen3VL add unittest --------- Co-authored-by: Ayakouji <yuhongh@qq.com>	2025-12-29 17:39:33 +08:00
Ryan	eb782a0225	[BugFix] Fix return value inconsistency for `ep_moe_expert_combine` op (#5812 )	2025-12-29 16:44:00 +08:00
周周周	03363cab4c	make flash_mask attention pybind (#5783 )	2025-12-26 14:31:35 +08:00
Nyakku Shigure	11227e00bb	[GraphOptimization] Wrap deep gemm and triton as python op (#5673 ) * [GraphOptimization] Wrap deep gemm and triton as python op * add unitest to _base_test && compatibility * paddle.static.MetaTensor -> "paddle.static.MetaTensor" * mv register_custom_python_op * rename yaml --------- Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>	2025-12-24 15:23:46 +08:00
GoldPancake	23d488c488	[Feature] Entropy calculation support (#5692 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details * support entropy * fix bug --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-23 21:19:47 +08:00
bukejiyu	d1c6e57341	[Others] upgrade paddleformer to 0.4.0 (#5599 )	2025-12-23 05:08:01 -08:00

1 2 3 4 5 ...

398 Commits