FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-10 09:31:48 +08:00

Author	SHA1	Message	Date
chen	29a313a402	[Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354 ) * support FA4 sm100 * flash attn backend support mask * flash attn backend run flashmask correct * add test for flash_attn_backend and flash_attn_func * check * add test for fa4 * requirements.txt add fa4 whl * check test on sm100 * fix CI conflict * add enable_torch_proxy for flash_mask * lazy import fa4 * check * fix tests import * check test_load_mpt import	2026-02-05 14:39:00 +08:00
fxyfxy777	36547cfdb3	[Feature] FD_USE_PHI_FP8_QUANT (#6320 ) * add ut * add use_fd_quant env * rm mask_per_token_quant * add make ops list * USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, default is true * modify comments * use bool type * Add function declaration	2026-02-03 22:33:03 -08:00
周周周	6225439778	add PADDLE_ENFORCE (#6321 )	2026-02-04 10:47:19 +08:00
周周周	8277b95fa6	remove speculate_get_padding_offset op (#6308 )	2026-02-03 15:18:12 +08:00
fxyfxy777	2ada119a38	[Optimize] optimize mask_quant & swiglu (#6222 ) * optimize mask_quant op speed up 1.5 * fix calculate sequence * add fused * rm log * push kernel code * add ut * accuracy ok * add ue8m0 * add ut * add merge develop * rm ut of mask_per_token_quant	2026-02-02 13:52:38 +08:00
JYChen	6c685c9474	Revert "[Feature] Support Ernie FP8 on sm100 (#5593 )" (#6275 ) This reverts commit `eb80724b71`.	2026-01-30 11:22:01 +08:00
JYChen	eb80724b71	[Feature] Support Ernie FP8 on sm100 (#5593 ) * temporarily usable DeepGEMM version * dense part e8m0 ok * version where the EB model runs with E8M0 * code check * support 21b-tp2, dev_paddle * version with single-node 4.5T ep OK * restore deleted code, single-node 4.5T ep (non-cudagraph) * eb tp * Support SM100 block-wise FP8 inference * refine codes, support deepgemm on sm100 * add thirdparty PFCC/DeepGEMM * fix ep decode * use deepep ue8m0 to fix accuracy issue * fix FP8 TP accuracy * upgrade DeepGEMM to adapt Hopper logic * add ue8m0 kernel * add ue8m0 kernel * fix custom_ops/gpu_ops/cpp_extensions.cc * eb output is normal * eb5 text is right * accuracy looks consistent on visual inspection * accuracy aligned in self-test * replace masked_per_token_quant, ep accuracy OK * performance improved by about 30% * ep runs for now but has issues * self-test consistent * rm test fun * fix ep event * update DeepGEMM for graph-optimization ops * fix build * temporarily work around DeepGEMM CI build issue * select DeepGEMM version by SM * remove useless code --------- Co-authored-by: ckl117 <ckl117@163.com> Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”> Co-authored-by: fxyfxy777 <fxyfxy777@163.com>	2026-01-29 13:49:54 +08:00
jc	7da5f54fb3	[CI] Add unit test for swap_layout && remove unit test of splitwise_scheduler (#6250 ) * Add unit test for swap_layout * remove splitwise_scheduler test	2026-01-28 19:20:20 +08:00
GoldPancake	7d6c87c29e	[Others] Support constrained decoding when enable_thinking is false (#6248 ) * support constrained decoding when enable_thinking is false * fix * fix * fix	2026-01-28 00:05:17 -08:00
sunxin	27f8799f04	[Model Runner] Refactor execute_model for GPU async scheduling (#6176 )	2026-01-28 14:19:33 +08:00
freeliuzc	ce06c6dfb3	[BugFix] Fix token_penalty kernel (#6069 ) * fix token_penalty kernel * try to fix xpu * fix xpu * fix unit test	2026-01-28 12:03:05 +08:00
周周周	aa57864c5b	remove unneeded para from flash_mask_attention (#6218 )	2026-01-27 14:04:27 +08:00
周周周	0966df78dc	[Others] remove stop_nums (#6182 )	2026-01-26 12:12:47 +08:00
lizexu123	f4902fe42d	[BugFix] fix wint2 (#6109 ) * fix * fix * fix	2026-01-20 21:46:21 +08:00
fxyfxy777	4c92035f2d	[Feature] Unify fp8 block_wise quant ops (#5991 ) * quant stash * blockwise_quant * precommit * rm tensor.cut * tp ok * add swiglu * rm outdate code * fix activate ut * change baseline * fix baseline error	2026-01-15 05:50:37 -08:00
freeliuzc	49617d9832	[Feature]Support tag phase token enforce generation (#6034 ) * support tag phase token enforce generation * optimize note and some feature * fix sampler unit test --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-15 03:59:55 -08:00
freeliuzc	17866c028e	add more cases for attention unit test (#5931 )	2026-01-15 19:52:35 +08:00
sunxin	2533836dbb	[Optimization] Accelerate Qwen3 QK RMSNorm via Fused Triton Kernel (#5880 ) * qk rmsnorm fused * inplace * glm * fix * add qknorm layer * fix * update * fix qwen3 vl * update rl baseline * fix qwen3 vl moe * test * fix qwen vl moe rl * fix	2026-01-12 05:10:21 -08:00
freeliuzc	9018ccf74e	[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 ) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operater register * update pmtp multi-step mtp strategy in d-split -mode * add note * fix xpu register	2025-12-25 01:54:59 -08:00
lizexu123	6d323769dd	fix w4afp8 (#5634 )	2025-12-22 13:39:41 +08:00
Yuanle Liu	cdc0004894	Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563 )" (#5611 ) This reverts commit `73e1d6aa90`.	2025-12-17 13:59:06 +08:00
Yuanle Liu	867803ae10	[BugFix] fix speculate_limit_thinking_content_length (#5590 ) * fix speculate_limit_thinking_content_length * update	2025-12-16 04:31:45 -08:00
fxyfxy777	73e1d6aa90	[Feature] add ue8m0 for per_token_quant_fp8 (#5563 ) * ue8m0 * add default arg --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-16 18:40:12 +08:00
lizexu123	95eab9f9ee	[Feature] support stop_token_ids (#5399 ) * support stop_token_ids * fix * delete chinese * support both * delete print	2025-12-09 17:49:12 +08:00
K11OntheBoat	8d99bac532	Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440 ) Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-12-09 14:17:30 +08:00
周周周	2aea8a3a60	[Others] Remove useless code (#5404 )	2025-12-08 13:59:46 +08:00
lizhenyun01	aba4fc657f	[Feature] support flash_mask_attention backend (#5134 ) * [Feature] suppert flash_mask_attention backend * fix unittest * clean code	2025-11-28 10:12:16 +08:00
xiaoxiaohehe001	6ca2651995	[Feature] Support noaux for eplb (#5143 ) * support noaux eplb * noaux_eplb * noaux_eplb * noaux_eplb	2025-11-21 14:10:32 +08:00
周周周	6fa34102e8	[Others]get_block_shape_and_split_kv_block clean code (#5123 )	2025-11-20 16:40:04 +08:00
lizhenyun01	d11235333e	format flash_mask_attn	2025-11-18 17:18:12 +08:00
yangjianfengo1	ae7bee8122	【New Feature】W4afp8 supports per group quantization (#4987 ) * w4afp8 supports per group * code style * fix transpose * revert fast hardmard --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com> Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>	2025-11-13 19:17:27 +08:00
Echo-Nie	ff653503ff	[Docs] Add License in Unittest (#4957 ) * add copyright * add CopyRight	2025-11-12 10:44:09 +08:00
ming1753	cba185f1fe	[Feature] Optim PaddleOCR-VL (#4873 ) * [Feature] Optim PaddleOCR-VL * fix bug	2025-11-07 14:56:44 +08:00
YuBaoku	819b2dbbae	Revert "【New Feature】W4afp8 supports per group quantization (#4272 )" (#4854 ) This reverts commit `93fcf7e4ec`.	2025-11-06 17:48:28 +08:00
yangjianfengo1	93fcf7e4ec	【New Feature】W4afp8 supports per group quantization (#4272 ) * w4afp8 supports per group * code style * accuracy done * revert append attn utils * ffn1 dynamic quantization * ffn2 supports dynamic quantization * code style * code style * modify unit test * modify unit test * fix bug * Implement conditional parameter creation for layers Add parameter creation for up_gate_proj_in_scale when ep_size > 1. * code style * fix conflict * code style * code style * fix w4aint8 accuracy * fix ci --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-11-05 21:00:23 +08:00
周周周	937eb3c6ed	[get_padding_offset.] clean get_padding_offset.cu (#4777 )	2025-11-05 10:47:40 +08:00
freeliuzc	11398790d3	[Speculative Decoding][MTP]Support attn mask offset (#4641 ) * [MTP]Merge support attn (#4591) * support mask_offset in speculate decoding * fix dummpy run output * add unit test * fix unit test import * support attn_mask_offset in mtp mode * add update_attn_mask op * fix unit test && fix code-style	2025-11-03 10:08:01 +08:00
Yuanle Liu	b301bd6c31	[BugFix] fix thinking bug (#4710 ) * fix thinking bug * fix ut * update * fix	2025-10-31 22:00:31 +08:00
GoldPancake	1f3ce65b58	[Feature] support mtp distribution equivalence verification (#4699 )	2025-10-31 11:45:04 +08:00
GoldPancake	fddda50cb9	Add ut for speculative sampler (#4650 )	2025-10-30 10:37:49 +08:00
Copilot	175391389f	Add comprehensive unit tests for limit_thinking_content_length operators (#4510 ) * Initial plan * Add comprehensive unit tests for limit_thinking_content_length functions Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix (#4514) --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-10-21 18:55:57 +08:00
Yuanle Liu	cef3164c3b	Optimizing the performance of think length limit using custom operators (#4279 ) * delete impl * delete min_length&max_length * support limit thinking content strategy * fix * fix * fix * update * fix set_value_by_flags_and_idx * fix * fix * fix * fix * update * fix * fix * fix typo * fix ci * fix * fix * support mtp * fix * fix * update * update	2025-10-20 21:09:13 +08:00
GoldPancake	47595a2480	[Feature] support mtp logprob (#4464 ) * support mtp logprob * fix unitest	2025-10-20 15:18:12 +08:00
Haonan Luo	1b9f351d21	Support GPT-OSS-BF16 (#4240 ) * [Feature] AppendAtten support sinks & HEAD_DIM=64 * fix bug * fix bug * fix bug * fix bug * [Feature] support gpt-oss * fix bug * add mask * support-gpt-oss * support-gpt-oss * fix long seq * support wint8 * support wint8 * support wint8 * update test * change sliding windows init pos --------- Co-authored-by: ming1753 <ideaminghp@163.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>	2025-10-20 14:44:58 +08:00
yzwu	4b661512ca	[Iluvatar GPU] Adapt VL model (#4313 )	2025-10-17 16:13:38 +08:00
freeliuzc	744287e1a9	fix param (#4419 )	2025-10-15 18:44:24 +08:00
freeliuzc	582aebd48b	[MTP]support mtp chunk_prefill_v1 (#4366 ) * support mtp chunk_prefill_v1 * fix mtp chunkprefill output, fix unit test * fix unit test * fix save_output	2025-10-15 13:21:32 +08:00
co63oc	73c8e0849f	【Hackathon 9th No.67】add speculate_verify (#4326 ) * add speculate_verify * fix * fix	2025-10-14 14:13:17 +08:00
Sunny-bot1	a751d977bc	[Optimization] Fuse get_max_len and get_kv_max_len (#4369 ) * opt split_q_block * fuse max_lens and max kv_len	2025-10-13 20:35:00 +08:00
ooo oo	2d641078c3	【Hackathon 9th No.20】add unit tests for masked_per_token_quant (#4111 ) * test: add unit tests for masked_per_token_quant * apply review	2025-10-13 14:51:11 +08:00

94 Commits