FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
lizexu123	6619298b50	【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007 ) * update w4afp8 * build.sh ok * support cuda_graph * fix * add test * fix max_tokens_per_expert * >=70 * fix * compute_max_tokens_from_prefix_sum in w4afp8 * compute_max_tokens use cub	2026-01-15 19:18:42 +08:00
yangjianfengo1	16e1992eba	[Bugfix] Increase the shape of w4afp8 gemm (#5957 ) * increase w4afp8 shape * increase w4afp8 shape * code style	2026-01-09 14:11:17 +08:00
yangjianfengo1	59523b27de	opt w4afp8 (#5853 )	2026-01-07 12:22:35 +08:00
lizexu123	acdf0cd1d9	fix hadamard_block_size (#5888 )	2026-01-06 14:12:14 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
lizexu123	6d323769dd	fix w4afp8 (#5634 )	2025-12-22 13:39:41 +08:00
Sunny-bot1	3629db4129	[Quantization] Support w4afp8 MoE dynamic quantization (#5282 ) * support dynamic activation quant for w4afp8 * support dynamic w4afp8 * add test * fix * fix --------- Co-authored-by: zhoutianzi666 <17801055074@163.com>	2025-12-02 18:56:16 +08:00
yangjianfengo1	3afb717995	【Fix】fix deepep dispatch (#5036 ) * fix dispatch * fix dispatch --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-11-17 10:34:01 +08:00
yangjianfengo1	ae7bee8122	【New Feature】W4afp8 supports per group quantization (#4987 ) * w4afp8 supports per group * code style * fix transpose * revert fast hardmard --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com> Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>	2025-11-13 19:17:27 +08:00
YuBaoku	819b2dbbae	Revert "【New Feature】W4afp8 supports per group quantization (#4272 )" (#4854 ) This reverts commit `93fcf7e4ec`.	2025-11-06 17:48:28 +08:00
yangjianfengo1	93fcf7e4ec	【New Feature】W4afp8 supports per group quantization (#4272 ) * w4afp8 supports per group * code style * accuracy done * revert append attn utils * ffn1 dynamic quantization * ffn2 supports dynamic quantization * code style * code style * modify unit tests * modify unit tests * fix bug * Implement conditional parameter creation for layers Add parameter creation for up_gate_proj_in_scale when ep_size > 1. * code style * fix conflict * code style * code style * fix w4aint8 accuracy * fix ci --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-11-05 21:00:23 +08:00
Zhenghai Zhang	1712e1351b	【Hackathon 9th No.86】autogen `MoeFastHardamardImplWrapper` template_instantiation (#4592 ) * autogen MoeFastHardamardImplWrapper template_instantiation * fix codestyle * fix codestyle * add impl cu files	2025-10-30 10:28:36 +08:00
yangjianfengo1	8e1b35a09b	【Fix bug】Pin the w4afp8 nblock to 256, and add a mask parameter to the fa3 append attn (#3771 ) * fix w4afp8 * add centralized configuration * codestyle * fix fa3 append attn	2025-09-02 19:17:01 +08:00
Yuan Xiaolan	c71ee0831c	add w4afp8 offline script (#3636 )	2025-08-29 17:56:05 +08:00
Yuan Xiaolan	9205c88da1	support w4afp8 EP inference (#3044 )	2025-08-25 11:27:45 +08:00
yangjianfengo1	e5aa7087db	【bug fix】Fix slow w4a8 compilation (#3510 ) * fix w4a8 compilation * code style * fix tma copy	2025-08-21 18:50:14 +08:00
yangjianfengo1	b047681c5d	【New Feature】Support Fp8 group Gemm 2:4 sparsity (#3463 ) * support 2:4 sparsity * code style * add stmatrix macro definition check * code style	2025-08-19 02:54:47 -07:00
yangjianfengo1	89397516a8	[New Feature] Support W4Afp8 MoE GroupGemm (#3171 ) * init * add multithreaded compilation * fix bug * fix bug * code style * add fp16 * replace print with assert * fix stmatrix * reduce unit test shape * reduce unit test shape	2025-08-06 10:34:05 +08:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00

20 Commits