FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 17:11:21 +08:00

Author	SHA1	Message	Date
Haonan Luo	82057cb71f	Support MXFP4 for GPT-OSS (#5435) * support mxfp4 in gpt-oss * support mxfp4 in gpt-oss * add scope for flashinfer * remove torch code * update envs.FD_MXFP4_BACKEND * update process_weights_after_loading * update env name * support tp in gpt-oss, add e2e test * add flashinfer-python-paddle in requirements * fix import error * add test * add test * add test * add test	2026-01-22 14:21:01 +08:00
lizexu123	acdf0cd1d9	fix hadamard_block_size (#5888)	2026-01-06 14:12:14 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
Sunny-bot1	3629db4129	[Quantization] Support w4afp8 MoE dynamic quantization (#5282) * support dynamic activation quant for w4afp8 * support dynamic w4afp8 * add test * fix * fix --------- Co-authored-by: zhoutianzi666 <17801055074@163.com>	2025-12-02 18:56:16 +08:00
xiaoxiaohehe001	e150a418d4	support moe offline quant (#5142)	2025-11-24 18:59:18 +08:00
chen	3161014e49	[BugFix]fix v1 loader moe bf16, and supoort dynamic_load_weight create quant param (#4229) * fix v1 loader moe bf16, and supoort dynamic_load_weight create quant param * include_stop_str_in_output=False not return eos text	2025-09-24 14:12:05 +08:00
bukejiyu	113e330030	fix bf16 and add comments (#4106)	2025-09-15 17:23:07 +08:00
bukejiyu	29ed617f0f	[v1 loader]qwen Offline fp8 (#4036) * support offline fp8 * update ut * update ut * update ut * fix * update * update	2025-09-15 13:44:11 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00
jiangjiajun	684703fd72	[LLM] First commit the llm deployment code	2025-06-09 19:20:15 +08:00

10 Commits