Commit Graph

11 Commits

Author SHA1 Message Date
fxyfxy777 4c92035f2d [Feature] Unify fp8 block_wise quant ops (#5991) 2026-01-15 05:50:37 -08:00
* quant stash
* blockwise_quant
* precommit
* rm tensor.cut
* tp ok
* add swiglu
* rm outdate code
* fix activate ut
* change baseline
* fix baseline error
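
The commit title suggests folding the repo's block-wise FP8 quant variants into one primitive: split the tensor into fixed-size groups, take each group's absmax as its scale, and cast. A minimal CUDA sketch of that shape, assuming 128-element groups, e4m3 (largest finite value 448), and CUDA 11.8+ for cuda_fp8.h; names and layout are illustrative, not the #5991 implementation:

    #include <cuda_fp8.h>  // __nv_fp8_e4m3 (CUDA 11.8+)
    #include <math.h>

    constexpr int kGroup = 128;            // quantization group size (assumed)
    constexpr float kFp8E4m3Max = 448.0f;  // largest finite |value| in e4m3

    // One thread block quantizes one group: find the group absmax, set
    // scale = absmax / 448, then cast x / scale to fp8 e4m3.
    __global__ void block_wise_quant_fp8(const float* __restrict__ x,
                                         __nv_fp8_e4m3* __restrict__ y,
                                         float* __restrict__ scale, int n) {
      __shared__ float smem[kGroup];
      int i = blockIdx.x * kGroup + threadIdx.x;
      float v = (i < n) ? x[i] : 0.0f;
      smem[threadIdx.x] = fabsf(v);
      __syncthreads();
      for (int s = kGroup / 2; s > 0; s >>= 1) {  // shared-memory max tree
        if (threadIdx.x < s)
          smem[threadIdx.x] = fmaxf(smem[threadIdx.x], smem[threadIdx.x + s]);
        __syncthreads();
      }
      float sc = fmaxf(smem[0], 1e-6f) / kFp8E4m3Max;  // guard all-zero groups
      if (threadIdx.x == 0) scale[blockIdx.x] = sc;
      if (i < n) y[i] = __nv_fp8_e4m3(v / sc);
    }
    // launch: block_wise_quant_fp8<<<(n + kGroup - 1) / kGroup, kGroup>>>(...)
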
Ryan 724045c426 add some op infershape&dtype (#5762) 2025-12-26 16:17:39 +08:00
Yuanle Liu cdc0004894 Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563)" (#5611) 2025-12-17 13:59:06 +08:00
This reverts commit 73e1d6aa90.
fxyfxy777 73e1d6aa90 [Feature] add ue8m0 for per_token_quant_fp8 (#5563) 2025-12-16 18:40:12 +08:00
* ue8m0
* add default arg
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
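
For context on this (later reverted) feature: ue8m0 is an exponent-only 8-bit scale format, with no sign or mantissa bits, so every scale is a power of two; per the OCP MX spec the byte encodes 2^(e - 127). A hedged sketch of how a per-token scale might be encoded, not taken from #5563:

    #include <stdint.h>
    #include <math.h>

    // Encode a positive fp32 scale as ue8m0: round up to the next power of
    // two and store only the exponent byte (bias 127, value = 2^(byte-127)).
    // Rounding up keeps |x| / scale inside fp8 range. Hypothetical helper.
    __host__ __device__ inline uint8_t scale_to_ue8m0(float scale) {
      int e = (int)ceilf(log2f(scale));           // smallest 2^e >= scale
      e = e < -127 ? -127 : (e > 127 ? 127 : e);  // clamp to encodable range
      return (uint8_t)(e + 127);
    }
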
周周周 95243f012c [Others] add PADDLE_ENFORCE (#5288) 2025-11-28 14:23:35 +08:00
Ryan e25c067f70 [OP] Add InferShape&InferDtype for per_token_quant_padding (#4667) 2025-10-30 10:28:26 +08:00
* add InferShape&InferDtype for per_token_quant_padding
* fix codestyle
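
Paddle custom ops register static shape and dtype inference through SetInferShapeFn/SetInferDtypeFn on PD_BUILD_OP. A sketch of what such a registration could look like for per_token_quant_padding, assuming an input of [num_tokens, hidden], an fp8 output of the same shape, and one fp32 scale per token; the tensor names, the FLOAT8_E4M3FN dtype, and the shape rule are assumptions, not the #4667 code:

    #include "paddle/extension.h"

    // Assumed contract: input [num_tokens, hidden]; outputs an fp8 tensor of
    // the same shape plus one fp32 scale per token. The real op may pad.
    std::vector<std::vector<int64_t>> PerTokenQuantInferShape(
        const std::vector<int64_t>& x_shape) {
      return {x_shape, {x_shape[0]}};
    }

    std::vector<paddle::DataType> PerTokenQuantInferDtype(
        const paddle::DataType& x_dtype) {
      return {paddle::DataType::FLOAT8_E4M3FN, paddle::DataType::FLOAT32};
    }

    PD_BUILD_OP(per_token_quant_padding)
        .Inputs({"x"})
        .Outputs({"out", "scale"})
        // .SetKernelFn(...) omitted; only the static inference hooks shown
        .SetInferShapeFn(PD_INFER_SHAPE(PerTokenQuantInferShape))
        .SetInferDtypeFn(PD_INFER_DTYPE(PerTokenQuantInferDtype));
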
周周周 76513f6416 Support 45t fp8 8 GPU (#3659) 2025-08-28 10:52:53 +08:00
RichardWooSJTU e39159f3bd Add switch to apply fine-grained per token quant fp8 (#3192) 2025-08-04 19:54:03 -07:00
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
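
The "switch" here contrasts one scale for the whole activation tensor with one scale per token (row). A small host-side sketch of the two granularities, with assumed semantics (scale = absmax / 448) rather than the #3192 code:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // scale = absmax / 448 (fp8 e4m3 max), computed either once for the whole
    // tensor or once per token (row) when the fine-grained switch is on.
    std::vector<float> compute_scales(const std::vector<std::vector<float>>& x,
                                      bool per_token_quant) {
      constexpr float kFp8Max = 448.0f;
      auto row_amax = [](const std::vector<float>& row) {
        float m = 0.0f;
        for (float v : row) m = std::max(m, std::fabs(v));
        return m;
      };
      if (!per_token_quant) {      // coarse: one scale for everything
        float m = 0.0f;
        for (const auto& row : x) m = std::max(m, row_amax(row));
        return {m / kFp8Max};
      }
      std::vector<float> scales;   // fine-grained: one scale per token
      scales.reserve(x.size());
      for (const auto& row : x) scales.push_back(row_amax(row) / kFp8Max);
      return scales;
    }
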
Jiang-Jia-Jun 05c670e593 [Sync] Update to latest code (#2679) 2025-07-03 15:43:53 +08:00
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
MARD1NO ac5f860536 use shfl_xor_sync to reduce redundant shfl broadcast 2025-06-30 13:12:21 +08:00
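
A __shfl_down_sync reduction tree leaves the result only in lane 0, so kernels that need it in every lane follow up with a broadcast shuffle; the xor (butterfly) pattern ends with the result in all 32 lanes and drops that extra step. A minimal sketch of the pattern named in the commit, not the commit's own code:

    // Butterfly max-reduction over a 32-lane warp. After the loop every lane
    // holds the warp maximum, so no follow-up broadcast shuffle is needed.
    __device__ inline float warp_reduce_max(float v) {
      #pragma unroll
      for (int mask = 16; mask > 0; mask >>= 1)
        v = fmaxf(v, __shfl_xor_sync(0xffffffffu, v, mask));
      return v;
    }
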
jiangjiajun 684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00