FastDeploy/fastdeploy/model_executor/layers/quantization at b1a5b756a3566ff65bc49c7ec195db0d36fdbd08 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 17:11:21 +08:00

Sunny-bot1 b1a5b756a3 [Optimize] Support WINT8 and group scale for Machete (#3905)

2025-09-15 12:01:34 +08:00

__init__.py

…

block_wise_fp8.py

…

kv_cache.py

[BugFix]Fix load kv cache quant scale (#4077)

2025-09-12 17:44:03 +08:00

mix_quant.py

cache feature (#3857)

2025-09-07 18:52:46 +08:00

quant_base.py

…

tensor_wise_fp8.py

…

w4a8.py

load hadamard_block_size from config (#3797)

2025-09-05 17:07:58 +08:00

w4afp8.py

load hadamard_block_size from config (#3797)

2025-09-05 17:07:58 +08:00

w8a8.py

fix w8a8.py (#3733)

2025-09-03 10:57:26 +08:00

weight_only.py

[Optimize] Support WINT8 and group scale for Machete (#3905)

2025-09-15 12:01:34 +08:00

wfp8afp8.py

[Feature] GLM-45-AIR Support Mix Quantization (Dense wfp8afp8 and wint8 triton_moe_backend) (#4051)

2025-09-11 20:08:09 +08:00

wint2.py

…