FastDeploy/fastdeploy/model_executor/layers/quantization at 20de04e249d94f846c1f81d220aa2eab5b27e4ce - FastDeploy - Zishuo Mirror Site (子说镜像小站)


Mirror of https://github.com/PaddlePaddle/FastDeploy.git, last synced 2026-04-23 00:17:25 +08:00


Latest commit: lizexu123 acdf0cd1d9 fix hadamard_block_size (#5888), 2026-01-06 14:12:14 +08:00


Name                 Date                        Last commit
__init__.py          2026-01-06 14:12:14 +08:00   fix hadamard_block_size (#5888)
block_wise_fp8.py    2025-12-24 15:23:46 +08:00   [GraphOptimization] Wrap deep gemm and triton as python op (#5673)
kv_cache.py          2025-12-17 16:45:03 +08:00   [Intel HPU] enable tensor_wise_fp8 (#5324)
mix_quant.py         2025-12-22 15:12:57 +08:00   support w4afp8 moe offline permute & load (#5613)
quant_base.py        …
tensor_wise_fp8.py   2025-12-17 16:45:03 +08:00   [Intel HPU] enable tensor_wise_fp8 (#5324)
w4a8.py              2025-12-18 14:14:05 +08:00   [XPU] refactor moe ffn (#5501)
w4afp8.py            2025-12-30 14:11:52 +08:00   [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
w8a8.py              …
weight_only.py       2025-12-18 14:14:05 +08:00   [XPU] refactor moe ffn (#5501)
wfp8afp8.py          …
wint2.py             …
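Two of the modules in this directory, tensor_wise_fp8.py and block_wise_fp8.py, have names that suggest two scaling granularities for FP8 quantization. The snippet below is a minimal pure-Python sketch of that general technique, assuming those meanings. It is not code taken from FastDeploy. Tensor-wise scaling uses a single scale for the whole weight, while block-wise scaling uses one scale per tile. The helper names (tensor_wise_scale, block_wise_scales, quantize_block_wise) and the default tile size of 128 are hypothetical choices for illustration. 448 is the largest finite value of the FP8 E4M3 format. The sketch only computes scales and scaled values and assumes the actual cast to FP8 happens elsewhere, for example inside a kernel.

```python
# Illustrative sketch only; not FastDeploy's implementation.
# Weights are plain 2-D lists of floats (assumed non-empty and rectangular).

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def tensor_wise_scale(weight):
    """Return one scale for the whole weight: max|w| / FP8_E4M3_MAX."""
    amax = max(abs(v) for row in weight for v in row)
    # An all-zero tensor gets scale 1.0 to avoid dividing by zero later.
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def block_wise_scales(weight, block=128):
    """Return a 2-D grid with one scale per (block x block) tile.

    The default of 128 is a hypothetical choice for this sketch.
    """
    rows, cols = len(weight), len(weight[0])
    scales = []
    for r0 in range(0, rows, block):
        scale_row = []
        for c0 in range(0, cols, block):
            # Edge tiles may be smaller than block x block.
            amax = max(
                abs(weight[r][c])
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            scale_row.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
        scales.append(scale_row)
    return scales


def quantize_block_wise(weight, scales, block=128):
    """Divide each element by its tile's scale so values fit in [-448, 448]."""
    return [
        [weight[r][c] / scales[r // block][c // block] for c in range(len(weight[0]))]
        for r in range(len(weight))
    ]


if __name__ == "__main__":
    w = [
        [1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 0.0],
        [0.0, 0.0, 896.0, 0.0],
        [0.0, 0.0, 0.0, -448.0],
    ]
    print("tensor-wise scale:", tensor_wise_scale(w))
    s = block_wise_scales(w, block=2)
    print("block-wise scales:", s)
    print("quantized:", quantize_block_wise(w, s, block=2))
```

The design trade-off is visible in the example. Under block-wise scaling, the small-magnitude top-left tile keeps its own small scale and is stretched toward the full FP8 range. Under tensor-wise scaling, it would share the large scale forced by the outlier 896.0. The cost is storing one scale per tile instead of one per tensor.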