FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-24 01:29:57 +08:00

RichardWooSJTU 61789febb9 [Quantization] Support to load static quant ue8m0 scale of DeepGEMM via v0_loader (#6433)

* support to load static quant ue8m0 scale of deepgemm via v0_loader

* [Fix] Fix ue8m0 scale pack dimension calculation and block size validation

1. Fix pack dimension calculation in fused_moe_triton_backend.py:
   - Changed from `ceil_div(...) // 4` to `(num_scales + 3) // 4` for correct ceiling division
   - This ensures sufficient pack allocation when num_scales is not a multiple of 4

2. Fix block size hardcoding in block_wise_fp8.py:
   - Use `self.quant_config.weight_block_size` instead of hardcoded `[128, 128]`
   - Add assertion to ensure weight_block_size is `[128, 128]` for ue8m0
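
The two fixes above can be illustrated with a minimal sketch. It is not the FastDeploy code: the helper names `packs_floor`, `packs_ceil`, and `validate_block_size` are hypothetical, and the sketch assumes that four ue8m0 scales are packed into one storage element, which is what the division by 4 suggests.

```python
# Minimal sketch of the two fixes; all helper names are hypothetical.
# Assumption: four ue8m0 scales are packed into one storage element.

SCALES_PER_PACK = 4


def packs_floor(num_scales: int) -> int:
    """Floor division: under-allocates when num_scales is not a multiple of 4."""
    return num_scales // SCALES_PER_PACK


def packs_ceil(num_scales: int) -> int:
    """Ceiling division via (n + 3) // 4: always enough packs for every scale."""
    return (num_scales + SCALES_PER_PACK - 1) // SCALES_PER_PACK


def validate_block_size(weight_block_size):
    """Read the block size from config instead of hardcoding it, then assert
    the only value the commit message says is supported for ue8m0."""
    block = list(weight_block_size)
    assert block == [128, 128], (
        f"ue8m0 scales require weight_block_size [128, 128], got {block}"
    )
    return block


if __name__ == "__main__":
    # 5 scales need 2 packs; floor division would allocate only 1.
    print(packs_floor(5), packs_ceil(5))
    # With a multiple of 4 both forms agree.
    print(packs_floor(8), packs_ceil(8))
    print(validate_block_size([128, 128]))
```

Reading the block size from the configuration and then asserting on it keeps the code honest about what it supports: an unsupported value fails at load time rather than silently producing a mis-shaped scale tensor.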

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-03 11:32:35 +08:00

ops

[Feature] Support Ernie FP8 on sm100 (the fixed version) (#6304)

2026-02-03 17:47:38 +08:00

__init__.py

[Feature] Support NVFP4 MoE on SM100 (#6003)

2026-01-29 14:16:07 +08:00

block_wise_fp8.py

[Quantization] Support to load static quant ue8m0 scale of DeepGEMM via v0_loader (#6433)

2026-03-03 11:32:35 +08:00

fp8_utils.py

fix reshard error (#6536)