mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2026-04-24 01:29:57 +08:00
61789febb9
* support to load static quant ue8m0 scale of deepgemm via v0_loader

* [Fix] Fix ue8m0 scale pack dimension calculation and block size validation

1. Fix pack dimension calculation in fused_moe_triton_backend.py:
   - Changed from `ceil_div(...) // 4` to `(num_scales + 3) // 4` for correct ceiling division
   - This ensures sufficient pack allocation when num_scales is not a multiple of 4

2. Fix block size hardcoding in block_wise_fp8.py:
   - Use `self.quant_config.weight_block_size` instead of hardcoded `[128, 128]`
   - Add assertion to ensure weight_block_size is `[128, 128]` for ue8m0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
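
The two fixes described in this commit can be illustrated with a minimal sketch. It is not the code from fused_moe_triton_backend.py or block_wise_fp8.py. The helper names (`pack_dim_floor`, `pack_dim_ceil`, `check_ue8m0_block_size`) and the way `num_scales` is derived from a dimension and a block size of 128 are assumptions made for illustration. The sketch also assumes four scales share one packed element, which is what the division by 4 suggests.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def pack_dim_floor(dim: int, block: int = 128) -> int:
    """Old pattern: floor-divides the scale count by 4 (hypothetical helper)."""
    return ceil_div(dim, block) // 4


def pack_dim_ceil(dim: int, block: int = 128) -> int:
    """Fixed pattern: rounds the scale count up to a whole pack (hypothetical helper)."""
    num_scales = ceil_div(dim, block)
    return (num_scales + 3) // 4


def check_ue8m0_block_size(weight_block_size) -> None:
    """Reject any block size other than [128, 128] (hypothetical helper)."""
    assert list(weight_block_size) == [128, 128], (
        f"ue8m0 scales require weight_block_size [128, 128], got {weight_block_size}"
    )


if __name__ == "__main__":
    # 640 / 128 = 5 scales: the floor version allocates one pack (room for 4),
    # the ceiling version allocates two.
    for dim in (128, 512, 640, 7168):
        print(dim, ceil_div(dim, 128), pack_dim_floor(dim), pack_dim_ceil(dim))
    check_ue8m0_block_size([128, 128])
```

The two versions agree whenever the scale count is a multiple of 4 (for example 512 or 7168 with a block size of 128) and differ otherwise, which is the case the commit message says was under-allocated.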