【New Feature】W4afp8 supports per group quantization (#4987)

* w4afp8 支持per group * code style * fix transpose * revert fast hardmard --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com> Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2026-04-23 00:17:25 +08:00 · 2025-11-13 19:17:27 +08:00
parent a5e949d9d0
commit ae7bee8122
21 changed files with 3114 additions and 2248 deletions
@@ -275,6 +275,7 @@ class DeepEPEngine:
        topk_idx: paddle.Tensor,
        expertwise_scale,
        use_fp8: bool = False,
+        quant_group_size: int = 128,
    ):
        if self.deepep_engine is None:
            raise RuntimeError("DeepEP buffer not initialized!")
@@ -294,6 +295,7 @@ class DeepEPEngine:
            use_fp8=use_fp8,
            async_finish=False,
            return_recv_hook=True,
+            num_per_channel=quant_group_size,
        )

        return packed_recv_x, recv_expert_count, handle, dispatch_hook