[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)

* gate bf16

* add gate-fp32

* fix

* update baseline

* update

* update

* fix
sunxin
2026-02-27 13:08:46 +08:00
committed by GitHub
parent edd31e8849
commit 53aaac69da
19 changed files with 95 additions and 28 deletions
@@ -258,7 +258,10 @@ class LinearBase(nn.Layer):
         Raises:
             NotImplementedError: If the weight dtype is not float8 or act dtype is not equal to weight dtype.
         """
-        linear_out = self.quant_method.apply(self, x)
+        if self.weight_dtype == "float32":
+            linear_out = self.quant_method.apply(self, x.cast("float32"))
+        else:
+            linear_out = self.quant_method.apply(self, x)
         return linear_out
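
The change above dispatches on the layer's weight dtype: when the gate weight is kept in float32, the (typically bf16) activation is upcast before the matmul so the gate runs in full precision. Below is a minimal, hypothetical sketch of that pattern. `LinearSketch`, its `apply` method, and the `weight_dtype` attribute are simplified stand-ins for the real `LinearBase`/`quant_method` machinery, and NumPy's `astype` stands in for Paddle's `Tensor.cast`.

```python
import numpy as np

class LinearSketch:
    """Simplified stand-in for LinearBase; not FastDeploy's real API."""

    def __init__(self, weight, weight_dtype):
        self.weight = weight
        self.weight_dtype = weight_dtype

    def apply(self, x):
        # Stand-in for quant_method.apply(self, x): a plain matmul.
        return x @ self.weight

    def forward(self, x):
        # Mirror of the PR's dispatch: upcast the activation when the
        # gate weight is stored in float32, otherwise pass it through.
        if self.weight_dtype == "float32":
            return self.apply(x.astype(np.float32))
        return self.apply(x)

# Usage: a low-precision activation hits an fp32 gate weight.
w = np.ones((4, 2), dtype=np.float32)
layer = LinearSketch(w, "float32")
x = np.ones((3, 4), dtype=np.float16)  # stand-in for a bf16 activation
out = layer.forward(x)
print(out.dtype)  # the gate matmul ran in float32
```

The design point is that only the gate path pays the upcast cost; layers whose `weight_dtype` matches the activation dtype take the unchanged fast path.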