[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)

* gate bf16

* add gate-fp32

* fix

* update baseline

* update

* update

* fix
sunxin
2026-02-27 13:08:46 +08:00
committed by GitHub
parent edd31e8849
commit 53aaac69da
19 changed files with 95 additions and 28 deletions
@@ -258,7 +258,10 @@ class LinearBase(nn.Layer):
         Raises:
             NotImplementedError: If the weight dtype is not float8 or act dtype is not equal to weight dtype.
         """
-        linear_out = self.quant_method.apply(self, x)
+        if self.weight_dtype == "float32":
+            linear_out = self.quant_method.apply(self, x.cast("float32"))
+        else:
+            linear_out = self.quant_method.apply(self, x)
         return linear_out
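
The change above dispatches on the layer's weight dtype: when the gate weight is kept in float32, the (typically bf16) activation is upcast before the matmul so the gate runs in full precision. Below is a minimal, hypothetical sketch of that pattern. `LinearSketch`, its `apply` method, and the `weight_dtype` attribute are simplified stand-ins for the real `LinearBase`/`quant_method` machinery, and NumPy's `astype` stands in for Paddle's `Tensor.cast`.

```python
import numpy as np

class LinearSketch:
    """Simplified stand-in for LinearBase; not FastDeploy's real API."""

    def __init__(self, weight, weight_dtype):
        self.weight = weight
        self.weight_dtype = weight_dtype

    def apply(self, x):
        # Stand-in for quant_method.apply(self, x): a plain matmul.
        return x @ self.weight

    def forward(self, x):
        # Mirror of the PR's dispatch: upcast the activation when the
        # gate weight is stored in float32, otherwise pass it through.
        if self.weight_dtype == "float32":
            return self.apply(x.astype(np.float32))
        return self.apply(x)

# Usage: a low-precision activation hits an fp32 gate weight.
w = np.ones((4, 2), dtype=np.float32)
layer = LinearSketch(w, "float32")
x = np.ones((3, 4), dtype=np.float16)  # stand-in for a bf16 activation
out = layer.forward(x)
print(out.dtype)  # the gate matmul ran in float32
```

The design point is that only the gate path pays the upcast cost; layers whose `weight_dtype` matches the activation dtype take the unchanged fast path.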