perf: avoid unnecessary dtype casting in RMSNorm

Added checks before calling `.astype` in `fastdeploy/model_executor/layers/normalization.py`. In PaddlePaddle, calling `.astype` allocates a new tensor and launches a kernel even when the tensor already has the target dtype; skipping these redundant casts avoids memory allocations and kernel launches on the hot path.
This commit is contained in:
google-labs-jules[bot]
2026-04-19 15:16:05 +00:00
parent b2aca6c550
commit 18ae6aa4d6
2 changed files with 13 additions and 4 deletions
@@ -0,0 +1,3 @@
+## 2026-04-19 - Unnecessary dtype conversions in hot paths
+**Learning:** In PaddlePaddle, calling `.astype(dtype)` creates a new tensor and dispatches a kernel even when the tensor is already of the target dtype, which can slow down hot paths like RMSNorm.
+**Action:** Add an explicit conditional check (`if tensor.dtype != target_dtype:`) before calling `.astype` in frequently executed methods to avoid memory allocations and kernel dispatch overhead.
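The guard pattern above can be sketched in isolation. This is a minimal illustration using NumPy as a stand-in for PaddlePaddle tensors (NumPy's `.astype` also allocates a new array by default even when the dtype already matches), and `maybe_cast` is a hypothetical helper name, not a function from the patched file:

```python
import numpy as np

def maybe_cast(t, dtype):
    # Hypothetical helper mirroring the `if tensor.dtype != target_dtype`
    # guard from the commit: only cast when the dtypes actually differ.
    if t.dtype != dtype:
        t = t.astype(dtype)  # allocates a new array and copies
    return t

x = np.ones(4, dtype=np.float16)

same = maybe_cast(x, np.float16)    # dtype matches: no allocation, same object
casted = maybe_cast(x, np.float32)  # dtype differs: new array is allocated

assert same is x
assert casted is not x and casted.dtype == np.float32
```

The same-object check (`same is x`) is exactly what an unconditional `.astype` would break: without the guard, every call pays for a fresh allocation and a copy kernel regardless of the input dtype.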
@@ -232,10 +232,12 @@ class RMSNorm(nn.Layer):
         operations (like linear transformation) on the `residual_input`.
         """
         x_dtype = x.dtype
-        x = x.astype(self.weight.dtype)
+        if x.dtype != self.weight.dtype:
+            x = x.astype(self.weight.dtype)
         if residual_input is not None:
             residual_input_dtype = residual_input.dtype
-            residual_input = residual_input.astype(self.weight.dtype)
+            if residual_input.dtype != self.weight.dtype:
+                residual_input = residual_input.astype(self.weight.dtype)
         if residual_input is None:
             residual_out = x
@@ -276,9 +278,13 @@
             x = x + residual_input
         norm_out = proxy_rmsnorm(x, self.weight, self.eps), x
-        out = norm_out[0].astype(x_dtype)
+        out = norm_out[0]
+        if out.dtype != x_dtype:
+            out = out.astype(x_dtype)
         if residual_input is not None:
-            residual_out = norm_out[1].astype(residual_input_dtype)
+            residual_out = norm_out[1]
+            if residual_out.dtype != residual_input_dtype:
+                residual_out = residual_out.astype(residual_input_dtype)
         if self.split_x:
             assert residual_out is not None