[Intel HPU] enable tensor_wise_fp8 (#5324)

* [Intel HPU] enable tensor_wise_fp8
* update code based on comments
* fix code style issue
* fix bug about PR 5138
* mv kv_cache modifications to HPU backend
* fix FP8 Precision Issues
* fix FP8 Precision Issues
* Add quantization UT

---------

Co-authored-by: yanfeich <yanfei.cheng@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-04-23 00:17:25 +08:00 · 2025-12-17 16:45:03 +08:00
parent 15f5112ecb
commit 404cf0ece4
17 changed files with 824 additions and 116 deletions
@@ -64,7 +64,6 @@ def get_moe_method():
        from fastdeploy.model_executor.layers.backends import HpuMoEMethod

        return HpuMoEMethod(None)
-        # return HpuTensorWiseFP8MoEMethod(None)

    elif current_platform.is_maca():
        from fastdeploy.model_executor.layers.backends import (