[INTEL_HPU] supported ERNIE-4.5-21B-A3B-Thinking (#5891)

ERNIE-4.5-21B-A3B-Thinking needs to be loaded with the DefaultModelLoaderV1 mode (selected via --load-choices "default_v1").

Reference command line:
ENABLE_V1_KVCACHE_SCHEDULER=1 FD_ENC_DEC_BLOCK_NUM=8 HPU_PERF_BREAKDOWN_SYNC_MODE=1 \
HPU_WARMUP_BUCKET=0 MAX_PREFILL_NUM=1 FD_ATTENTION_BACKEND=HPU_ATTN \
python -m fastdeploy.entrypoints.openai.api_server --model \
./models--baidu--ERNIE-4.5-21B-A3B-Thinking/snapshots/4341bb42644d5422859509fa25d41544c57181f8/ \
--port 8388 --engine-worker-queue-port 8302 --metrics-port 8301 \
--cache-queue-port 8303 --max-model-len 16384 --tensor-parallel-size 1 \
--load-choices "default_v1" --num-gpu-blocks-override 5000 --kv-cache-ratio 0.5 \
--max-num-seqs 128 --block-size 64 --no-enable-prefix-caching \
--graph-optimization-config '{"use_cudagraph":false}'

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Author: FocusLuo
Date: 2026-01-07 21:31:53 +08:00
Committed by: GitHub
Parent: 0a92e96f20
Commit: 64f910553e
@@ -403,8 +403,9 @@ def v1_loader_support(fd_config):
         or current_platform.is_xpu()
         or current_platform.is_iluvatar()
         or current_platform.is_maca()
+        or current_platform.is_intel_hpu()
     ):
-        _err_msg("v1loader currently only support backends gpu, xpu, iluvatar and maca")
+        _err_msg("v1loader currently only support backends gpu, xpu, intel_hpu, iluvatar and maca")
         return False
     if is_pre_sliced_weight(fd_config.model_config.model):