[INTEL_HPU] supported ERNIE-4.5-21B-A3B-Thinking (#5891)

ERNIE-4.5-21B-A3B-Thinking needs to be loaded with the DefaultModelLoaderV1 mode (selected via --load-choices "default_v1").

Reference command line:
ENABLE_V1_KVCACHE_SCHEDULER=1 FD_ENC_DEC_BLOCK_NUM=8 HPU_PERF_BREAKDOWN_SYNC_MODE=1 \
HPU_WARMUP_BUCKET=0 MAX_PREFILL_NUM=1 FD_ATTENTION_BACKEND=HPU_ATTN \
python -m fastdeploy.entrypoints.openai.api_server --model \
./models--baidu--ERNIE-4.5-21B-A3B-Thinking/snapshots/4341bb42644d5422859509fa25d41544c57181f8/ \
--port 8388 --engine-worker-queue-port 8302 --metrics-port 8301 \
--cache-queue-port 8303 --max-model-len 16384 --tensor-parallel-size 1 \
--load-choices "default_v1" --num-gpu-blocks-override 5000 --kv-cache-ratio 0.5 \
--max-num-seqs 128 --block-size 64 --no-enable-prefix-caching \
--graph-optimization-config '{"use_cudagraph":false}'

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Author: FocusLuo
Date: 2026-01-07 21:31:53 +08:00
Committed by: GitHub
Parent: 0a92e96f20
Commit: 64f910553e
@@ -403,8 +403,9 @@ def v1_loader_support(fd_config):
         or current_platform.is_xpu()
         or current_platform.is_iluvatar()
         or current_platform.is_maca()
+        or current_platform.is_intel_hpu()
     ):
-        _err_msg("v1loader currently only support backends gpu, xpu, iluvatar and maca")
+        _err_msg("v1loader currently only support backends gpu, xpu, intel_hpu, iluvatar and maca")
         return False
     if is_pre_sliced_weight(fd_config.model_config.model):