[XPU][Docs] Update Release Note (#7262)

* update
* update docs
* update docs
* update commit
* update commit

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-23 00:17:25 +08:00 · 2026-04-10 15:22:16 +08:00
parent 14d46181b8
commit 4aecaa70ba
2 changed files with 110 additions and 154 deletions
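Reviewer note on the `--speculative-config` lines changed in this diff: the removed lines wrap the whole JSON in single quotes, so the shell passes `${mtp_model_path}` through literally, while the added lines close the single-quoted string around the variable so that it expands. A minimal sketch of the difference, assuming a hypothetical path `/models/mtp`; the `safe` variant is a suggestion and does not appear in the diff.

```bash
# Hypothetical model path, used only for illustration.
mtp_model_path="/models/mtp"

# Removed form: everything is inside single quotes, so the variable is NOT expanded.
old='{"method": "mtp", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'

# Added form: the single-quoted string ends before the variable and resumes after it.
new='{"method": "mtp", "num_speculative_tokens": 1, "model": "'${mtp_model_path}'"}'

# Suggested variant (not in the diff): double-quote the expansion itself.
safe='{"method": "mtp", "num_speculative_tokens": 1, "model": "'"${mtp_model_path}"'"}'

printf '%s\n' "$old" "$new" "$safe"
```

In a variable assignment the added form and the suggested variant give the same string. When the added form is passed directly as a command argument, the unquoted expansion is subject to word splitting, so a path containing spaces would split the argument; the double-quoted variant avoids that.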
@@ -2,32 +2,11 @@

 ## Supported Models

-| Model | Context Length | Quantization | Cards Required | Applicable Version |
-|----------|-----------|------|---------|---------|
-| ERNIE-4.5-300B-A47B | 32K | WINT8 | 8 | 2.5.0 |
-| ERNIE-4.5-300B-A47B | 32K | WINT4 | 4 | 2.5.0 |
-| ERNIE-4.5-300B-A47B | 32K | WINT4 | 8 | 2.5.0 |
-| ERNIE-4.5-300B-A47B | 128K | WINT4 | 8 | 2.5.0 |
-| ERNIE-4.5-21B-A3B | 32K | BF16 | 1 | 2.5.0 |
-| ERNIE-4.5-21B-A3B | 32K | WINT8 | 1 | 2.5.0 |
-| ERNIE-4.5-21B-A3B | 32K | WINT4 | 1 | 2.5.0 |
-| ERNIE-4.5-21B-A3B | 128K | BF16 | 1 | 2.5.0 |
-| ERNIE-4.5-21B-A3B | 128K | WINT8 | 1 | 2.5.0 |
-| ERNIE-4.5-21B-A3B | 128K | WINT4 | 1 | 2.5.0 |
-| ERNIE-4.5-0.3B | 32K | BF16 | 1 | 2.5.0 |
-| ERNIE-4.5-0.3B | 32K | WINT8 | 1 | 2.5.0 |
-| ERNIE-4.5-0.3B | 128K | BF16 | 1 | 2.5.0 |
-| ERNIE-4.5-0.3B | 128K | WINT8 | 1 | 2.5.0 |
-| ERNIE-4.5-300B-A47B-W4A8C8-TP4 | 32K | W4A8 | 4 | 2.5.0 |
-| ERNIE-4.5-VL-28B-A3B | 32K | WINT8 | 1 | 2.5.0 |
-| ERNIE-4.5-VL-424B-A47B | 32K | WINT8 | 8 | 2.5.0 |
-| PaddleOCR-VL-0.9B | 32K | BF16 | 1 | 2.5.0 |
-| ERNIE-4.5-VL-28B-A3B-Thinking | 128K | WINT8 | 1 | 2.5.0 |
-
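+Note: The model support and deployment commands below apply only to version 2.5.0.
 <details>
-<summary><b>ERNIE-4.5-300B-A47B (32K, WINT8, 8 cards)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-300B-A47B (32K, WINT8, 8 cards)</b> </summary>

-**Quick deployment:**
+**Quick start:**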
 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -40,11 +19,11 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 export BKCL_ENABLE_XDR=1
-export BKCL_RDMA_NICS=eth1,eth1,eth3,eth4  # must match the NIC names on the production machine
+export BKCL_RDMA_NICS=mlx5_1,mlx5_1,mlx5_2,mlx5_2,mlx5_3,mlx5_3,mlx5_4,mlx5_4  # run `xpu-smi topo -m` to look up the machine's RDMA NIC names
 export BKCL_TRACE_TOPO=1
 export BKCL_PCIE_RING=1
 export XSHMEM_MODE=1
@@ -64,14 +43,14 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --enable-expert-parallel \
    --enable-prefix-caching \
    --data-parallel-size 1 \
-    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'
+    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "'${mtp_model_path}'"}'
 ```
 </details>

 <details>
-<summary><b>ERNIE-4.5-300B-A47B (32K, WINT4, 4 cards)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-300B-A47B (32K, WINT4, 4 cards)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3"  # or "4,5,6,7"
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -84,11 +63,11 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3"  # or "4,5,6,7"
 export BKCL_ENABLE_XDR=1
-export BKCL_RDMA_NICS=eth1,eth1,eth3,eth4  # must match the NIC names on the production machine
+export BKCL_RDMA_NICS=mlx5_1,mlx5_1,mlx5_2,mlx5_2  # run `xpu-smi topo -m` to look up the machine's RDMA NIC names
 export BKCL_TRACE_TOPO=1
 export BKCL_PCIE_RING=1
 export XSHMEM_MODE=1
@@ -108,14 +87,14 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --enable-expert-parallel \
    --enable-prefix-caching \
    --data-parallel-size 1 \
-    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'
+    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "'${mtp_model_path}'"}'
 ```
 </details>

 <details>
-<summary><b>ERNIE-4.5-300B-A47B (32K, WINT4, 8 cards)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-300B-A47B (32K, WINT4, 8 cards)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -128,18 +107,18 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.95
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 export BKCL_ENABLE_XDR=1
-export BKCL_RDMA_NICS=eth1,eth1,eth3,eth4  # must match the NIC names on the production machine
+export BKCL_RDMA_NICS=mlx5_1,mlx5_1,mlx5_2,mlx5_2,mlx5_3,mlx5_3,mlx5_4,mlx5_4  # run `xpu-smi topo -m` to look up the machine's RDMA NIC names
 export BKCL_TRACE_TOPO=1
 export BKCL_PCIE_RING=1
 export XSHMEM_MODE=1
 export XSHMEM_QP_NUM_PER_RANK=32
 export BKCL_RDMA_VERBS=1
 python -m fastdeploy.entrypoints.openai.api_server \
-    --model /home/ERNIE-4.5-300B-A47B-Paddle \
+    --model /PaddlePaddle/ERNIE-4.5-300B-A47B-Paddle \
    --port 8188 \
    --engine-worker-queue-port 8124 \
    --metrics-port 8125 \
@@ -152,14 +131,14 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --enable-expert-parallel \
    --enable-prefix-caching \
    --data-parallel-size 1 \
-    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'
+    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "'${mtp_model_path}'"}'
 ```
 </details>

 <details>
-<summary><b>ERNIE-4.5-300B-A47B (128K, WINT4, 8 cards)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-300B-A47B (128K, WINT4, 8 cards)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -172,18 +151,18 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 export BKCL_ENABLE_XDR=1
-export BKCL_RDMA_NICS=eth1,eth1,eth3,eth4  # must match the NIC names on the production machine
+export BKCL_RDMA_NICS=mlx5_1,mlx5_1,mlx5_2,mlx5_2,mlx5_3,mlx5_3,mlx5_4,mlx5_4 # run `xpu-smi topo -m` to look up the machine's RDMA NIC names
 export BKCL_TRACE_TOPO=1
 export BKCL_PCIE_RING=1
 export XSHMEM_MODE=1
 export XSHMEM_QP_NUM_PER_RANK=32
 export BKCL_RDMA_VERBS=1
 python -m fastdeploy.entrypoints.openai.api_server \
-    --model /home/ERNIE-4.5-300B-A47B-Paddle \
+    --model /PaddlePaddle/ERNIE-4.5-300B-A47B-Paddle \
    --port 8123 \
    --engine-worker-queue-port 8124 \
    --metrics-port 8125 \
@@ -196,14 +175,14 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --enable-expert-parallel \
    --enable-prefix-caching \
    --data-parallel-size 1 \
-    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'
+    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "'${mtp_model_path}'"}'
 ```
 </details>

 <details>
-<summary><b>ERNIE-4.5-21B-A3B (32K, BF16, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-21B-A3B (32K, BF16, 1 card)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -215,7 +194,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -230,9 +209,9 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-21B-A3B (32K, WINT8, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-21B-A3B (32K, WINT8, 1 card)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -245,7 +224,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -261,9 +240,9 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-21B-A3B (32K, WINT4, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-21B-A3B (32K, WINT4, 1 card)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -276,7 +255,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -292,9 +271,9 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-21B-A3B (128K, BF16, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-21B-A3B (128K, BF16, 1 card)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -306,7 +285,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -321,9 +300,9 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-21B-A3B (128K, WINT8, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-21B-A3B (128K, WINT8, 1 card)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -336,7 +315,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -352,9 +331,9 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-21B-A3B (128K, WINT4, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-21B-A3B (128K, WINT4, 1 card)</b> </summary>

-**Quick deployment:**
+**Quick start:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -367,7 +346,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --gpu-memory-utilization 0.9
 ```

-**Optimal deployment:**
+**Better performance:**
 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
 python -m fastdeploy.entrypoints.openai.api_server \
@@ -383,7 +362,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-0.3B (32K, BF16, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-0.3B (32K, BF16, 1 card)</b> </summary>

 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
@@ -398,7 +377,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-0.3B (32K, WINT8, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-0.3B (32K, WINT8, 1 card)</b> </summary>

 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
@@ -414,7 +393,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-0.3B (128K, BF16, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-0.3B (128K, BF16, 1 card)</b> </summary>

 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
@@ -429,7 +408,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-0.3B (128K, WINT8, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-0.3B (128K, WINT8, 1 card)</b> </summary>

 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
@@ -445,7 +424,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-300B-A47B-W4A8C8-TP4 (32K, W4A8, 4 cards)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-300B-A47B-W4A8C8-TP4 (32K, W4A8, 4 cards)</b> </summary>

 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3"  # or "4,5,6,7"
@@ -461,7 +440,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-VL-28B-A3B (32K, WINT8, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-VL-28B-A3B (32K, WINT8, 1 card)</b> </summary>

 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
@@ -480,7 +459,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-VL-424B-A47B (32K, WINT8, 8 cards)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-VL-424B-A47B (32K, WINT8, 8 cards)</b> </summary>

 ```bash
 export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
@@ -500,7 +479,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>PaddleOCR-VL-0.9B (32K, BF16, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>PaddleOCR-VL-0.9B (32K, BF16, 1 card)</b> </summary>

 ```bash
 export FD_ENABLE_MAX_PREFILL=1
@@ -518,7 +497,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
 </details>

 <details>
-<summary><b>ERNIE-4.5-VL-28B-A3B-Thinking (128K, WINT8, 1 card)</b> - Click to view deployment commands</summary>
+<summary><b>ERNIE-4.5-VL-28B-A3B-Thinking (128K, WINT8, 1 card)</b> </summary>

 ```bash
 export XPU_VISIBLE_DEVICES="0"  # specify any one card
@@ -538,10 +517,9 @@ python -m fastdeploy.entrypoints.openai.api_server \
 ```
 </details>

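-## Quick Start
-
-### Deploy an online service with the ERNIE-4.5-300B-A47B-Paddle model
+## Examples

+### Running ERNIE-4.5-300B-A47B-Paddle
 #### Start the service

 Deploy the ERNIE-4.5-300B-A47B-Paddle model on a 4-card P800 server with WINT4 precision and a 32K context.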
@@ -611,7 +589,7 @@ print('\n')

 For more on the OpenAI protocol, see the [OpenAI Chat Completion API](https://platform.openai.com/docs/api-reference/chat/create) documentation. For how this service differs from the OpenAI protocol, see [OpenAI-compatible serving deployment](../online_serving/README.md).

-### Deploy an online service with the ERNIE-4.5-VL-28B-A3B-Paddle model
+### Running ERNIE-4.5-VL-28B-A3B-Paddle

 #### Start the service

@@ -683,7 +661,7 @@ for chunk in response:
 print('\n')
 ```

-### Deploy an online service with the PaddleOCR-VL-0.9B model
+### Running PaddleOCR-VL-0.9B

 #### Start the service

@@ -754,7 +732,7 @@ for chunk in response:
 print('\n')
 ```

-### Deploy an online service with the ERNIE-4.5-VL-28B-A3B-Thinking model
+### Running ERNIE-4.5-VL-28B-A3B-Thinking

 #### Start the service