update quantization doc (#2659)
@@ -22,7 +22,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
   --max-num-seqs 32
 ```
 
-- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model can be automatically downloaded from AIStudio. FastDeploy depends on Paddle format models. For more information, please refer to [Supported Model List](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/paddle_internal/FastDeploy/blob/feature%2Finference-refactor-20250528/docs/supported_models.md).
+- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model can be automatically downloaded from AIStudio. FastDeploy depends on Paddle format models. For more information, please refer to [Supported Model List](../supported_models.md).
 - By setting `--quantization` to `wint8` or `wint4`, online INT8/INT4 quantization can be selected.
 - Deploying ERNIE-4.5-300B-A47B-Paddle with WINT8 requires at least 80GB * 8 GPUs, while WINT4 requires 80GB * 4 GPUs.
 - For more deployment tutorials, please refer to [get_started](../get_started/ernie-4.5.md).
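Putting the flags from this hunk together, a minimal sketch of an online WINT8 launch command looks like the following. Only `--model`, `--quantization`, and `--max-num-seqs` are taken from the doc itself; a real 300B-scale deployment would additionally need the multi-GPU settings described in the get_started tutorial.

```shell
# Sketch: start the OpenAI-compatible API server with online INT8
# weight quantization. Swap wint8 for wint4 to use INT4 instead.
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-300B-A47B-Paddle \
  --quantization wint8 \
  --max-num-seqs 32
```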
@@ -48,7 +48,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
   --max-num-seqs 32
 ```
 
-- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model can be automatically downloaded from AIStudio. FastDeploy depends on Paddle format models. For more information, please refer to [Supported Model List](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/paddle_internal/FastDeploy/blob/feature%2Finference-refactor-20250528/docs/supported_models.md).
+- By specifying `--model baidu/ERNIE-4.5-300B-A47B-Paddle`, the model can be automatically downloaded from AIStudio. FastDeploy depends on Paddle format models. For more information, please refer to [Supported Model List](../supported_models.md).
 - By setting `--quantization` to `block_wise_fp8`, online Block-wise FP8 quantization can be selected.
 - Deploying ERNIE-4.5-300B-A47B-Paddle with Block-wise FP8 requires at least 80GB * 8 GPUs.
 - For more deployment tutorials, please refer to [get_started](../get_started/ernie-4.5.md).
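For the FP8 path, the launch command differs from the INT8/INT4 one only in the value passed to `--quantization`. A minimal sketch, again using only flags that appear in this doc:

```shell
# Sketch: start the API server with online Block-wise FP8 quantization.
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-300B-A47B-Paddle \
  --quantization block_wise_fp8 \
  --max-num-seqs 32
```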