[简体中文](../zh/best_practices/DeepSeek-V3.md)
# DeepSeek-V3/V3.1 Model
## I. Environment Preparation
### 1.1 Support Requirements
The minimum number of GPUs required to deploy each quantization precision of DeepSeek-V3/V3.1 on the following hardware is listed below:

| Device | WINT4 |
|-----|-----|
| H800 80GB | 8 |
### 1.2 Installing FastDeploy
For the installation process, refer to [FastDeploy GPU Installation](../get_started/installation/nvidia_gpu.md).
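
For quick reference, a pip-based install on SM90 GPUs such as the H800 typically looks like the sketch below. The PaddlePaddle version, wheel names, and index URLs shown are assumptions, so treat the linked guide as the authoritative source.

```shell
# Hypothetical sketch; confirm package versions and index URLs in the guide.
# 1) Install PaddlePaddle with CUDA support (the version shown is an assumption).
python -m pip install paddlepaddle-gpu==3.0.0 \
    -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# 2) Install the FastDeploy wheel built for SM80/90 GPU architectures.
python -m pip install fastdeploy-gpu \
    -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/
```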
## II. How to Use
### 2.1 Basics: Starting the Service
**Example 1:** Deploying a WINT4 model with a 16K context on eight H800 GPUs

```shell
# Path to the downloaded model weights; adjust to your local environment.
MODEL_PATH=/models/DeepSeek-V3.2-Exp-BF16

# Disable chunked prefill, select the MLA attention backend,
# and enable FlashAttention-3 kernels.
export FD_DISABLE_CHUNKED_PREFILL=1
export FD_ATTENTION_BACKEND="MLA_ATTN"
export FLAGS_flash_attn_version=3

python -m fastdeploy.entrypoints.openai.api_server \
    --model "$MODEL_PATH" \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --cache-queue-port 8183 \
    --tensor-parallel-size 8 \
    --max-model-len 16384 \
    --max-num-seqs 100 \
    --no-enable-prefix-caching \
    --quantization wint4
```
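
Once the service is up, it exposes an OpenAI-compatible HTTP API on the `--port` configured above. A minimal smoke test, assuming the standard `/health` and `/v1/chat/completions` routes (the prompt and `max_tokens` value below are placeholders):

```shell
# Confirm the server is ready.
curl -i http://localhost:8180/health

# Send a test request to the OpenAI-compatible chat completions endpoint.
curl -s http://localhost:8180/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{"role": "user", "content": "Hello! Who are you?"}],
          "max_tokens": 64
        }'
```

Service metrics are served separately on the `--metrics-port` (8181) set above.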