# DeepSeek-V3/V3.1 Model

## I. Environment Preparation
### 1.1 Support Requirements

The minimum number of GPUs required to deploy DeepSeek-V3/V3.1 on the following hardware, for each quantization precision, is:
| Hardware | WINT4 |
|---|---|
| H800 80GB | 8 |
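As a rough sanity check on the 8-GPU figure, a back-of-envelope sketch (assuming DeepSeek-V3's publicly stated total of roughly 671B parameters, and ignoring KV cache, activations, and runtime overhead):

```python
# Rough estimate only; the ~671B parameter count is an assumption taken
# from the public DeepSeek-V3 model card, not from this document.
params = 671e9
bytes_per_param = 0.5            # WINT4: 4-bit weight-only quantization
weight_gib = params * bytes_per_param / 1024**3

gpu_mem_gib = 80                 # H800 80GB
num_gpus = 8
total_gib = gpu_mem_gib * num_gpus

print(f"weights: ~{weight_gib:.0f} GiB, 8-GPU memory budget: {total_gib} GiB")
```

The quantized weights alone occupy roughly 312 GiB; the remaining headroom in the 640 GiB budget goes to KV cache, activations, and framework overhead, which is plausibly why eight 80GB GPUs are the stated minimum.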
### 1.2 Installing FastDeploy

For the installation process, refer to the FastDeploy GPU Installation document.
## II. How to Use

### 2.1 Basics: Starting the Service

Example 1: Deploying a WINT4 model with a 16K context window on an H800 node with eight GPUs.
```shell
MODEL_PATH=/models/DeepSeek-V3.2-Exp-BF16
export FD_DISABLE_CHUNKED_PREFILL=1
export FD_ATTENTION_BACKEND="MLA_ATTN"
export FLAGS_flash_attn_version=3

python -m fastdeploy.entrypoints.openai.api_server \
    --model "$MODEL_PATH" \
    --port 8180 \
    --metrics-port 8181 \
    --engine-worker-queue-port 8182 \
    --cache-queue-port 8183 \
    --tensor-parallel-size 8 \
    --max-model-len 16384 \
    --max-num-seqs 100 \
    --no-enable-prefix-caching \
    --quantization wint4
```
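Once the service is up, it exposes an OpenAI-compatible HTTP API on the port given by `--port`. A minimal sketch of a chat-completions request body (the `model` field value here is an assumption; some OpenAI-compatible servers ignore it or accept a placeholder):

```python
import json

# Hypothetical request body for the OpenAI-compatible endpoint at
# http://localhost:8180/v1/chat/completions (port taken from the
# launch command above).
payload = {
    "model": "default",          # assumption: placeholder model id
    "messages": [
        {"role": "user", "content": "Introduce DeepSeek-V3 in one sentence."}
    ],
    "max_tokens": 128,
    "stream": False,
}
body = json.dumps(payload)
print(body)
```

The serialized body can then be sent with any HTTP client, e.g. `curl -s http://localhost:8180/v1/chat/completions -H "Content-Type: application/json" -d "$body"`.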