[简体中文](../zh/features/data_parallel_service.md)

# Data Parallelism (DP)

Data Parallelism (DP) is a distributed inference approach in which incoming requests are spread across multiple **identical model replicas**, with each replica independently handling inference for its assigned requests.

In practice, especially when deploying **Mixture-of-Experts (MoE)** models, **Data Parallelism (DP)** is often combined with **Expert Parallelism (EP)**. Each DP service independently performs the Attention computation, while all DP services collaboratively participate in the MoE computation, thereby improving overall inference performance.

FastDeploy supports DP-based inference and provides the `multi_api_server` interface to launch multiple inference services simultaneously.

![Architecture](./images/no_scheduler_img.png)

---

## Launching FastDeploy Services

Taking the **ERNIE-4.5-300B** model as an example, the following command launches a service with **DP=8, TP=1, EP=8**:

```shell
export FD_ENABLE_MULTI_API_SERVER=1
python -m fastdeploy.entrypoints.openai.multi_api_server \
--num-servers 8 \
--ports "1811,1822,1833,1844,1855,1866,1877,1888" \
--metrics-ports "3101,3201,3301,3401,3501,3601,3701,3801" \
--args --model ERNIE-4_5-300B-A47B-FP8-Paddle \
--engine-worker-queue-port "25611,25621,25631,25641,25651,25661,25671,25681" \
--tensor-parallel-size 1 \
--data-parallel-size 8 \
--max-model-len 12288 \
--max-num-seqs 64 \
--num-gpu-blocks-override 256 \
--enable-expert-parallel
```

### Parameter Description

* `num-servers`: Number of DP service instances to launch.
* `ports`: API server ports for the DP service instances. The number of ports must match `num-servers`.
* `metrics-ports`: Metrics server ports for the DP service instances. The number must match `num-servers`. If not specified, available ports will be allocated automatically.
* `args`: Arguments passed to each DP service instance. Refer to the [parameter documentation](../parameters.md) for details.

---

## Request Scheduling

After launching multiple DP services using the data parallel strategy, incoming user requests must be distributed across the services by a scheduler to achieve load balancing.

### Web Server–Based Scheduling

Once the IP addresses and ports of the DP service instances are known, common web servers (such as **Nginx**) can be used to implement request scheduling, as sketched below.
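For example, the following sketch (not part of FastDeploy; the output file name, the `least_conn` policy, and the `8080` listen port are illustrative placeholders) generates an Nginx `upstream` configuration covering the eight API server ports from the launch command above:

```shell
# A sketch, not part of FastDeploy: generate an Nginx config that load-balances
# requests across the eight DP API servers started above, then include it from
# your nginx.conf (e.g. under conf.d/) and reload Nginx.
cat > fastdeploy_dp.conf <<'EOF'
upstream fastdeploy_dp {
    least_conn;                  # route each request to the instance with the fewest active connections
    server 127.0.0.1:1811;
    server 127.0.0.1:1822;
    server 127.0.0.1:1833;
    server 127.0.0.1:1844;
    server 127.0.0.1:1855;
    server 127.0.0.1:1866;
    server 127.0.0.1:1877;
    server 127.0.0.1:1888;
}

server {
    listen 8080;                 # single entry point exposed to clients (illustrative port)
    location / {
        proxy_pass http://fastdeploy_dp;
        proxy_http_version 1.1;
        proxy_buffering off;     # forward streamed tokens as they are generated
    }
}
EOF
```

With this in place, clients send requests to port 8080 and Nginx spreads them across the DP instances. `least_conn` is a reasonable default for long-running generation requests, but plain round-robin also works.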
### FastDeploy Router

FastDeploy provides a Python-based [Router](https://github.com/PaddlePaddle/FastDeploy/tree/develop/fastdeploy/router) to handle request reception and scheduling. A high-performance Router implementation is currently under development.

The usage and request-scheduling workflow is as follows:

* Start the Router
* Start FastDeploy service instances (either single-DP or multi-DP), which register themselves with the Router
* User requests are sent to the Router
* The Router selects an appropriate service instance based on the global load status
* The Router forwards the request to the selected instance for inference
* The Router receives the generated result from the instance and returns it to the user

---

## Quick Start Example

### Launching the Router

Start the Router service. Logs are written to `log_router/router.log`. `fd-router` installation instructions can be found in the [Router documentation](../online_serving/router.md).

```shell
export FD_LOG_DIR="log_router"
/usr/local/bin/fd-router \
--port 30000
```

### Launching DP Services with Router

Again using the **ERNIE-4.5-300B** model as an example, the following command launches **DP=8, TP=1, EP=8** services and registers them with the Router via the `--router` argument:

```shell
export FD_ENABLE_MULTI_API_SERVER=1
python -m fastdeploy.entrypoints.openai.multi_api_server \
--num-servers 8 \
--ports "1811,1822,1833,1844,1855,1866,1877,1888" \
--metrics-ports "3101,3201,3301,3401,3501,3601,3701,3801" \
--args --model ERNIE-4_5-300B-A47B-FP8-Paddle \
--tensor-parallel-size 1 \
--data-parallel-size 8 \
--max-model-len 12288 \
--max-num-seqs 64 \
--num-gpu-blocks-override 256 \
--enable-expert-parallel \
--router "0.0.0.0:30000"
```
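### Sending a Request

Once the DP services have registered, clients talk to the Router's port rather than to the individual instances. The sketch below is a smoke test under the assumption that the Router forwards the same OpenAI-compatible `/v1/chat/completions` endpoint exposed by the underlying API servers; adjust the host, port, and `model` field (here set to the `--model` value from the launch command) to your deployment.

```shell
# Hedged smoke test: send one chat request through the Router, which forwards
# it to a registered DP instance based on the current load.
curl -s http://0.0.0.0:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ERNIE-4_5-300B-A47B-FP8-Paddle",
        "messages": [{"role": "user", "content": "Introduce data parallelism in one sentence."}],
        "max_tokens": 64
      }'
```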