# MooncakeStore for FastDeploy
This document describes how to use MooncakeStore as the backend of FastDeploy.
## Preparation
### Install FastDeploy
Refer to the [NVIDIA CUDA GPU Installation](https://paddlepaddle.github.io/FastDeploy/get_started/installation/nvidia_gpu/) guide to install FastDeploy.
### Install MooncakeStore
```bash
pip install mooncake-transfer-engine
```
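After installing, you can sanity-check that the package is importable. A minimal sketch; it assumes the `mooncake-transfer-engine` package exposes a top-level `mooncake` module, which may differ across versions:

```python
import importlib.util

# Check whether the mooncake-transfer-engine install succeeded.
# Assumption: the pip package provides a top-level "mooncake" module;
# adjust the name if your version differs.
installed = importlib.util.find_spec("mooncake") is not None
print("mooncake importable:", installed)
```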
## Run Examples
The example script is provided in `run.sh`. You can run it directly:
```bash
bash run.sh
```
The example script starts one Mooncake master server and two FastDeploy servers.
Launch the Mooncake master server:

```bash
mooncake_master \
--port=15001 \
--enable_http_metadata_server=true \
--http_metadata_server_host=0.0.0.0 \
--http_metadata_server_port=15002 \
--metrics_port=15003
```
More parameters can be found in the [official guide](https://github.com/kvcache-ai/Mooncake/blob/main/docs/source/python-api-reference/transfer-engine.md).
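Before launching FastDeploy, you can confirm the master server is accepting connections. A minimal readiness probe; this is a generic TCP check, not a Mooncake API, and the demo uses a throwaway local listener rather than a real master server:

```python
import socket

def wait_ready(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP server is accepting connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener; in practice, probe the
# master server port (e.g. 15001) before starting FastDeploy.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
_, port = srv.getsockname()
ready = wait_ready("127.0.0.1", port)
srv.close()
print(ready)  # True while the listener is up
```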
Launch FastDeploy with Mooncake enabled:
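The `MOONCAKE_CONFIG_PATH` variable in the command below points at a JSON file describing the Mooncake endpoints. A minimal sketch; the field names follow the Mooncake examples and may vary between versions, and the addresses assume the master server launched above runs on the same host:

```json
{
    "local_hostname": "127.0.0.1",
    "metadata_server": "http://127.0.0.1:15002/metadata",
    "protocol": "tcp",
    "device_name": "",
    "master_server_address": "127.0.0.1:15001"
}
```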
```bash
export MOONCAKE_CONFIG_PATH="./mooncake_config.json"

python -m fastdeploy.entrypoints.openai.api_server \
--model ${MODEL_NAME} \
--port ${PORT} \
--metrics-port $((PORT + 1)) \
--engine-worker-queue-port $((PORT + 2)) \
--cache-queue-port $((PORT + 3)) \
--max-model-len 32768 \
--max-num-seqs 32 \
--kvcache-storage-backend mooncake
```
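The launch command derives its auxiliary ports from the single base `PORT`; the same `$((PORT + n))` arithmetic, sketched in Python for reference:

```python
# Derive the auxiliary ports from one base port, mirroring the
# $((PORT + n)) shell arithmetic in the launch command above.
def derive_ports(base_port: int) -> dict:
    return {
        "api": base_port,                      # --port
        "metrics": base_port + 1,              # --metrics-port
        "engine_worker_queue": base_port + 2,  # --engine-worker-queue-port
        "cache_queue": base_port + 3,          # --cache-queue-port
    }

print(derive_ports(8180))
# → {'api': 8180, 'metrics': 8181, 'engine_worker_queue': 8182, 'cache_queue': 8183}
```

When running two FastDeploy servers as in `run.sh`, give each a base port at least 4 apart so the derived ports do not collide.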
## Troubleshooting
For more details, please refer to the [Mooncake troubleshooting guide](https://github.com/kvcache-ai/Mooncake/blob/main/docs/source/troubleshooting/troubleshooting.md).