MooncakeStore for FastDeploy
This document describes how to use MooncakeStore as the backend of FastDeploy.
Preparation
Install FastDeploy
Refer to NVIDIA CUDA GPU Installation for FastDeploy installation.
Install MooncakeStore
pip install mooncake-transfer-engine
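To confirm the wheel installed correctly, a quick import check can be run. Note the top-level module name `mooncake` is an assumption here; it may differ across package versions:

```python
# Sanity check: the mooncake-transfer-engine wheel is assumed to expose
# a "mooncake" top-level module (name may vary by package version).
installed = True
try:
    import mooncake  # noqa: F401
except ImportError:
    installed = False
print("mooncake-transfer-engine importable:", installed)
```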
Run Examples
The example script is provided in run.sh. You can run it directly:
bash run.sh
In the example script, we will start a Mooncake master server and two FastDeploy servers.
Launch Mooncake master server:
mooncake_master \
--port=15001 \
--enable_http_metadata_server=true \
--http_metadata_server_host=0.0.0.0 \
--http_metadata_server_port=15002 \
    --metrics_port=15003
More parameters can be found in the official guide.
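FastDeploy reads the Mooncake client settings from the file pointed to by MOONCAKE_CONFIG_PATH. The sketch below writes an illustrative mooncake_config.json; the exact keys depend on your Mooncake version (check the official configuration guide). The addresses reuse the ports from the master-server command above (15001 for the master, 15002 for the HTTP metadata server), and the "protocol"/"device_name" values are assumptions for a single-node TCP setup:

```shell
# Illustrative config only -- key names and values are assumptions;
# verify them against the Mooncake configuration guide for your version.
cat > ./mooncake_config.json <<'EOF'
{
    "local_hostname": "127.0.0.1",
    "metadata_server": "http://127.0.0.1:15002/metadata",
    "protocol": "tcp",
    "device_name": "",
    "master_server_address": "127.0.0.1:15001"
}
EOF
```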
Launch FastDeploy with Mooncake enabled:
export MOONCAKE_CONFIG_PATH="./mooncake_config.json"
python -m fastdeploy.entrypoints.openai.api_server \
--model ${MODEL_NAME} \
--port ${PORT} \
--metrics-port $((PORT + 1)) \
--engine-worker-queue-port $((PORT + 2)) \
--cache-queue-port $((PORT + 3)) \
--max-model-len 32768 \
--max-num-seqs 32 \
--kvcache-storage-backend mooncake
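Once the server is up, a minimal request against its OpenAI-compatible chat endpoint verifies that it responds. In this sketch the port (8188) and model name are placeholders for the values used at launch:

```python
import json
import urllib.request

# Placeholder: use the ${PORT} value passed to api_server above.
PORT = 8188
payload = {
    "model": "default",  # placeholder for ${MODEL_NAME}
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 16,
}
req = urllib.request.Request(
    f"http://127.0.0.1:{PORT}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
except OSError as exc:
    # Covers connection refused / timeouts when no server is running.
    print(f"server not reachable: {exc}")
```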
Troubleshooting
For more details, please refer to: https://github.com/kvcache-ai/Mooncake/blob/main/docs/source/troubleshooting/troubleshooting.md