Files
FastDeploy/examples/mooncake_store
jc e911ac2ce7 [BugFix] Refine the preparation of cpu and storage cache (#5777)
* Refine the preparation of cpu and storage cache

* fix error

* fix error

* up

* fix

* up docs

* fix unittest

* remove debug info
2026-01-05 10:13:30 +08:00
..

MooncakeStore for FastDeploy

This document describes how to use MooncakeStore as the backend of FastDeploy.

Preparation

Install FastDeploy

Refer to NVIDIA CUDA GPU Installation for Fastdeploy installation.

Install MooncakeStore

pip install mooncake-transfer-engine

Run Examples

The example script is provided in run.sh. You can run it directly:

bash run.sh

In the example script, we will start a Mooncake master server and two FastDeploy server.

Launch Mooncake master server:

mooncake_master \
    --port=15001 \
    --enable_http_metadata_server=true  \
    --http_metadata_server_host=0.0.0.0 \
    --http_metadata_server_port=15002 \
    --metrics_port=15003 \

More parameter can be found in the official guide.

Launch the Fastdeploy with Mooncake enabled.

export MOONCAKE_CONFIG_PATH="./mooncake_config.json"

python -m fastdeploy.entrypoints.openai.api_server \
       --model ${MODEL_NAME} \
       --port ${PORT} \
       --metrics-port $((PORT + 1)) \
       --engine-worker-queue-port $((PORT + 2)) \
       --cache-queue-port $((PORT + 3)) \
       --max-model-len 32768 \
       --max-num-seqs 32 \
       --kvcache-storage-backend mooncake

Troubleshooting

For more details, please refer to: https://github.com/kvcache-ai/Mooncake/blob/main/docs/source/troubleshooting/troubleshooting.md