MooncakeStore for FastDeploy

This document describes how to use MooncakeStore as the backend of FastDeploy.

Preparation

Install FastDeploy

Refer to the NVIDIA CUDA GPU Installation guide for FastDeploy installation.

Install MooncakeStore

pip install mooncake-transfer-engine
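To confirm the installation succeeded, you can check that the package is importable. This is a minimal sketch; we assume the wheel exposes a top-level `mooncake` module:

```python
import importlib.util

# The mooncake-transfer-engine wheel is assumed to expose a top-level
# "mooncake" module; find_spec returns None if it is not installed.
spec = importlib.util.find_spec("mooncake")
print("mooncake importable:", spec is not None)
```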

Run Examples

The example script is provided in run.sh. You can run it directly:

bash run.sh

The example script starts a Mooncake master server and two FastDeploy servers.

Launch Mooncake master server:

mooncake_master \
    --port=15001 \
    --enable_http_metadata_server=true \
    --http_metadata_server_host=0.0.0.0 \
    --http_metadata_server_port=15002 \
    --metrics_port=15003

More parameters can be found in the official guide.
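The FastDeploy launch step below reads a Mooncake client configuration from MOONCAKE_CONFIG_PATH. Here is a minimal mooncake_config.json sketch matching the master-server flags above; the field names follow Mooncake's documented client configuration, and the hostnames and RDMA device name are placeholders to adjust for your environment:

```json
{
    "local_hostname": "127.0.0.1",
    "metadata_server": "http://127.0.0.1:15002/metadata",
    "protocol": "rdma",
    "device_name": "",
    "master_server_address": "127.0.0.1:15001"
}
```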

Launch FastDeploy with Mooncake enabled:

export MOONCAKE_CONFIG_PATH="./mooncake_config.json"

python -m fastdeploy.entrypoints.openai.api_server \
       --model ${MODEL_NAME} \
       --port ${PORT} \
       --metrics-port $((PORT + 1)) \
       --engine-worker-queue-port $((PORT + 2)) \
       --cache-queue-port $((PORT + 3)) \
       --max-model-len 32768 \
       --max-num-seqs 32 \
       --kvcache-storage-backend mooncake
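Once the server is up, it can be queried through its OpenAI-compatible HTTP API. Below is a minimal standard-library sketch; the port value 8180 and the model name "default" are assumptions, so substitute the PORT and model you actually launched with:

```python
import json
import urllib.error
import urllib.request

PORT = 8180  # assumption: the PORT used when launching the server
url = f"http://127.0.0.1:{PORT}/v1/chat/completions"

# Minimal OpenAI-style chat completion request body.
payload = {
    "model": "default",  # assumption: replace with your served model name
    "messages": [{"role": "user", "content": "Say hello."}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
except (urllib.error.URLError, OSError) as exc:
    print("request failed (is the server running?):", exc)
```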

Troubleshooting

For more details, please refer to: https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/troubleshooting.md