# MooncakeStore for FastDeploy
This document describes how to use MooncakeStore as the backend of FastDeploy.
## Preparation
### Install FastDeploy
Refer to the [NVIDIA CUDA GPU Installation](https://paddlepaddle.github.io/FastDeploy/get_started/installation/nvidia_gpu/) guide to install FastDeploy.
### Install MooncakeStore
```bash
pip install mooncake-transfer-engine
```
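After installing, you can sanity-check that the package is importable. A minimal sketch; it assumes the `mooncake-transfer-engine` package exposes a top-level `mooncake` module, which may differ across versions:

```python
import importlib.util

# Check whether the mooncake-transfer-engine install succeeded.
# Assumption: the pip package provides a top-level "mooncake" module;
# adjust the name if your version differs.
installed = importlib.util.find_spec("mooncake") is not None
print("mooncake importable:", installed)
```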
## Run Examples
The example script is provided in `run.sh`. You can run it directly:
```bash
bash run.sh
```
The example script starts one Mooncake master server and two FastDeploy servers.
Launch the Mooncake master server:

```bash
mooncake_master \
--port=15001 \
--enable_http_metadata_server=true \
--http_metadata_server_host=0.0.0.0 \
--http_metadata_server_port=15002 \
--metrics_port=15003
```
More parameters can be found in the [official guide](https://github.com/kvcache-ai/Mooncake/blob/main/docs/source/python-api-reference/transfer-engine.md).
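Before launching FastDeploy, you can confirm the master server is accepting connections. A minimal readiness probe; this is a generic TCP check, not a Mooncake API, and the demo uses a throwaway local listener rather than a real master server:

```python
import socket

def wait_ready(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP server is accepting connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener; in practice, probe the
# master server port (e.g. 15001) before starting FastDeploy.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
_, port = srv.getsockname()
ready = wait_ready("127.0.0.1", port)
srv.close()
print(ready)  # True while the listener is up
```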
Launch FastDeploy with Mooncake enabled:
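The `MOONCAKE_CONFIG_PATH` variable in the command below points at a JSON file describing the Mooncake endpoints. A minimal sketch; the field names follow the Mooncake examples and may vary between versions, and the addresses assume the master server launched above runs on the same host:

```json
{
    "local_hostname": "127.0.0.1",
    "metadata_server": "http://127.0.0.1:15002/metadata",
    "protocol": "tcp",
    "device_name": "",
    "master_server_address": "127.0.0.1:15001"
}
```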
```bash
export MOONCAKE_CONFIG_PATH="./mooncake_config.json"

python -m fastdeploy.entrypoints.openai.api_server \
--model ${MODEL_NAME} \
--port ${PORT} \
--metrics-port $((PORT + 1)) \
--engine-worker-queue-port $((PORT + 2)) \
--cache-queue-port $((PORT + 3)) \
--max-model-len 32768 \
--max-num-seqs 32 \
--kvcache-storage-backend mooncake
```
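The launch command derives its auxiliary ports from the single base `PORT`; the same `$((PORT + n))` arithmetic, sketched in Python for reference:

```python
# Derive the auxiliary ports from one base port, mirroring the
# $((PORT + n)) shell arithmetic in the launch command above.
def derive_ports(base_port: int) -> dict:
    return {
        "api": base_port,                      # --port
        "metrics": base_port + 1,              # --metrics-port
        "engine_worker_queue": base_port + 2,  # --engine-worker-queue-port
        "cache_queue": base_port + 3,          # --cache-queue-port
    }

print(derive_ports(8180))
# → {'api': 8180, 'metrics': 8181, 'engine_worker_queue': 8182, 'cache_queue': 8183}
```

When running two FastDeploy servers as in `run.sh`, give each a base port at least 4 apart so the derived ports do not collide.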
## Troubleshooting
For more details, please refer to the [Mooncake troubleshooting guide](https://github.com/kvcache-ai/Mooncake/blob/main/docs/source/troubleshooting/troubleshooting.md).