mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

jc e911ac2ce7 [BugFix] Refine the preparation of cpu and storage cache (#5777)

2026-01-05 10:13:30 +08:00

File                  Last commit                                                        Date
mooncake_config.json  [Feature] Support KV Cache Storage (#5571)                         2025-12-25 16:30:35 +08:00
README.md             [BugFix] Refine the preparation of cpu and storage cache (#5777)   2026-01-05 10:13:30 +08:00
run.sh                [Feature] Support KV Cache Storage (#5571)                         2025-12-25 16:30:35 +08:00
utils.sh              [Feature] Support KV Cache Storage (#5571)                         2025-12-25 16:30:35 +08:00

README.md

MooncakeStore for FastDeploy

This document describes how to use MooncakeStore as the KV cache storage backend of FastDeploy.

Preparation

Install FastDeploy

To install FastDeploy, follow the NVIDIA CUDA GPU Installation guide.

Install MooncakeStore

pip install mooncake-transfer-engine
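Before moving on, it can help to confirm that the `mooncake_master` command used later in this guide is actually on your PATH. The snippet below is a minimal sketch of such a check; `require_cmd` is a hypothetical helper written for this illustration and is not part of run.sh or utils.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of run.sh or utils.sh):
# report whether a command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check the binary that the launch step below relies on.
# A failure only prints a hint; it does not abort the shell.
require_cmd mooncake_master \
    || echo "hint: re-run 'pip install mooncake-transfer-engine'" >&2
```

If the check reports `missing`, the package may have been installed into a different Python environment than the one active in your shell.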

Run Examples

The example script is provided in run.sh. You can run it directly:

bash run.sh

The example script starts one Mooncake master server and two FastDeploy servers.

Launch the Mooncake master server:

mooncake_master \
    --port=15001 \
    --enable_http_metadata_server=true \
    --http_metadata_server_host=0.0.0.0 \
    --http_metadata_server_port=15002 \
    --metrics_port=15003

More parameters are described in the official Mooncake guide.
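If you start the master in the background, the FastDeploy servers should only be launched once the master is accepting connections. The following is a minimal sketch of a readiness check, assuming bash (it relies on bash's `/dev/tcp` redirection); `wait_for_port` is a hypothetical helper and is not taken from run.sh or utils.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from run.sh or utils.sh):
# poll a TCP port until it accepts a connection or the timeout expires.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
wait_for_port() {
    local host="$1" port="$2" timeout_s="${3:-30}"
    local deadline=$((SECONDS + timeout_s))
    while (( SECONDS < deadline )); do
        # bash opens a TCP connection for /dev/tcp/HOST/PORT redirections;
        # the subshell closes the socket again as soon as it exits.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

With the ports from the command above, a call such as `wait_for_port 127.0.0.1 15001 30` would wait up to 30 seconds for the master port. A successful connection only shows that the port is open, not that the service is fully initialized.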

Launch FastDeploy with Mooncake enabled:

export MOONCAKE_CONFIG_PATH="./mooncake_config.json"

# MODEL_NAME and PORT must be set before running this command.
python -m fastdeploy.entrypoints.openai.api_server \
       --model "${MODEL_NAME}" \
       --port "${PORT}" \
       --metrics-port $((PORT + 1)) \
       --engine-worker-queue-port $((PORT + 2)) \
       --cache-queue-port $((PORT + 3)) \
       --max-model-len 32768 \
       --max-num-seqs 32 \
       --kvcache-storage-backend mooncake
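Each server started with the command above occupies four consecutive ports beginning at `PORT`. When two servers run on the same host, their base ports therefore need to be at least 4 apart. The sketch below prints the resulting layout; `fd_ports` is a hypothetical helper, and the base ports 8100 and 8200 are example values rather than the ones used in run.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the four ports one server uses for a given
# base port, in the order API, metrics, engine worker queue, cache queue.
fd_ports() {
    local base="$1"
    echo "$base $((base + 1)) $((base + 2)) $((base + 3))"
}

# Example base ports (assumed values, not taken from run.sh).
# Any spacing of 4 or more keeps the two port ranges disjoint.
for base in 8100 8200; do
    echo "base ${base}: $(fd_ports "$base")"
done
```

Setting `PORT` to one of these base values before running the launch command gives each server its own non-overlapping set of ports.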

Troubleshooting

For common problems and their fixes, see the Mooncake troubleshooting guide: https://github.com/kvcache-ai/Mooncake/blob/main/docs/source/troubleshooting/troubleshooting.md