mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2026-04-23 00:17:25 +08:00)
[BugFix] fix cache transfer manager updating/clearing (#5930)
* [fix] fix cache transfer manager updating/clearing
* [fix] fix code style
* [fix] fix config
* [fix] fix engine client
* [fix] let worker update kv cache status signal
* [fix] update worker process
* [fix] fix clear/update for case if comm group is shutdown
* [fix] update dynamic weight manager
* [fix] fix port
* [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting
@@ -38,6 +38,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
--gpu-memory-utilization 0.9 \
--model "$MODEL_PATH" \
--no-shutdown-comm-group-if-worker-idle \
--swap-space 10 \
--load-strategy ipc_snapshot \
--dynamic-load-weight &