mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2026-04-23 00:17:25 +08:00
[Optim] Remove limitation of number of kvcache blocks (#5612)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Optim] Remove limitation of number of kvcache blocks
* Update fastdeploy/envs.py
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update fastdeploy/worker/iluvatar_worker.py
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add docs
* Update fastdeploy/worker/worker_process.py
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix ci case
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@@ -530,11 +530,9 @@ class PaddleDisWorkerProc:
         # 2. Calculate the appropriate number of blocks
         model_block_memory_used = self.worker.cal_theortical_kvcache()
         num_blocks_local = int(available_kv_cache_memory // model_block_memory_used)
-        # NOTE(liuzichang): Too many block will lead to illegal memory access
-        # We will develop dynamic limits in future.
-        if num_blocks_local > 40000:
-            logger.info(f"------- Reset num_blocks_local {num_blocks_local} to 40000")
-            num_blocks_local = min(40000, num_blocks_local)
+        if envs.FD_MAX_KVCACHE_BLOCKS > 0 and num_blocks_local > envs.FD_MAX_KVCACHE_BLOCKS:
+            logger.info(f"------- Reset num_blocks_local {num_blocks_local} to {envs.FD_MAX_KVCACHE_BLOCKS}")
+            num_blocks_local = envs.FD_MAX_KVCACHE_BLOCKS
         logger.info(f"------- model_block_memory_used:{model_block_memory_used / 1024**3} GB --------")
         logger.info(f"------- num_blocks_local:{num_blocks_local} --------")
 
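The behavior this hunk introduces can be sketched in isolation. The following is a minimal, standalone illustration, not the FastDeploy implementation: the function name `compute_num_blocks` and its parameters are hypothetical, and `max_blocks` stands in for `envs.FD_MAX_KVCACHE_BLOCKS`. As in the diff, a non-positive cap is treated as "no limit"; the actual default of the environment variable is not shown in this commit and is not assumed here.

```python
import logging

logger = logging.getLogger(__name__)


def compute_num_blocks(available_kv_cache_memory: int,
                       block_memory_used: int,
                       max_blocks: int = 0) -> int:
    """Return how many KV cache blocks fit in the available memory.

    Hypothetical helper mirroring the hunk above. `max_blocks` plays the
    role of envs.FD_MAX_KVCACHE_BLOCKS: a value <= 0 disables the cap.
    """
    if block_memory_used <= 0:
        raise ValueError("block_memory_used must be positive")

    # Same floor division as in the diff: whole blocks only.
    num_blocks = int(available_kv_cache_memory // block_memory_used)

    # The cap applies only when it is enabled (> 0) and actually exceeded.
    if max_blocks > 0 and num_blocks > max_blocks:
        logger.info("Reset num_blocks %d to %d", num_blocks, max_blocks)
        num_blocks = max_blocks

    return num_blocks
```

With the hard-coded 40000 ceiling gone, the block count is bounded only by available memory unless the caller opts in to a cap, for example `compute_num_blocks(free_bytes, bytes_per_block, max_blocks=40000)` to reproduce the previous behavior.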