mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2026-04-23 00:17:25 +08:00
[BugFix][Scheduler] Fix can_schedule_block_num_threshold calculation (#6541)
* fix mtp acceptance rate decline

* [BugFix][Scheduler] Fix can_schedule_block_num_threshold calculation

Fix the calculation of can_schedule_block_num_threshold in ResourceManagerV1.
The original formula using need_prefill_tokens could lead to incorrect
threshold values. Now directly use num_chunk_new_block for accurate block
scheduling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@@ -357,8 +357,8 @@ class ResourceManagerV1(ResourceManager):
             can_schedule_block_num_threshold = num_chunk_new_block
         else:
             can_schedule_block_num_threshold = (
-                request.need_prefill_tokens + self.config.cache_config.block_size - 1
-            ) // self.config.cache_config.block_size + len(self.running) * self.current_reserve_output_block_num
+                num_chunk_new_block + len(self.running) * self.current_reserve_output_block_num
+            )
             if self.config.speculative_config.method is not None:
                 can_schedule_block_num_threshold = min(
                     can_schedule_block_num_threshold + 1, self.config.cache_config.max_block_num_per_seq
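The change in this hunk can be illustrated with a small standalone sketch. It is not the FastDeploy implementation: the function names and the sample numbers are hypothetical, and it assumes that `num_chunk_new_block` is the number of new blocks needed for the chunk being scheduled, which the diff itself does not show. The old formula rounds the request's full `need_prefill_tokens` up to whole blocks, while the new one starts from `num_chunk_new_block`; both add the same reservation term for running requests.

```python
def old_threshold(need_prefill_tokens: int, block_size: int,
                  num_running: int, reserve_output_block_num: int) -> int:
    """Pre-fix formula: ceil(need_prefill_tokens / block_size) plus reserved blocks."""
    prefill_blocks = (need_prefill_tokens + block_size - 1) // block_size
    return prefill_blocks + num_running * reserve_output_block_num


def new_threshold(num_chunk_new_block: int,
                  num_running: int, reserve_output_block_num: int) -> int:
    """Post-fix formula: blocks for the current chunk plus reserved blocks."""
    return num_chunk_new_block + num_running * reserve_output_block_num


def with_speculative_margin(threshold: int, max_block_num_per_seq: int) -> int:
    """Adjustment visible in the hunk's context lines: one extra block, capped per sequence."""
    return min(threshold + 1, max_block_num_per_seq)


if __name__ == "__main__":
    # Hypothetical values chosen only to make the difference visible.
    old = old_threshold(need_prefill_tokens=1000, block_size=64,
                        num_running=2, reserve_output_block_num=4)
    new = new_threshold(num_chunk_new_block=6,
                        num_running=2, reserve_output_block_num=4)
    print(old, new, with_speculative_margin(new, max_block_num_per_seq=32))
```

With these sample values the old threshold counts blocks for all 1000 prefill tokens even though the current chunk only needs 6 new blocks, so the scheduler would require more free blocks than the chunk uses.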