[BugFix][Scheduler] Fix can_schedule_block_num_threshold calculation (#6541)

* fix mtp acceptance rate decline

* [BugFix][Scheduler] Fix can_schedule_block_num_threshold calculation

Fix the calculation of can_schedule_block_num_threshold in
ResourceManagerV1. The original formula rounded
request.need_prefill_tokens up to whole blocks, which could yield an
incorrect threshold. The threshold now uses num_chunk_new_block
directly, plus the blocks reserved for running requests.
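
A minimal sketch of the two formulas side by side, written as standalone
functions rather than the actual ResourceManagerV1 method. The function
names and the sample numbers are hypothetical. How num_chunk_new_block is
computed is not shown in this commit, so the example simply assumes it can
be smaller than the block count of the whole prompt.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, the same (a + b - 1) // b idiom the old code used."""
    return (a + b - 1) // b


def old_threshold(need_prefill_tokens: int, block_size: int,
                  num_running: int, reserve_output_block_num: int) -> int:
    # Old formula: blocks for all prefill tokens, plus reserved output
    # blocks for every running request.
    return ceil_div(need_prefill_tokens, block_size) + num_running * reserve_output_block_num


def new_threshold(num_chunk_new_block: int,
                  num_running: int, reserve_output_block_num: int) -> int:
    # New formula: only the new blocks for the current chunk, plus the same
    # reserved output blocks.
    return num_chunk_new_block + num_running * reserve_output_block_num


# Hypothetical values: a 1000-token prompt, 64-token blocks, 4 running
# requests with 2 reserved output blocks each, and (by assumption) 6 new
# blocks needed for the current chunk.
old = old_threshold(1000, 64, 4, 2)
new = new_threshold(6, 4, 2)
print(old, new)
```

Under these assumed numbers the old formula asks for more free blocks than
the new one, which illustrates how the two thresholds can diverge.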

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
kevin
2026-02-28 16:23:18 +08:00
committed by GitHub
parent a2072fe20c
commit 5d42f19e0a
@@ -357,8 +357,8 @@ class ResourceManagerV1(ResourceManager):
     can_schedule_block_num_threshold = num_chunk_new_block
 else:
     can_schedule_block_num_threshold = (
-        request.need_prefill_tokens + self.config.cache_config.block_size - 1
-    ) // self.config.cache_config.block_size + len(self.running) * self.current_reserve_output_block_num
+        num_chunk_new_block + len(self.running) * self.current_reserve_output_block_num
+    )
     if self.config.speculative_config.method is not None:
         can_schedule_block_num_threshold = min(
             can_schedule_block_num_threshold + 1, self.config.cache_config.max_block_num_per_seq