[Model Runner] Prepare token count and move FA3 initialization into the graph (#6170)

* Prepare the token count and move FA3 initialization into the graph
2026-04-23 00:17:25 +08:00 · 2026-01-26 12:16:57 +08:00
parent 0966df78dc
commit adc69c15d0
10 changed files with 64 additions and 42 deletions
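
Note on the pre_process hunk below: the function stops deriving token_num_cpu from seq_lens_this_time and instead takes it as its first argument. The commit does not state the motivation; a plausible reading, given the title, is that the host-side reduction (`.numpy().sum().item()`) is kept out of the function body so that body can run inside a captured graph. The following is a minimal sketch of that calling pattern only, using NumPy arrays in place of paddle.Tensor. The name pre_process_sketch and the packing logic are hypothetical and are not the code in this commit.

```python
import numpy as np


def pre_process_sketch(
    token_num_cpu: int,
    input_ids: np.ndarray,
    seq_lens_this_time: np.ndarray,
) -> np.ndarray:
    """Hypothetical stand-in for pre_process: pack the valid tokens of each row.

    The total token count arrives as a plain Python int, so the output buffer
    can be sized without reducing seq_lens_this_time inside this function.
    """
    packed = np.empty(token_num_cpu, dtype=input_ids.dtype)
    offset = 0
    # This Python loop stands in for what would be a device kernel in practice.
    for row, seq_len in zip(input_ids, seq_lens_this_time):
        n = int(seq_len)
        packed[offset:offset + n] = row[:n]
        offset += n
    return packed


# Caller side: compute the count once on the host, then pass it in.
seq_lens_this_time = np.array([3, 1, 2])
input_ids = np.array(
    [
        [11, 12, 13, 0],
        [21, 0, 0, 0],
        [31, 32, 0, 0],
    ]
)
token_num_cpu = int(seq_lens_this_time.sum())
packed = pre_process_sketch(token_num_cpu, input_ids, seq_lens_this_time)
```

With this split, the caller decides when the host-side sum happens, and the same integer can be reused by any other step that needs the total token count.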
@@ -187,6 +187,7 @@ def speculate_limit_thinking_content_length(


 def pre_process(
+    token_num_cpu: int,
    input_ids: paddle.Tensor,
    seq_lens_this_time: paddle.Tensor,
    speculative_decoding: bool,
@@ -209,7 +210,6 @@ def pre_process(
        cu_seqlens_q:
        cu_seqlens_k:
    """
-    token_num_cpu = seq_lens_this_time.numpy().sum().item()
    specific_platform = current_platform.is_cuda() or current_platform.is_maca() or current_platform.is_iluvatar()
    if specific_platform and not speculative_decoding:
        # Note(ZKK): This case's code is very simple!