[Feature] consider multimodal model when dummy run (#6045)

* add mm do profile
* update code
* update code
* update code
* update code
* update test case
* update code
* update code
* fix xpu bug
* update code
* add mm do profile
* update test case
* update code
2026-04-23 17:11:21 +08:00 · 2026-02-09 17:49:55 +08:00
parent 783d56e28a
commit d60daca4a8
25 changed files with 166 additions and 19 deletions
@@ -34,6 +34,7 @@ def make_prefix_cache_manager(max_num_seqs, enable_mm=False, num_gpu_blocks_over
    speculative_cfg = SimpleNamespace(method=None)
    model_cfg.print = print
    model_cfg.architectures = ["test_model"]
+    model_cfg.mm_max_tokens_per_item = None
    cache_cfg.bytes_per_layer_per_block = 1
    parallel_cfg = ParallelConfig(args)
    scheduler_cfg = SchedulerConfig(args)
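
Note on the hunk above: the fixture now sets `mm_max_tokens_per_item = None` on its config stub. A likely reason, sketched below under stated assumptions, is that code reached by the test reads this attribute directly, so a stub without it would raise `AttributeError`. The function `dummy_run_mm_tokens`, the use of `SimpleNamespace` for `model_cfg`, and the dict shape (modality name to maximum tokens per item) are hypothetical illustrations and are not taken from the commit.

```python
from types import SimpleNamespace


# Hypothetical stand-in for the code under test; the real reader of
# mm_max_tokens_per_item is not shown in this hunk.
def dummy_run_mm_tokens(model_cfg, default=0):
    """Return the largest per-item multimodal token budget, or `default` when unset."""
    # Direct attribute access: raises AttributeError if the stub lacks the field.
    per_item = model_cfg.mm_max_tokens_per_item
    if per_item is None:
        return default
    return max(per_item.values())


# Stub without the field: direct attribute access fails.
bare_cfg = SimpleNamespace(architectures=["test_model"])
try:
    dummy_run_mm_tokens(bare_cfg)
    missing_field_failed = False
except AttributeError:
    missing_field_failed = True

# Stub mirroring the fixture after this change: the field exists and is None.
stub_cfg = SimpleNamespace(architectures=["test_model"], mm_max_tokens_per_item=None)
text_only_tokens = dummy_run_mm_tokens(stub_cfg)

# Assumed shape for a multimodal model: modality name -> max tokens per item.
mm_cfg = SimpleNamespace(mm_max_tokens_per_item={"image": 1024, "video": 4096})
mm_tokens = dummy_run_mm_tokens(mm_cfg)
```

Setting the field to `None` in the fixture keeps text-only tests on the existing path while letting multimodal-aware code read the attribute unconditionally.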