[Speculative Decoding] Support MTP expert parallelism and per-modality deployment (#7018)

* Support MTP EP (expert parallelism) and different deployment modalities
* Fix default argument
2026-04-23 00:17:25 +08:00 · 2026-03-26 13:52:16 +08:00
parent 61ebac49ef
commit 4fd877ed43
10 changed files with 112 additions and 19 deletions
@@ -622,6 +622,7 @@ class LLMEngine:
            f" --routing_replay_config '{self.cfg.routing_replay_config.to_json_string()}'"
            f" --model-impl {self.cfg.model_config.model_impl}"
            f" --num_cpu_blocks {self.cfg.cache_config.num_cpu_blocks}"
+            f" --deploy_modality {self.cfg.deploy_modality.value}"
        )
        if self.cfg.structured_outputs_config.logits_processors is not None:
            arguments += f" --logits-processors {' '.join(self.cfg.structured_outputs_config.logits_processors)}"