[Speculative Decoding] Support mtp expert-parallel and support different modality deploy (#7018)

* support mtp ep and support different modality

* fix default arg
freeliuzc
2026-03-26 13:52:16 +08:00
committed by GitHub
parent 61ebac49ef
commit 4fd877ed43
10 changed files with 112 additions and 19 deletions
```diff
@@ -622,6 +622,7 @@ class LLMEngine:
             f" --routing_replay_config '{self.cfg.routing_replay_config.to_json_string()}'"
             f" --model-impl {self.cfg.model_config.model_impl}"
             f" --num_cpu_blocks {self.cfg.cache_config.num_cpu_blocks}"
+            f" --deploy_modality {self.cfg.deploy_modality.value}"
         )
         if self.cfg.structured_outputs_config.logits_processors is not None:
             arguments += f" --logits-processors {' '.join(self.cfg.structured_outputs_config.logits_processors)}"
```
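The added `--deploy_modality` line serializes the engine's modality setting into the worker command line through the enum's `.value`, so the worker receives a plain string. The sketch below shows that round trip. It is a minimal illustration only. It assumes a hypothetical `DeployModality` enum with `MIXED`/`TEXT` members, hypothetical helpers `build_worker_args` and `parse_worker_args`, and an argparse-based worker parser with `MIXED` as the default. The real enum members, worker parser, and default value are not shown in this diff.

```python
import argparse
import shlex
from enum import Enum


class DeployModality(Enum):
    # Hypothetical members for illustration; the real enum may differ.
    MIXED = "mixed"
    TEXT = "text"


def build_worker_args(modality: DeployModality, num_cpu_blocks: int) -> str:
    # Engine side (hypothetical helper): options are concatenated f-string
    # fragments, and the enum is passed as its string .value.
    return (
        f" --num_cpu_blocks {num_cpu_blocks}"
        f" --deploy_modality {modality.value}"
    )


def parse_worker_args(argv: list[str]) -> argparse.Namespace:
    # Worker side (hypothetical helper): parse the string back into the enum.
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_cpu_blocks", type=int, default=0)
    parser.add_argument(
        "--deploy_modality",
        type=DeployModality,           # DeployModality("text") -> DeployModality.TEXT
        choices=list(DeployModality),  # reject strings that are not known members
        default=DeployModality.MIXED,  # assumed default when the flag is omitted
    )
    return parser.parse_args(argv)


args = parse_worker_args(shlex.split(build_worker_args(DeployModality.TEXT, 4)))
print(args.deploy_modality.value)  # prints "text"
```

Passing `.value` instead of the enum object keeps the command line a plain string. Using the enum class as the argparse `type` makes an unknown modality fail at worker startup rather than later.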