[Feature] Support EP prefill with num_worst_tokens (#6574)

* support num worst tokens

* support num worst tokens

* fix build error

* support num worst tokens: fix errors

* support num worst tokens: fix field

* support num worst tokens: delete requirements

* replace permute and depermute op by pure cuda

* replace permute and depermute op by pure cuda

* fix ci

* fix op

* fix nan

* fix code style

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
RichardWooSJTU
2026-03-11 17:09:07 +08:00
committed by GitHub
parent 0466c7e8a8
commit 9f0778f991
21 changed files with 1775 additions and 166 deletions
@@ -643,6 +643,7 @@ class LLMEngine:
             "moe_gate_fp32": self.cfg.model_config.moe_gate_fp32,
             "shutdown_comm_group_if_worker_idle": self.cfg.parallel_config.shutdown_comm_group_if_worker_idle,
             "enable_entropy": self.cfg.model_config.enable_entropy,
+            "ep_prefill_use_worst_num_tokens": self.cfg.parallel_config.ep_prefill_use_worst_num_tokens,
             "enable_overlap_schedule": self.cfg.scheduler_config.enable_overlap_schedule,
         }
         for worker_flag, value in worker_store_true_flag.items():
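
The hunk above is cut off before the loop body, so how each flag reaches the worker is not visible here. A minimal sketch of one common way such a dictionary is consumed, assuming each truthy entry is forwarded as a bare `--<name>` switch and parsed on the worker side with `action="store_true"`. The helper `build_store_true_args`, the `SimpleNamespace` stand-in for `self.cfg`, and the parser setup are hypothetical and are not taken from this commit:

```python
import argparse
from types import SimpleNamespace


def build_store_true_args(flags: dict[str, bool]) -> str:
    """Render truthy flags as ' --name' switches; falsy flags are omitted."""
    args = ""
    for worker_flag, value in flags.items():
        if value:
            args += f" --{worker_flag}"
    return args


# Hypothetical stand-in for self.cfg; only two of the flags are modelled.
cfg = SimpleNamespace(
    parallel_config=SimpleNamespace(ep_prefill_use_worst_num_tokens=True),
    scheduler_config=SimpleNamespace(enable_overlap_schedule=False),
)

worker_store_true_flag = {
    "ep_prefill_use_worst_num_tokens": cfg.parallel_config.ep_prefill_use_worst_num_tokens,
    "enable_overlap_schedule": cfg.scheduler_config.enable_overlap_schedule,
}

worker_args = build_store_true_args(worker_store_true_flag)

# Assumed worker side: every flag is a store_true switch that defaults to False.
parser = argparse.ArgumentParser()
for name in worker_store_true_flag:
    parser.add_argument(f"--{name}", action="store_true")
parsed = parser.parse_args(worker_args.split())

print(worker_args)
print(parsed.ep_prefill_use_worst_num_tokens, parsed.enable_overlap_schedule)
```

Under this assumption, a flag that is missing from the dictionary never reaches the worker, which would explain why the new option has to be registered here as well as in the parallel config.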