mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2026-04-23 00:17:25 +08:00
[Feature] Support EP prefill with num_worst_tokens (#6574)
* support num worst tokens
* support num worst tokens
* fix build error
* support num worst tokens: fix errors
* support num worst tokens: fix field
* support num worst tokens: delete requirements
* replace permute and depermute op by pure cuda
* replace permute and depermute op by pure cuda
* fix ci
* fix op
* fix nan
* fix code style

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
@@ -643,6 +643,7 @@ class LLMEngine:
             "moe_gate_fp32": self.cfg.model_config.moe_gate_fp32,
             "shutdown_comm_group_if_worker_idle": self.cfg.parallel_config.shutdown_comm_group_if_worker_idle,
             "enable_entropy": self.cfg.model_config.enable_entropy,
+            "ep_prefill_use_worst_num_tokens": self.cfg.parallel_config.ep_prefill_use_worst_num_tokens,
             "enable_overlap_schedule": self.cfg.scheduler_config.enable_overlap_schedule,
         }
         for worker_flag, value in worker_store_true_flag.items():
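
The body of the `for` loop falls outside this hunk, so how each entry is consumed is not shown here. As a rough illustration only, the sketch below assumes that every truthy entry of `worker_store_true_flag` becomes a bare `--name` switch on the worker command line and that falsy entries are omitted. The helper `build_store_true_args` and the sample values are hypothetical and are not taken from the FastDeploy source.

```python
def build_store_true_args(flags: dict[str, bool]) -> str:
    """Render truthy entries as ' --name' switches; skip falsy ones.

    Hypothetical helper for illustration; not part of FastDeploy.
    """
    return "".join(f" --{name}" for name, value in flags.items() if value)


# Illustrative values only; in the diff above they are read from self.cfg.
worker_store_true_flag = {
    "enable_entropy": False,
    "ep_prefill_use_worst_num_tokens": True,
    "enable_overlap_schedule": True,
}

arguments = build_store_true_args(worker_store_true_flag)
print(arguments)
```

Under that assumption, adding a new boolean option only requires one more dictionary entry, which matches the single added line in this hunk.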