FastDeploy/custom_ops/gpu_ops/moe
lizexu123 6619298b50 【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)
* update w4afp8

* build.sh ok

* support cuda_graph

* fix

* add test

* fix max_tokens_per_expert

* >=70

* fix

* compute_max_tokens_from_prefix_sum in w4afp8

* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00