FastDeploy/fastdeploy/model_executor/layers/moe at 7bd86f99a52b2130a40c6d2f984b5f4a731f0b57 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

RichardWooSJTU 7bd86f99a5 [BugFix] Fix tbo nan (#6439)

2026-03-02 14:28:48 +08:00

__init__.py

support w4afp8 EP inference (#3044)

2025-08-25 11:27:45 +08:00

ep.py

fix pfcc deep ep in low latency mode (#6440)

2026-03-02 10:35:51 +08:00

fused_moe_backend_base.py

[Feature] Support redundant expert for eplb (#5918)

2026-01-09 17:13:24 +08:00

fused_moe_cutlass_backend.py

[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553)

2026-03-02 14:07:17 +08:00

fused_moe_deepgemm_backend.py

[BugFix] Fix tbo nan (#6439)

2026-03-02 14:28:48 +08:00

fused_moe_marlin_backend.py

[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)

2026-02-26 21:08:46 -08:00

fused_moe_triton_backend.py

fix reshard error (#6536)

2026-02-27 22:22:37 +08:00

fused_moe_wint2_backend.py

[loader]supoort wint2 backend (#6139)

2026-02-08 22:42:36 -08:00

moe.py

[loader]supoort wint2 backend (#6139)

2026-02-08 22:42:36 -08:00

routing_indices_cache.py

[RL] R3 Support Fused Put the Routing of All Layers (#6099)

2026-02-03 04:13:16 -08:00

triton_moe_kernels.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00