FastDeploy/fastdeploy/model_executor/layers/moe at 12d4b4cb87ad97d7136c9fe0898d75f12e3c58ed - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-06 15:40:33 +08:00

周周周 cbdb2462ea cp 1131 tbo to develop (#6281)

2026-02-03 15:23:23 +08:00

__init__.py

support w4afp8 EP inference (#3044)

2025-08-25 11:27:45 +08:00

ep.py

cp 1131 tbo to develop (#6281)

2026-02-03 15:23:23 +08:00

fused_moe_backend_base.py

[Feature] Support redundant expert for eplb (#5918)

2026-01-09 17:13:24 +08:00

fused_moe_cutlass_backend.py

[Iluvartar][CI] Fix the error max_tokens_per_expert referenced before assignment (#6083)

2026-01-21 16:01:29 +08:00

fused_moe_deepgemm_backend.py

cp 1131 tbo to develop (#6281)

2026-02-03 15:23:23 +08:00

fused_moe_marlin_backend.py

[New][RL] Support Rollout Routing Replay (#5405)

2025-12-05 22:06:26 +08:00

fused_moe_triton_backend.py

Revert "[Feature] Support Ernie FP8 on sm100 (#5593)" (#6275)

2026-01-30 11:22:01 +08:00

fused_moe_wint2_backend.py

[BugFix] fix wint2 (#6109)

2026-01-20 21:46:21 +08:00

moe.py

[Feature] Support NVFP4 MoE on SM100 (#6003)

2026-01-29 14:16:07 +08:00

routing_indices_cache.py

[UT] Add GLM E2E tests for non-MTP and MTP (#6163)

2026-01-23 10:34:29 +08:00

triton_moe_kernels.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00