apps/FastDeploy
mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00
7bd86f99a52b2130a40c6d2f984b5f4a731f0b57
FastDeploy/fastdeploy/model_executor/layers/moe
RichardWooSJTU 7bd86f99a5 [BugFix] Fix tbo nan (#6439)
2026-03-02 14:28:48 +08:00
__init__.py
support w4afp8 EP inference (#3044)
2025-08-25 11:27:45 +08:00
ep.py
fix pfcc deep ep in low latency mode (#6440)
2026-03-02 10:35:51 +08:00
fused_moe_backend_base.py
[Feature] Support redundant expert for eplb (#5918)
2026-01-09 17:13:24 +08:00
fused_moe_cutlass_backend.py
[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553)
2026-03-02 14:07:17 +08:00
fused_moe_deepgemm_backend.py
[BugFix] Fix tbo nan (#6439)
2026-03-02 14:28:48 +08:00
fused_moe_marlin_backend.py
[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)
2026-02-26 21:08:46 -08:00
fused_moe_triton_backend.py
fix reshard error (#6536)
2026-02-27 22:22:37 +08:00
fused_moe_wint2_backend.py
[loader]supoort wint2 backend (#6139)
2026-02-08 22:42:36 -08:00
moe.py
[loader]supoort wint2 backend (#6139)
2026-02-08 22:42:36 -08:00
routing_indices_cache.py
[RL] R3 Support Fused Put the Routing of All Layers (#6099)
2026-02-03 04:13:16 -08:00
triton_moe_kernels.py
[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)
2025-09-24 16:39:51 +08:00