apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2026-04-24 01:29:57 +08:00
a6351dea0b30fee8e9c4e663b14e445c29020eb4
FastDeploy/fastdeploy/model_executor/layers/moe
fxyfxy777  4d39232553  [BugFix] add ut for fused_moe_degemm (#6840)  2026-03-16 12:22:18 +08:00
    * add ut
    * add skip
__init__.py                    support w4afp8 EP inference (#3044)  2025-08-25 11:27:45 +08:00
ep.py                          [Feature] Support EP prefill with num_worst_tokens (#6574)  2026-03-11 17:09:07 +08:00
fused_moe_backend_base.py      [Feature] Support EP prefill with num_worst_tokens (#6574)  2026-03-11 17:09:07 +08:00
fused_moe_cutlass_backend.py   [Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553)  2026-03-02 14:07:17 +08:00
fused_moe_deepgemm_backend.py  [BugFix] add ut for fused_moe_degemm (#6840)  2026-03-16 12:22:18 +08:00
fused_moe_marlin_backend.py    [Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)  2026-02-26 21:08:46 -08:00
fused_moe_triton_backend.py    [Feature] Support EP prefill with num_worst_tokens (#6574)  2026-03-11 17:09:07 +08:00
fused_moe_wint2_backend.py     [loader]supoort wint2 backend (#6139)  2026-02-08 22:42:36 -08:00
moe.py                         [loader]supoort wint2 backend (#6139)  2026-02-08 22:42:36 -08:00
routing_indices_cache.py       [RL] add stream guard (#6814)  2026-03-13 11:22:26 +08:00
triton_moe_kernels.py          [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)  2025-09-24 16:39:51 +08:00