apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2026-04-23 00:17:25 +08:00
1d3ae7c0244a054e07b0de37e30859b1e184a60a
FastDeploy/fastdeploy/model_executor/layers/moe
Latest commit: 1d3ae7c024 by lizexu123, 2026-01-05 18:59:02 +08:00
[BugFix] fix w4afp8 tp=8 (#5868)
* fix w4afp8 tp=8 * fix
File | Last commit | Date
__init__.py | support w4afp8 EP inference (#3044) | 2025-08-25 11:27:45 +08:00
ep.py | [Others] enable use PFCC deep_ep (#5822) | 2026-01-05 02:07:01 -08:00
fused_moe_backend_base.py | RL fix (#5503) | 2025-12-11 19:25:27 +08:00
fused_moe_cutlass_backend.py | [BugFix] fix w4afp8 tp=8 (#5868) | 2026-01-05 18:59:02 +08:00
fused_moe_deepgemm_backend.py | [Optimization] add del to decrease peak memory in MoE prefill (#5863) | 2026-01-05 14:01:48 +08:00
fused_moe_marlin_backend.py | [New][RL] Support Rollout Routing Replay (#5405) | 2025-12-05 22:06:26 +08:00
fused_moe_triton_backend.py | [GraphOptimization] Wrap deep gemm and triton as python op (#5673) | 2025-12-24 15:23:46 +08:00
fused_moe_wint2_backend.py | [New][RL] Support Rollout Routing Replay (#5405) | 2025-12-05 22:06:26 +08:00
moe.py | support w4afp8 moe offline permute & load (#5613) | 2025-12-22 15:12:57 +08:00
routing_indices_cache.py | [RL] R3 Support RDMA Store (#5467) | 2025-12-16 16:50:13 +08:00
triton_moe_kernels.py | [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238) | 2025-09-24 16:39:51 +08:00