FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 17:11:21 +08:00

lizexu123 6619298b50 【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)

* update w4afp8

* build.sh ok

* support cuda_graph

* fix

* add test

* fix max_tokens_per_expert

* >=70

* fix

* compute_max_tokens_from_prefix_sum in w4afp8

* compute_max_tokens use cub

2026-01-15 19:18:42 +08:00

moe_wna16_marlin_utils

【Inference Optimize】Support automatic generation of marlin kernel (#3149)

2025-08-01 22:43:18 +08:00

deepgemm_preprocess.cu

Revert cuda check (#5915)

2026-01-07 14:40:18 +08:00

ep_moe_expert_dispatch.cu

[Feature] Support redundant expert for eplb (#5918)

2026-01-09 17:13:24 +08:00

fused_moe_helper.h

【New Feature】W4afp8 supports per group quantization (#4987)

2025-11-13 19:17:27 +08:00

fused_moe_imp_op.h

[BugFix] Fix zero workspace returned by CUB size query under CUDA Graph in MoE dispatch (#5087)

2025-11-20 20:00:29 +08:00

fused_moe_op.h

w4afp8 fix quant (#5830)

2025-12-30 21:16:13 +08:00

fused_moe.cu

…

gptq_marlin_repack.cu

…

group_swiglu_with_masked.cu

…

group_swiglu_with_masked.h

…

moe_deepgemm_depermute.cu

…

moe_deepgemm_permute.cu

…

moe_dispatch.cu

【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)

2026-01-15 19:18:42 +08:00

moe_expert_ffn_wint2.cu

Revert "【New Feature】W4afp8 supports per group quantization (#4272)" (#4854)

2025-11-06 17:48:28 +08:00

moe_fast_hardamard_impl_common.h

【Hackathon 9th No.86】autogen MoeFastHardamardImplWrapper template_instantiation (#4592)

2025-10-30 10:28:36 +08:00

moe_fast_hardamard_impl.cuh

【Hackathon 9th No.86】autogen MoeFastHardamardImplWrapper template_instantiation (#4592)

2025-10-30 10:28:36 +08:00

moe_fast_hardamard_kernel.cu

[BugFix] fix w4afp8 tp=8 (#5868)

2026-01-05 18:59:02 +08:00

moe_fast_hardamard_kernel.h

【Hackathon 9th No.86】autogen MoeFastHardamardImplWrapper template_instantiation (#4592)

2025-10-30 10:28:36 +08:00

moe_ffn.cu

【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)

2026-01-15 19:18:42 +08:00

moe_reduce.cu

[Feat] ernie4_5_vl_moe support CudaGraph (#3226)

2025-09-10 13:11:57 +08:00

moe_redundant_topk_select.cu

topk_gating_softmax support bias (#3405)

2025-08-15 11:57:45 +08:00

moe_topk_select.cu

fix cutlass ep (#5337)

2025-12-03 14:06:01 +08:00

moe_wna16_marlin_gemm.cu

…

moe_wna16_marlin_gemm.h

…

swigluoai.cu

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

swigluoai.h

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

template_config.json

【New Feature】W4afp8 supports per group quantization (#4987)

2025-11-13 19:17:27 +08:00

tritonmoe_preprocess.cu

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

winx_unzip.cu

rename fused_get_rope.cu (#3752)

2025-09-03 10:54:34 +08:00