FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Files

SUN Dong 6cff780fdb [RL] Support moe_topk_select using Paddle native operators and Add fused stack-transpose-quant for BlockWiseFP8 MoE weight quantization and swiglu-fp8-quant op for DeepGemmFusedMoE for training alignment (#6850)

* [RL] Add fused stack-transpose-quant for BlockWiseFP8 MoE weight quantization

* update

* update

* update

* support custom topk in DeepGemmFusedMoeMethod apply_tp

* apply_ep_prefill support moe_topk_select

* update

* add ut

* add ut

* add ut

* modify doc

* fix env and docs

* add ut

---------

Co-authored-by: zhanghonggeng <zhanghonggeng@baidu.com>

2026-03-24 11:12:39 +08:00

ops

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

2026-03-16 21:32:43 +08:00

__init__.py

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

2026-03-16 21:32:43 +08:00

block_wise_fp8.py

[Feature] Add deepgemm bias epilogue for SM100 (#6857)

2026-03-16 20:12:00 +08:00

fp8_utils.py

[RL] Support moe_topk_select using Paddle native operators and Add fused stack-transpose-quant for BlockWiseFP8 MoE weight quantization and swiglu-fp8-quant op for DeepGemmFusedMoE for training alignment (#6850)