FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 08:21:53 +08:00

Files

AIbin cb6819d086 [Optimization][OP] support per_token_group_fp8_quant cuda kernel (#6865)

* support per_token_group_fp8_quant cuda kernel

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* update code

---------

2026-03-17 19:17:51 +08:00

ops

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

2026-03-16 21:32:43 +08:00

__init__.py

[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)

2026-03-16 21:32:43 +08:00

block_wise_fp8.py

[Feature] Add deepgemm bias epilogue for SM100 (#6857)

2026-03-16 20:12:00 +08:00

fp8_utils.py

[Optimization][OP] support per_token_group_fp8_quant cuda kernel (#6865)