FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Files

AIbin cb6819d086 [Optimization][OP] Support per_token_group_fp8_quant CUDA kernel (#6865)

* support per_token_group_fp8_quant CUDA kernel

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* update code

2026-03-17 19:17:51 +08:00

exception.h

[Models][OP][Optimization] Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM (#6689)

2026-03-10 15:05:14 +08:00

indexer_topk.cu

[Optimization] Update DeepSeek-v3.2 model and dsa-indexer networking and add some unit tests (#6762)

2026-03-11 15:52:54 +08:00

indexer_topk.cuh

[Optimization][BugFix] Optimize DeepSeek networking code (#6861)

2026-03-16 16:52:43 +08:00

per_token_group_quant.cu

[Optimization][OP] Support per_token_group_fp8_quant CUDA kernel (#6865)