[Models][OP][Optimization] Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM (#6689)

* Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM
Author: AIbin
Date: 2026-03-10 15:05:14 +08:00
Committed by: GitHub
Parent: 25c479312d
Commit: c3aceb6bdc
22 changed files with 8022 additions and 143 deletions
@@ -394,6 +394,8 @@ elif paddle.is_compiled_with_cuda():
     )
     sources += ["gpu_ops/append_attention.cu"]
     sources += find_end_files("gpu_ops/append_attn", ".cu")
+    # sparse indexer
+    sources += find_end_files("gpu_ops/sparse_indexer", ".cu")
     # mla
     sources += ["gpu_ops/multi_head_latent_attention.cu"]
     # gemm_dequant
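The diff extends the CUDA source list by calling `find_end_files` on the new `gpu_ops/sparse_indexer` directory, so every `.cu` file there is compiled into the ops library without being listed by hand. The snippet below is a minimal sketch of what such a helper typically does — the actual signature and behavior of `find_end_files` in the repository may differ (e.g. it may not recurse into subdirectories):

```python
import os

def find_end_files(directory, end_str):
    """Hypothetical re-implementation for illustration: walk `directory`
    and collect every file whose name ends with `end_str`, mirroring how
    the build script gathers .cu sources for each op subsystem."""
    gathered = []
    for root, _, files in os.walk(directory):
        for name in files:
            if name.endswith(end_str):
                gathered.append(os.path.join(root, name))
    return gathered
```

With a helper like this, adding a new kernel to `gpu_ops/sparse_indexer` requires no change to the setup script: the file is picked up automatically on the next build.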