apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2026-04-23 00:17:25 +08:00
Files
Commit 3cc09418f1574369442d49292d41925887acd1c7
FastDeploy/fastdeploy/model_executor/layers/attention
Latest commit 3cc09418f1: support dsv3 use flashmla (#6593) by 周周周, 2026-03-03 11:09:43 +08:00
ops/
    seq_lens related tensor shape -> [max_num_seqs] (#6535), 2026-03-02 11:18:30 +08:00
__init__.py
    [XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu (#5878), 2026-01-09 16:34:57 +08:00
append_attn_backend.py
    [Feature] Supports SWA based on appendattn (#6547), 2026-03-01 19:02:08 +08:00
attention_selecter.py
    …
attention.py
    Support Norm before Rope (#6332), 2026-02-05 15:28:52 +08:00
base_attention_backend.py
    …
block_multihead_attn_backend.py
    [Feature] Support reorder ids to split prefill and decodes (#5779), 2026-02-03 00:28:02 -08:00
flash_attn_backend.py
    [BugFix] lazy enable_torch_proxy for cutlass (#6523), 2026-03-02 10:43:58 +08:00
flash_mask_attn_backend.py
    [MTP] refactor MTP pre_process (#6358), 2026-02-09 10:47:15 +08:00
iluvatar_attn_backend.py
    [Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553), 2026-03-02 14:07:17 +08:00
mla_attention_backend.py
    support dsv3 use flashmla (#6593), 2026-03-03 11:09:43 +08:00
moba_attention_backend.py
    [Feature] Support reorder ids to split prefill and decodes (#5779), 2026-02-03 00:28:02 -08:00
native_paddle_backend.py
    …
utils.py
    …