FastDeploy/fastdeploy/model_executor/layers/attention at 0d1a5e70bc2ffe46492830b3e128262f63b06944 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Ryan 0d1a5e70bc [Graph Optimization] Add full_cuda_graph to control subgraph split (#6027)

2026-01-14 11:43:59 +08:00

..

FA3 support qwen3 (#5441)

2025-12-09 16:16:16 +08:00

__init__.py

[XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu (#5878)

2026-01-09 16:34:57 +08:00

append_attn_backend.py

[Graph Optimization] Add full_cuda_graph to control subgraph split (#6027)

2026-01-14 11:43:59 +08:00

attention_selecter.py

…

attention.py

[Model] tp+ep support v1_loader (#5465)

2025-12-18 14:31:54 +08:00

base_attention_backend.py

…

block_multihead_attn_backend.py

…

flash_attn_backend.py

FA3 support qwen3 (#5441)

2025-12-09 16:16:16 +08:00

flash_mask_attn_backend.py

make flash_mask attention pybind (#5783)

2025-12-26 14:31:35 +08:00

iluvatar_attn_backend.py

[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555)

2025-12-18 02:14:25 -08:00

mla_attention_backend.py

MLA clean code (#5979)

2026-01-10 21:05:00 +08:00

moba_attention_backend.py

…

native_paddle_backend.py

…

utils.py

…