apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2026-04-23 00:17:25 +08:00
Files at commit 888c4b992dd9081881a2dfeed445f4e630788eed
Path: FastDeploy/fastdeploy/model_executor/layers/attention
Latest commit: Lucas 888c4b992d [XPU] refactor of block_attn param 'pos_emb_type' (#5511), 2025-12-12 14:30:09 +08:00
Name                             Last commit                                                                Date
ops                              FA3 support qwen3 (#5441)                                                  2025-12-09 16:16:16 +08:00
__init__.py                      [Feature] support flash_mask_attention backend (#5134)                     2025-11-28 10:12:16 +08:00
append_attn_backend.py           [Others] add assert and only count the actual load in cuda_graph (#5445)  2025-12-10 11:22:54 +08:00
attention_selecter.py            …
attention.py                     …
base_attention_backend.py        …
block_multihead_attn_backend.py  [KVCache] support unified cache backend (#4903)                            2025-11-12 14:54:52 +08:00
flash_attn_backend.py            FA3 support qwen3 (#5441)                                                  2025-12-09 16:16:16 +08:00
flash_mask_attn_backend.py       FA3 support qwen3 (#5441)                                                  2025-12-09 16:16:16 +08:00
iluvatar_attn_backend.py         [KVCache] support unified cache backend (#4903)                            2025-11-12 14:54:52 +08:00
mla_attention_backend.py         [PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)           2025-12-02 14:11:50 +08:00
moba_attention_backend.py        [KVCache] support unified cache backend (#4903)                            2025-11-12 14:54:52 +08:00
native_paddle_backend.py         …
utils.py                         [PD Disaggregation][XPU] Add XPU support for PD disaggregation (#5113)     2025-11-21 14:09:01 +08:00
xpu_attn_backend.py              [XPU] refactor of block_attn param 'pos_emb_type' (#5511)                  2025-12-12 14:30:09 +08:00
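The file names suggest a common layout for pluggable attention backends: a base interface (base_attention_backend.py), several hardware- or algorithm-specific implementations (flash, MLA, XPU, Iluvatar, native Paddle, etc.), and a selector module (attention_selecter.py) that picks one at runtime. FastDeploy's actual classes and signatures are not shown on this page, so the following is only a hypothetical sketch of how a name-keyed backend registry with a fallback might work; every class and function name below is illustrative, not FastDeploy's API.

```python
# Hypothetical sketch of a name-keyed attention-backend registry.
# All names here are illustrative placeholders, NOT FastDeploy's API.

class AttentionBackend:
    """Minimal base interface that every concrete backend implements."""
    name = "base"

    def forward(self, q, k, v):
        raise NotImplementedError

# Global registry mapping backend name -> backend class.
_REGISTRY: dict[str, type[AttentionBackend]] = {}

def register_backend(cls: type[AttentionBackend]) -> type[AttentionBackend]:
    """Class decorator: register a backend class under its `name` attribute."""
    _REGISTRY[cls.name] = cls
    return cls

@register_backend
class NativeBackend(AttentionBackend):
    """Reference implementation; always available, used as the fallback."""
    name = "native"

    def forward(self, q, k, v):
        return "native-attention-result"

@register_backend
class FlashBackend(AttentionBackend):
    """A faster backend that may only exist on supported hardware."""
    name = "flash"

    def forward(self, q, k, v):
        return "flash-attention-result"

def select_backend(name: str) -> AttentionBackend:
    """Look up a backend by name, falling back to the native implementation."""
    cls = _REGISTRY.get(name, _REGISTRY["native"])
    return cls()

backend = select_backend("flash")
print(backend.forward(None, None, None))  # flash-attention-result
```

A registry-plus-selector design like this keeps each backend file self-contained: adding a new backend means adding one module with a decorated class, with no edits to the selector itself.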