apps/FastDeploy
mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00
Path: FastDeploy/fastdeploy/model_executor
Commit: 20de04e249d94f846c1f81d220aa2eab5b27e4ce
Latest commit: zccjjj 20de04e249 [XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu (#5878), 2026-01-09 16:34:57 +08:00

graph_optimization/            [Others] add assert and only count the actual load in cuda_graph (#5445)    2025-12-10 11:22:54 +08:00
guided_decoding/               …
layers/                        [XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu (#5878)    2026-01-09 16:34:57 +08:00
logits_processor/              …
model_loader/                  [Loader] Fix bug in MTP weight loading (#5744)    2025-12-25 11:32:17 +08:00
models/                        Revert "Revert "[TSP] last_norm allgather move to model.py (#5924)" (#5961)" (#5972)    2026-01-09 15:58:22 +08:00
ops/                           [INTEL HPU] support only one release package of PaddleCustomDevice (#5910)    2026-01-08 11:57:13 +08:00
__init__.py                    …
entropy_utils.py               [Bugfix] Fix entropy calculation bugs (#5941)    2026-01-08 20:57:35 +08:00
forward_meta.py                [Intel HPU] enable chunked prefill (#5903)    2026-01-06 21:01:50 +08:00
load_weight_utils.py           [V1 Loader] Support loading static C8 scale JSON (#5909)    2026-01-06 19:49:30 -08:00
pre_and_post_process.py        [Metax] optimize flash attention backend (#5876)    2026-01-06 09:52:09 +08:00
utils.py                       [INTEL_HPU] supported ERNIE-4.5-21B-A3B-Thinking (#5891)    2026-01-07 21:31:53 +08:00
xpu_pre_and_post_process.py    [XPU] Speculative Decoding with PD (#5856)    2026-01-05 17:31:03 +08:00