FastDeploy

Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2026-04-23 00:17:25 +08:00

Files

Longzhi Wang 2eea6fa97a [BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend (#7028)

* [BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend

* add constexpr and code style clean

* add test

* fix code style

* fix test

2026-03-30 11:17:15 +08:00

cpu_ops

c++ code format (#4527)

2025-10-22 17:59:50 +08:00

gpu_ops

[BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend (#7028)

2026-03-30 11:17:15 +08:00

iluvatar_ops

[Iluvatar] Optimize decode group_gemm and Support cuda graph for ernie (#6803)

2026-03-12 19:21:17 +08:00

metax_ops

[Model Runner] Deprecate not_need_stop (#6356)

2026-03-05 10:55:42 +08:00

third_party

[setup optimize]Support git submodule (#4033)

2025-09-11 17:41:16 +08:00

utils

[Optim] Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)

2026-01-15 19:18:42 +08:00

xpu_ops

[XPU] Fix speculate schedule (#7049)

2026-03-27 18:28:17 +08:00

0001-DeepGEMM-95e81b3.patch

[OP]Remove extra H2D in DeepGemm (#5262)

2025-11-28 14:23:44 +08:00

MANIFEST.in

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

setup_ops_cpu.py

polish code with new pre-commit rule (#2923)

2025-07-19 23:19:27 +08:00

setup_ops.py

[Hackathon 10th Spring No.45] FastDeploy supports compilation on T4/V100 hardware - part (#6488)

2026-03-23 19:16:23 +08:00