FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Files

fmiao2372 404cf0ece4 [Intel HPU] enable tensor_wise_fp8 (#5324)

* [Intel HPU] enable tensor_wise_fp8

* update code based on comments

* fix code style issue

* fix bug about PR 5138

* mv kv_cache modifications to HPU backend

* fix FP8 Precision Issues

* fix FP8 Precision Issues

* Add quantization UT

---------

Co-authored-by: yanfeich <yanfei.cheng@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

2025-12-17 16:45:03 +08:00

__init__.py

[Intel HPU] Support intel hpu platform (#4161)

2025-09-24 12:27:50 +08:00

base.py

[Feature] support flash_mask_attention backend (#5134)

2025-11-28 10:12:16 +08:00

cpu.py

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

cuda.py

[Feature] support flash_mask_attention backend (#5134)