FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-07 16:08:58 +08:00

bukejiyu bcaa98ff9c V1 loader default (#4251)

* v1 loader

* update

* update

2025-10-15 16:49:17 +08:00

attention

[Optimization] Fuse get_max_len and get_kv_max_len (#4369)

2025-10-13 20:35:00 +08:00

backends

[XPU] fix ep (#4393)

2025-10-15 11:41:05 +08:00

moe

V1 loader default (#4251)

2025-10-15 16:49:17 +08:00

pool

[Feature] support qwen3-embedding model load (#4202)

2025-09-23 00:14:35 -07:00

quantization

[XPU] Support W4A8C8-TP4-300B Model (#4068)

2025-10-10 15:41:32 +08:00

sample

[Executor] CUDAGraph support Speculate Decode (#3769)

2025-10-09 21:18:29 +08:00

__init__.py

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

activation.py

[Intel HPU] Support intel hpu platform (#4161)

2025-09-24 12:27:50 +08:00

embeddings.py

[BugFix] fix qwen3-embedding model tp>1 (#4223)

2025-09-24 14:13:26 +08:00

linear.py

fix machete pre quant (#4295)

2025-09-28 16:11:09 +08:00

lm_head.py

[Feature] support qwen3-embedding model load (#4202)

2025-09-23 00:14:35 -07:00

mtp_linear.py

support tmp (#3675)

2025-08-28 19:42:32 +08:00

normalization.py

adaptive rms_norm's dtype (#3617)

2025-08-26 15:29:15 +08:00

pooler.py

[Feature] support pool (#3827)

2025-09-22 14:09:09 +08:00

rotary_embedding.py

[Intel HPU] Support intel hpu platform (#4161)

2025-09-24 12:27:50 +08:00

utils.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00