FastDeploy/fastdeploy/model_executor/layers at 4efd073a41770f78c0b4b07b4c5eea92294b9c8d - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-06 15:40:33 +08:00

chen 4efd073a41 fix block_wise_fp8_v1_loader_moe_shape (#4384)

2025-10-15 14:08:53 +08:00

[Optimization] Fuse get_max_len and get_kv_max_len (#4369)

2025-10-13 20:35:00 +08:00

[XPU] fix ep (#4393)

2025-10-15 11:41:05 +08:00

fix block_wise_fp8_v1_loader_moe_shape (#4384)

2025-10-15 14:08:53 +08:00

[Feature] support qwen3-embedding model load (#4202)

2025-09-23 00:14:35 -07:00

[XPU] Support W4A8C8-TP4-300B Model (#4068)

2025-10-10 15:41:32 +08:00

[Executor]CUDAGraph support Speculate Decode (#3769)

2025-10-09 21:18:29 +08:00

__init__.py

…

activation.py

[Intel HPU] Support intel hpu platform (#4161)

2025-09-24 12:27:50 +08:00

embeddings.py

[BugFix] fix qwen3-embedding model tp>1 (#4223)

2025-09-24 14:13:26 +08:00

linear.py

fix machete pre quant (#4295)

2025-09-28 16:11:09 +08:00

lm_head.py

[Feature] support qwen3-embedding model load (#4202)

2025-09-23 00:14:35 -07:00

mtp_linear.py

support tmp (#3675)

2025-08-28 19:42:32 +08:00

normalization.py

adaptive rms_norm's dtype (#3617)

2025-08-26 15:29:15 +08:00

pooler.py

[Feature] support pool (#3827)

2025-09-22 14:09:09 +08:00

rotary_embedding.py

[Intel HPU] Support intel hpu platform (#4161)

2025-09-24 12:27:50 +08:00

utils.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00