FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

zhangbo9674 5c60e2fc6f fix bug in cudagraph (#7069)

2026-03-30 14:24:23 +08:00

graph_optimization

[Speculative Decoding] Unify Spec and non-spec branch (#6685)

2026-03-10 23:58:44 -07:00

guided_decoding

[Optimization] Use a separate driver when using Triton with Paddle (#6897)

2026-03-24 10:56:00 +08:00

layers

fix bug in cudagraph (#7069)

2026-03-30 14:24:23 +08:00

logits_processor

[Bugfix] Align thinking_budget behavior with ERNIE reasoning flow (#6934)

2026-03-23 14:15:55 +08:00

model_loader

add reconstruct (#6675)

2026-03-10 11:25:37 +08:00

models

[Optimization] Use a separate driver when using Triton with Paddle (#6897)

2026-03-24 10:56:00 +08:00

ops

[Optimization] Use a separate driver when using Triton with Paddle (#6897)

2026-03-24 10:56:00 +08:00

__init__.py

…

entropy_utils.py

[Bugfix] Fix entropy calculation bugs (#5941)

2026-01-08 20:57:35 +08:00

forward_meta.py

[Other] Adjust GPUModelRunner to enhance compatibility (#6851)

2026-03-16 14:49:19 +08:00

load_weight_utils.py

add reconstruct (#6675)

2026-03-10 11:25:37 +08:00

pre_and_post_process.py

[Speculative Decoding] Optimize attn_mask_offset and fix mtp bug (#7005)

2026-03-25 01:52:06 -07:00

utils.py

[Feature] Support NVFP4 Flashinfer-cutedsl MoE on SM100 (#6963)

2026-03-30 11:37:04 +08:00

xpu_pre_and_post_process.py

[XPU] rm stop nums (#6651)

2026-03-12 14:05:58 +08:00