FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Files

freeliuzc cf7934a4b2 [Speculative Decoding] Unify Spec and non-spec branch (#6685)

* optimize spec-inference architecture

* delete debug log

* optimize spec_method usage && fix unit_test

* add claude unit-test skill

* fix some ugly bugs

* enhance robustness and bounds check

* unify method & spec_method to method to avoid bug

* activate CI

* fix unit test

* Unify logprobs computation for naive and speculative decoding, fix CUDA kernel

* fix logprob bug && optimize verify kernel

* fix exist_decode() judge

2026-03-10 23:58:44 -07:00

graph_optimization

[Speculative Decoding] Unify Spec and non-spec branch (#6685)

2026-03-10 23:58:44 -07:00

guided_decoding

[Feature] Guided Decoding add LLguidance backend (#5124)

2025-12-03 20:23:57 +08:00

layers

[Speculative Decoding] Unify Spec and non-spec branch (#6685)

2026-03-10 23:58:44 -07:00

logits_processor

[Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367)