FastDeploy/custom_ops/gpu_ops
freeliuzc cf7934a4b2 [Speculative Decoding] Unify Spec and non-spec branch (#6685)
* optimize spec-inference architecture

* delete debug log

* optimize spec_method usage && fix unit_test

* add claude unit-test skill

* fix some ugly bug

* enhance robustness and bounds check

* unify method & spec_method to method to avoid bug

* activate CI

* fix unit test

* Unify logprobs computation for naive and speculative decoding, fix CUDA kernel

* fix logprob bug && optimize verify kernel

* fix exist_decode() judge
2026-03-10 23:58:44 -07:00