sunxin
c29e86fc9d
[Feature] Support mtp overlap schedule (#7001)
2026-04-01 14:24:26 +08:00
freeliuzc
7a6c28781b
[Speculative Decoding] Optimize attn_mask_offset and fix mtp bug (#7005)
* optimize attn_mask_offset and optimize mtp usage
* delete useless branch
* fix kernel format
* fix kernel runner
2026-03-25 01:52:06 -07:00
freeliuzc
e87ce4b8cd
[Speculative Decoding] refactor MTP and optimize spec-decoding postprocess (#6973)
* support new mtp
* refactor(speculate_decoding and mtp): optimize mtp structure logic. Update spec-branch status-process
* fix cuda-graph for spec-decoding
* fix xpu mtp and fix some note
* fix unittest and optimize note
* fix model status update in eos-branch
2026-03-24 10:19:01 +08:00
freeliuzc
cf7934a4b2
[Speculative Decoding] Unify Spec and non-spec branch (#6685)
* optimize spec-inference architecture
* delete debug log
* optimize spec_method usage && fix unit_test
* add claude unit-test skill
* fix some ugly bug
* enhance robustness and bounds check
* unify method & spec_method to method to avoid bug
* activate CI
* fix unit test
* Unify logprobs computation for naive and speculative decoding, fix CUDA kernel
* fix logprob bug && optimize verify kernel
* fix exist_decode() judge
2026-03-10 23:58:44 -07:00