sunxin
|
c29e86fc9d
|
[Feature] Support mtp overlap schedule (#7001)
|
2026-04-01 14:24:26 +08:00 |
|
Nyakku Shigure
|
8b6bbb3504
|
[Optimization] Use a separate driver when using Triton with Paddle (#6897)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2026-03-24 10:56:00 +08:00 |
|
freeliuzc
|
cf7934a4b2
|
[Speculative Decoding] Unify Spec and non-spec branch (#6685)
* optimize spec-inference architecture
* delete debug log
* optimize spec_method usage && fix unit_test
* add claude unit-test skill
* fix some ugly bug
* enhance robustness and bounds check
* unify method & spec_method to method to avoid bug
* activate CI
* fix unit test
* Unify logprobs computation for naive and speculative decoding, fix CUDA kernel
* fix logprob bug && optimize verify kernel
* fix exist_decode() check
|
2026-03-10 23:58:44 -07:00 |
|
chen
|
193886e745
|
Run Triton ops only on CUDA (#5846)
|
2025-12-31 14:17:31 +08:00 |
|
chen
|
0bcf924e10
|
[Optimization] Optimization for gather_logprob by 10GB (#5817)
* optimize logprobs gather_logprob, reducing device memory usage by 10GB when token_num=8k
|
2025-12-30 15:33:34 +08:00 |
|