FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 17:11:21 +08:00

Author	SHA1	Message	Date
sunxin	c29e86fc9d	[Feature] Support mtp overlap schedule (#7001 )	2026-04-01 14:24:26 +08:00
Nyakku Shigure	8b6bbb3504	[Optimization] Use a separate driver when using Triton with Paddle (#6897 ) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-24 10:56:00 +08:00
freeliuzc	cf7934a4b2	[Speculative Decoding] Unify Spec and non-spec branch (#6685 ) * optimize spec-inference architecture * delete debug log * optimize spec_method usage && fix unit_test * add claude unit-test skill * fix some ugly bug * enhance robustness and bounds check * unify method & spec_method to method to avoid bug * activate CI * fix unit test * Unify logprobs computation for naive and speculative decoding, fix CUDA kernel * fix logprob bug && optimize verify kernel * fix exist_decode() judge	2026-03-10 23:58:44 -07:00
chen	193886e745	only cuda run triton op (#5846 )	2025-12-31 14:17:31 +08:00
chen	0bcf924e10	[Optimization] Optimization for gather_logprob by 10GB (#5817 ) * opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k	2025-12-30 15:33:34 +08:00