FastDeploy/custom_ops/gpu_ops/speculate_decoding at 3b9d6c60d33efc09788d0bad0ea92fe94401a6ac - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 08:21:53 +08:00

lonelygsh e83d45833f [Speculate Decoding] Fix step_idx semantics in limit_thinking and set_stop_value kernels (#7166)

- speculate_limit_thinking_content_length: update current_base_step to
  step_idx+1 (step_idx now records history count before current round);
  remove incorrect step_idx decrement on accept_num truncation; mark
  step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix can_stop gate to use
  step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx
  formula (remove stale -accept_num offset); use <= condition so accept_idx
  maps directly to the accepted token that ends the stop sequence; fix
  accept_tokens index (remove -1).
- Update unit tests for speculate_set_stop_value_multi_seqs kernel.
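
The stop-sequence bullet above can be illustrated with a small host-side sketch. This is not the FastDeploy kernel: the function name `find_stop_accept_idx`, its signature, and the flat `std::vector` layout are hypothetical, chosen only to show the indexing once `step_idx` counts the tokens generated before the current round. Under that assumption, the accepted token at `accept_idx` sits at absolute position `step_idx + accept_idx`, with no `-accept_num` or `-1` correction, and the returned index points directly at the accepted token that ends the stop sequence. The minimum-length gate is applied once per round here, as `step_idx + accept_num >= min_token_limit`; the real kernel may apply it differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT the FastDeploy kernel.
// pre_ids:       tokens generated before this round (at least step_idx entries)
// step_idx:      history count before the current round
// accept_tokens: tokens accepted in the current round
// Returns the index into accept_tokens of the token that completes stop_seq,
// or -1 if no accepted token completes it or the length gate fails.
int find_stop_accept_idx(const std::vector<int64_t>& pre_ids,
                         int64_t step_idx,
                         const std::vector<int64_t>& accept_tokens,
                         const std::vector<int64_t>& stop_seq,
                         int64_t min_token_limit) {
  assert(step_idx >= 0 &&
         static_cast<int64_t>(pre_ids.size()) >= step_idx);
  const int64_t accept_num = static_cast<int64_t>(accept_tokens.size());
  const int64_t stop_len = static_cast<int64_t>(stop_seq.size());
  if (stop_len == 0) return -1;

  // Gate: history plus this round's accepted tokens must reach the minimum.
  if (step_idx + accept_num < min_token_limit) return -1;

  for (int64_t accept_idx = 0; accept_idx < accept_num; ++accept_idx) {
    // Absolute position of the candidate final token.
    const int64_t end_pos = step_idx + accept_idx;
    if (end_pos + 1 < stop_len) continue;  // too few tokens to match

    bool match = true;
    for (int64_t k = 0; k < stop_len && match; ++k) {
      const int64_t pos = end_pos - k;  // walk backwards from end_pos
      // Positions >= step_idx come from this round, older ones from history.
      const int64_t tok = (pos >= step_idx) ? accept_tokens[pos - step_idx]
                                            : pre_ids[pos];
      match = (tok == stop_seq[stop_len - 1 - k]);
    }
    if (match) return static_cast<int>(accept_idx);
  }
  return -1;
}
```

With history `{5, 6, 7}` (so `step_idx` is 3) and accepted tokens `{8, 9, 10}`, the stop sequence `{7, 8}` straddles the history boundary and is completed by accepted token 0, while `{9, 10}` is completed by accepted token 2. A caller would then truncate the round to `accept_idx + 1` tokens.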

2026-04-13 20:53:42 +08:00

..

[Optimization]【Hackathon 10th Spring No.49】GPU ngram_match: BlockScan Phase 2 -optimized (#7136)

2026-04-07 01:36:25 -07:00

build_sampling_params.cu

[Optimization][Speculative Decoding]Fuse padding sampling params (#6765)

2026-03-12 05:05:15 -07:00

naive_update_model_status.cu

[Speculative Decoding] refactor MTP and optimize spec-decoding postprocess (#6973)

2026-03-24 10:19:01 +08:00

ngram_match_common.cuh

[Optimization]【Hackathon 10th Spring No.49】GPU ngram_match: BlockScan Phase 2 -optimized (#7136)

2026-04-07 01:36:25 -07:00

ngram_match.cu

[Optimization]【Hackathon 10th Spring No.49】GPU ngram_match: BlockScan Phase 2 -optimized (#7136)

2026-04-07 01:36:25 -07:00

speculate_calcu_accept_ratio.cu

init (#6642)

2026-03-04 21:55:31 +08:00

speculate_clear_accept_nums.cu

init (#6642)

2026-03-04 21:55:31 +08:00

speculate_get_output_with_topk.cc

[BugFix] replace ftok with custom_ftok in get_output/save_output ops (#6822)

2026-03-16 14:22:18 +08:00

speculate_get_output.cc

[BugFix] replace ftok with custom_ftok in get_output/save_output ops (#6822)

2026-03-16 14:22:18 +08:00

speculate_get_seq_lens_output.cu

init (#6642)

2026-03-04 21:55:31 +08:00

speculate_get_token_penalty_multi_scores.cu

[Feature] Support mtp overlap schedule (#7001)

2026-04-01 14:24:26 +08:00

speculate_limit_thinking_content_length.cu

[Speculate Decoding] Fix step_idx semantics in limit_thinking and set_stop_value kernels (#7166)

2026-04-13 20:53:42 +08:00

speculate_logprob_utils.cu

[Speculative Decoding] Unify Spec and non-spec branch (#6685)

2026-03-10 23:58:44 -07:00

speculate_msg.h

init (#6642)

2026-03-04 21:55:31 +08:00

speculate_preprocess.cu

[Feature] Support mtp overlap schedule (#7001)

2026-04-01 14:24:26 +08:00

speculate_save_output_with_topk.cc

fix MTP bugs in TP and overlap (#7172)

2026-04-03 14:19:11 +08:00

speculate_save_output.cc

fix MTP bugs in TP and overlap (#7172)

2026-04-03 14:19:11 +08:00

speculate_schedule_cache.cu

[Feature] Support mtp overlap schedule (#7001)

2026-04-01 14:24:26 +08:00

speculate_set_stop_value_multi_seqs.cu

[Speculate Decoding] Fix step_idx semantics in limit_thinking and set_stop_value kernels (#7166)

2026-04-13 20:53:42 +08:00

speculate_set_value_by_flags_and_idx.cu

[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407)

2026-02-28 15:07:43 +08:00

speculate_step_reschedule.cu

[BugFix] replace ftok with custom_ftok in get_output/save_output ops (#6822)

2026-03-16 14:22:18 +08:00

speculate_step_system_cache.cu

init (#6642)

2026-03-04 21:55:31 +08:00

speculate_step.cu

cuda13.0, implement changes to CCCL (#6751)

2026-03-10 16:47:02 +08:00

speculate_update_input_ids_cpu.cc

init (#6642)

2026-03-04 21:55:31 +08:00

speculate_update.cu

[XPU] Fix PD + MTP (#6495)

2026-02-27 19:07:35 +08:00

speculate_verify.cu

[MTP] refactor MTP pre_process (#6358)

2026-02-09 10:47:15 +08:00

top_p_candidates.cu

[Feature] Support mtp overlap schedule (#7001)

2026-04-01 14:24:26 +08:00

unified_update_model_status.cu

[Speculate Decoding] Fix step_idx semantics in limit_thinking and set_stop_value kernels (#7166)

2026-04-13 20:53:42 +08:00

verify_draft_tokens.cu

fix cuda graph capture failure in CI test (#7094)

2026-03-31 11:05:51 +08:00