FastDeploy/custom_ops/gpu_ops/speculate_decoding at 8906e09e0f13c857f34934ae6e2d8ad1319e6153 - FastDeploy - 子说镜像小站

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

huicongyao 2e63d88f7a [Optimization][Speculative Decoding]Fuse padding sampling params (#6765 )

* optimize speculate pre process unit test

* Add CUDA kernel for building sampling params in speculative decoding

* init infer seed in device

* format code

* add unittest & fix

* fix

* format-code

* format-code

* fix rebase

* .

* fix unitest

2026-03-12 05:05:15 -07:00

..

[Speculative Decoding] Unify Spec and non-spec branch (#6685 )

2026-03-10 23:58:44 -07:00

build_sampling_params.cu

[Optimization][Speculative Decoding]Fuse padding sampling params (#6765 )

2026-03-12 05:05:15 -07:00

ngram_match.cc

[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407 )

2026-02-28 15:07:43 +08:00

speculate_calcu_accept_ratio.cu

init (#6642 )

2026-03-04 21:55:31 +08:00

speculate_clear_accept_nums.cu

init (#6642 )

2026-03-04 21:55:31 +08:00

speculate_get_output_with_topk.cc

init (#6642 )

2026-03-04 21:55:31 +08:00

speculate_get_output.cc

init (#6642 )

2026-03-04 21:55:31 +08:00

speculate_get_seq_lens_output.cu

init (#6642 )

2026-03-04 21:55:31 +08:00

speculate_get_token_penalty_multi_scores.cu

[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407 )

2026-02-28 15:07:43 +08:00

speculate_limit_thinking_content_length.cu

[OP][Feature] Unify the limit_thinking_content_length CUDA operator, supporting response length limits and injected sequences (#6493 )

2026-02-25 21:36:50 +08:00

speculate_logprob_utils.cu

[Speculative Decoding] Unify Spec and non-spec branch (#6685 )

2026-03-10 23:58:44 -07:00

speculate_msg.h

init (#6642 )

2026-03-04 21:55:31 +08:00

speculate_preprocess.cu

[Model Runner] Deprecate not_need_stop (#6356 )

2026-03-05 10:55:42 +08:00

speculate_save_output_with_topk.cc

[Optim] Robust sync status when preempted happens (#5796 )

2026-01-14 12:07:33 +08:00

speculate_save_output.cc

[Optim] Robust sync status when preempted happens (#5796 )

2026-01-14 12:07:33 +08:00

speculate_schedule_cache.cu

[Others] remove stop_nums (#6182 )

2026-01-26 12:12:47 +08:00

speculate_set_stop_value_multi_seqs.cu

[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407 )

2026-02-28 15:07:43 +08:00

speculate_set_value_by_flags_and_idx.cu

[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407 )

2026-02-28 15:07:43 +08:00

speculate_step_reschedule.cu

cuda13.0, implement changes to CCCL (#6751 )

2026-03-10 16:47:02 +08:00

speculate_step_system_cache.cu

init (#6642 )

2026-03-04 21:55:31 +08:00

speculate_step.cu

cuda13.0, implement changes to CCCL (#6751 )

2026-03-10 16:47:02 +08:00

speculate_update_input_ids_cpu.cc

init (#6642 )

2026-03-04 21:55:31 +08:00

speculate_update.cu

[XPU] Fix PD + MTP (#6495 )

2026-02-27 19:07:35 +08:00

speculate_verify.cu

[MTP] refactor MTP pre_process (#6358 )

2026-02-09 10:47:15 +08:00

top_p_candidates.cu

[MTP] refactor MTP pre_process (#6358 )

2026-02-09 10:47:15 +08:00

unified_update_model_status.cu

[Speculative Decoding] Unify Spec and non-spec branch (#6685 )

2026-03-10 23:58:44 -07:00

verify_draft_tokens.cu

[Speculative Decoding] Unify Spec and non-spec branch (#6685 )

2026-03-10 23:58:44 -07:00