[Executor]CUDAGraph support Speculate Decode (#3769) · aa27b03bc0 - FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

[Executor]CUDAGraph support Speculate Decode (#3769)

* success run ngram

* Revert "[Code Simplification] remove cum_offsets (#3410)"

This reverts commit 32b39620bc.

* success run ngram5 tp4 42bs

* success run ngram5 tp4 42bs

* mtp draft commit

* add decorator for target model

* enable draft model in cudagraph v0.5

* revert revert cum_offset

* enable target model in cudagraph v0.9 And clean debug code

* Revert "success run ngram"

This reverts commit 8351e83993.

* add reverted code

* enable target model in cudagraph v0.9

* solve comment

* fix bid < 0

* Enable Target Model Padding And Draft Model in cudagraph

* solve problem

* delete rebuild padding debug note

* fast compile

* Add capture list for mtp

* success run 256 tp1 mtp

* Enable Lite TP2 Bsz256

* really enable tp2 bsz 256

* fix problem

* Solve problem for Draft model in cudagraph

* Solve comment

* replace empty tensor with zeros

* Solve comments

* Revert "fast compile"

This reverts commit 834639a7ff.

* fix bug

* fix merge bug

* fix typo

* fix bug

---------

Co-authored-by: lizexu <2694294196@qq.com>
Co-authored-by: littledgg <1658565283@qq.com>
Co-authored-by: zeroRains <linjunlu@zerorains.top>
Co-authored-by: gongshaotian <gstain5555@outlook.com>

RAM · 2025-10-09 21:18:29 +08:00 · committed by GitHub

parent 7b1689f437

commit aa27b03bc0

19 changed files with 250 additions and 139 deletions

custom_ops/gpu_ops/rebuild_padding.cu

-1

@@ -130,7 +130,6 @@ std::vector<paddle::Tensor> rebuild_padding(
     int pack_num = elem_nums / PackSize;
     const int blocksize = 128;
     const int grid_size = (pack_num + blocksize - 1) / blocksize;
     if (output_padding_offset) {
         RebuildAppendPaddingKernel<DataType_, PackSize>
             <<<grid_size, blocksize, 0, cu_stream>>>(