[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)

* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in pd-split-mode

* add note

* fix xpu register
freeliuzc
2025-12-25 17:54:59 +08:00
committed by GitHub
parent 7247dc5f3a
commit 9018ccf74e
6 changed files with 227 additions and 172 deletions
+2
@@ -887,6 +887,8 @@ void DraftModelPreprocess(const paddle::Tensor& draft_tokens,
const paddle::Tensor& is_block_step,
const paddle::Tensor& batch_drop,
const paddle::Tensor& pre_ids,
const paddle::Tensor& mask_rollback,
const paddle::Tensor& recompute_token_num,
const paddle::Tensor& accept_tokens,
const paddle::Tensor& accept_num,
const paddle::Tensor& base_model_seq_lens_this_time,