[XPU] Unify Spec and non-spec branch.(#6947) (#7180)

* [XPU] cherry-pick PR-6947
* [XPU] use unified_update_model_status.
* refactor xpu_model_runner.
* refactor sampler.
* fix codestyle.
* Fix XPU speculative decoding: rename output tensors to cu_seqlens_q_output/batch_id_per_token_output, correct WRAPPER_CHECK_PTR types, and fix dynamic gather shape in verify_draft_tokens path.
* fix codestyle.
* replace output_padding_offset with is_speculative flag in gather_next_token.
* rename hiddden_states.
* unify cu_seqlens_q_output and batch_id_per_token_output init.

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
2026-04-23 08:21:53 +08:00 · 2026-04-16 14:58:38 +08:00
parent 17002edc47
commit 29495b2cf1
9 changed files with 226 additions and 149 deletions
@@ -766,7 +766,6 @@ DLL_EXPORT int speculate_limit_thinking_content_length_kernel(
    const int eos_token_id_len,
    const int inject_len,
    const bool splitwise_role_is_decode);
-
 DLL_EXPORT int verify_draft_tokens(
    api::Context* ctx,
    // Core I/O