Jiajun Ji
29495b2cf1
[XPU] Unify Spec and non-spec branch.( #6947 ) ( #7180 )
...
* [XPU] cherry-pick PR-6947
* [XPU] use unified_update_model_status.
* refactor xpu_model_runner.
* refactor sampler.
* fix codestyle.
* Fix XPU speculative decoding: rename output tensors to cu_seqlens_q_output/batch_id_per_token_output, correct WRAPPER_CHECK_PTR types, and fix dynamic gather shape in verify_draft_tokens path.
* fix codestyle.
* replace output_padding_offset with is_speculative flag in gather_next_token.
* rename hiddden_states.
* unify cu_seqlens_q_output and batch_id_per_token_output init.
---------
Co-authored-by: cmcamdy <1027740945@qq.com >
2026-04-16 14:58:38 +08:00
cmcamdy
13b9fe7299
[XPU] add verify draft tokens ( #6947 )
...
* [XPU] add verify draft tokens
* fix test
* fix code style
* use sync cpy
* fix code style
* fix kernel check
* fix random seed
* fix test
* fix check
* fix eos set
* fix verify
* fix verify
2026-04-15 10:18:33 +08:00
Jiajun Ji
cb03958b52
[XPU] Refactor get_padding_offset to single kernel. ( #7029 )
...
* [XPU] Refactor get_padding_offset to single kernel.
* add unittest.
* fix codestyle.
* remove cum_offsets_now.
* remove max_len.
2026-04-13 11:04:50 +08:00
cmcamdy
7a2e33098f
[XPU] Refactor pre process ( #6993 )
...
* [XPU] support speculate_pre_process
* merge develop
* fix codestyle
* fix mtp, support cu_seqlens_q_output
* fix mtp, support cu_seqlens_q_output
* fix test
---------
Co-authored-by: lizan1999 <lizan03@baidu.com >
2026-04-01 20:29:55 +08:00
mayang002
72ff7bf4cd
[XPU] Fix wrapper files ( #6830 )
...
- Add WRAPPER_CHECK_PTR for pointer validity checks
- Add WRAPPER_ASSERT_GT/GE/LE for parameter range validation
- Simplify wrapper function calls to direct return pattern
2026-03-16 14:39:40 +08:00
mayang002
1f9f889e37
[XPU] refactor: XPU plugin namespace migration ( #6799 )
...
* [XPU] refactor: XPU plugin namespace migration
- Migrate wrapper layer namespace from baidu::xpu::api::plugin to fastdeploy::plugin
- Migrate kernel layer namespace from xpu3::plugin to fd_xpu3
- Add api:: prefix for types (Context, SUCCESS, XPUIndexType, ctx_guard)
- Remove XPU2 support, keep only XPU3
- Update ops/ directory to use new namespace
Total: 137 files changed
* [XPU] fix: add return value check and correct error messages
- Add PADDLE_ENFORCE_XDNN_SUCCESS check for speculate_get_logits and update_attn_mask_offsets
- Fix empty error message in draft_model_postprocess
- Correct function name in speculate_schedule_cache error message
- Update error messages from 'xpu::plugin::' to 'fastdeploy::plugin::'
2026-03-13 10:21:51 +08:00
cmcamdy
3543088d3e
[XPU] rm stop nums ( #6651 )
...
* rm stop nums
* fix conflict
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2026-03-12 14:05:58 +08:00
Jiajun Ji
88c4fbf8e1
[XPU] Add speculate_limit_thinking_content_length Op. ( #6627 )
...
* [XPU] Add speculate_limit_thinking_content_length OP for xpu.
* add unittest.
* format codes.
* format codes.
* format codes.
* Fix unused kernel launch return value.
---------
Co-authored-by: cmcamdy <1027740945@qq.com >
2026-03-11 17:30:17 +08:00
lizan1999
c637692427
[XPU] support MTP Step > 1 ( #6609 )
...
Co-authored-by: lizan1999 <lizan03@baidu.com >
2026-03-04 10:07:37 +08:00
Jiajun Ji
4ff3f4212f
[XPU] Add update_attn_mask_offsets op for xpu. ( #6556 )
...
* add update_attn_mask_offsets op for xpu.
* format code style.
* format codes with pre-commit.
2026-03-03 18:00:05 +08:00
cmcamdy
13447279aa
[XPU] Fix PD + MTP ( #6495 )
...
* fix pd + mtp
* fix code style
* fix PD + MTP, D get P's first token
* add anno for gpu(speculate_update)
* update draft insertv1
* fix wrapper & kernel
* fix wrapper
* fix code style
2026-02-27 19:07:35 +08:00
lizan1999
b3a48529ab
[XPU] add more type for recover batch sequence ( #6142 )
2026-01-23 15:16:05 +08:00
cmcamdy
59d8ae0a25
[XPU] Speculate Decoding + PD, benchmark fix ( #6036 )
...
* fix mtp pd
* fix kernel
* fix code style
* fix kernel
* fix test / clear debug code
* fix test / clear debug code
* fix codestyle
* fix codestyle
* fix codestyle
2026-01-15 19:19:03 +08:00
RuohengMa
2c3c983b96
[XPU] modify speculate_verify ( #5522 )
2025-12-23 14:50:30 +08:00
RuohengMa
12c76f8137
[XPU] add speculate_get_logits ( #5497 )
...
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_get_logits
* delete context
* add ptr check
---------
Co-authored-by: cmcamdy <1027740945@qq.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-12 15:38:30 +08:00
RuohengMa
8178e3fc6a
[XPU] add speculate_step_system_cache ( #5397 )
...
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_step_system_cache
---------
Co-authored-by: cmcamdy <1027740945@qq.com >
2025-12-09 14:40:11 +08:00
cmcamdy
5a67a6d960
[XPU] support kernel for mtp(base) ( #4748 )
...
* [XPU] support kernel for mtp(base)
* [XPU] support kernel for mtp(base)
* format
* format
* format
* fix gather next token
* fix step && add test
* fix
* mv pre/post process
* add adjust batch / gather next token for mtp
* fix code style
* fix mtp kernel name
* fix mtp kernel test
* mv xpu pre/post process
* mv xpu pre/post process
2025-11-27 15:05:44 +08:00
ddchenhao66
bffa08b74b
[XPU] fix thinking bug where output only contains reasoning_content ( #4761 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-04 14:32:35 +08:00
ddchenhao66
5443b2cffb
[XPU] xpu support think length limit ( #4539 )
...
* [XPU] xpu support think length limit
* [XPU] xpu c++ code files format
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-23 15:58:11 +08:00
zhupengyang
3a6883ac1a
c++ code format ( #4527 )
2025-10-22 17:59:50 +08:00
Lucas
87179cb744
[XPU] support XPU VL model inference ( #4030 )
...
* [XPU] support XPU VL model inference
* fix image op import and device check
* rebase develop
* fix perf
2025-09-25 14:34:15 +08:00
lengxia
137e539456
[Feature][XPU] add custom kernels for mtp ( #3537 )
2025-08-25 10:14:17 +08:00
yinwei
f2a528f9ae
[XPU] Support kvblock centralized management ( #3017 )
2025-07-29 10:40:55 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00