RuohengMa
cf5bc5e510
[XPU] fix bug and temporary fix for rope 3d ( #7465 )
2026-04-20 09:51:27 +08:00
Jiajun Ji
29495b2cf1
[XPU] Unify spec and non-spec branch. ( #6947 ) ( #7180 )
...
* [XPU] cherry-pick PR-6947
* [XPU] use unified_update_model_status.
* refactor xpu_model_runner.
* refactor sampler.
* fix codestyle.
* Fix XPU speculative decoding: rename output tensors to cu_seqlens_q_output/batch_id_per_token_output, correct
WRAPPER_CHECK_PTR types, and fix dynamic gather shape in verify_draft_tokens path.
* fix codestyle.
* replace output_padding_offset with is_speculative flag in gather_next_token.
* rename hiddden_states.
* unify cu_seqlens_q_output and batch_id_per_token_output init.
---------
Co-authored-by: cmcamdy <1027740945@qq.com >
2026-04-16 14:58:38 +08:00
RuohengMa
de0c5e68fb
[XPU] Split the block_attn operator into smaller operators ( #6798 )
...
* spliced block_attn
* adapt to latest vllm
* fix unit tests
* delete mtp+cudagraph 4 cards test
* fix vl model
* fix mtp
* fix slot mapping
2026-04-16 14:28:40 +08:00
cmcamdy
13b9fe7299
[XPU] add verify draft tokens ( #6947 )
...
* [XPU] add verify draft tokens
* fix test
* fix code style
* use sync cpy
* fix code style
* fix kernel check
* fix random seed
* fix test
* fix check
* fix eos set
* fix verify
* fix verify
2026-04-15 10:18:33 +08:00
Echo-Nie
8819a039c9
[Others] Fix typo ( #7280 )
...
* typo
* typo
* typo
* typo
2026-04-14 17:28:22 +08:00
zhupengyang
27b00cf385
[XPU] glm-4.5-air ( #7071 )
2026-04-14 11:31:49 +08:00
Jiajun Ji
cb03958b52
[XPU] Refactor get_padding_offset to single kernel. ( #7029 )
...
* [XPU] Refactor get_padding_offset to single kernel.
* add unittest.
* fix codestyle.
* remove cum_offsets_now.
* remove max_len.
2026-04-13 11:04:50 +08:00
Jiaxin Sui
6e5de2fd6d
[XPU][CI] Update xtdk version in download_dependencies.sh ( #7320 )
2026-04-11 00:26:48 +08:00
Jiaxin Sui
80d5d9fd32
[XPU][CI] lock xvllm version to fix bug ( #7264 )
...
* Remove duplicate NICs from environment variables
* Update version for xvllm in download_dependencies.sh
2026-04-09 12:44:27 +08:00
cmcamdy
7a2e33098f
[XPU] Refactor pre process ( #6993 )
...
* [XPU] support speculate_pre_process
* merge develop
* fix codestyle
* fix mtp, support cu_seqlens_q_output
* fix mtp, support cu_seqlens_q_output
* fix test
---------
Co-authored-by: lizan1999 <lizan03@baidu.com >
2026-04-01 20:29:55 +08:00
cmcamdy
bf8e9bf81d
[XPU] Fix speculate schedule ( #7049 )
...
* [BugFix] xpu fix speculate schedule cache kernel
* fix code style
2026-03-27 18:28:17 +08:00
zhupengyang
5780345646
[XPU] fix speculate_verify ( #6985 )
2026-03-24 18:55:09 +08:00
lizan1999
148eee84c6
[XPU] use quant2d_per_token for weight quant int8 && fix some XPU Kernel check ( #6869 )
2026-03-17 19:44:48 +08:00
mayang002
72ff7bf4cd
[XPU] Fix wrapper files ( #6830 )
...
- Add WRAPPER_CHECK_PTR for pointer validity checks
- Add WRAPPER_ASSERT_GT/GE/LE for parameter range validation
- Simplify wrapper function calls to direct return pattern
2026-03-16 14:39:40 +08:00
Yonghua Li
7c8c0a3c02
[BugFix] replace ftok with custom_ftok in get_output/save_output ops ( #6822 )
...
* [BugFix] replace ftok with custom_ftok in get_output/save_output ops
* [Test] add unit test for custom_ftok
* [Chore] create custom_ftok.h
* [Chore] reorganize header file
* [Fix] fix cache messager msg_queue_id+rank_id conflict
2026-03-16 14:22:18 +08:00
cmcamdy
7591e0d6bc
fix eb5 mtp(mix) ( #6800 )
2026-03-13 17:36:57 +08:00
mayang002
1f9f889e37
[XPU] refactor: XPU plugin namespace migration ( #6799 )
...
* [XPU] refactor: XPU plugin namespace migration
- Migrate wrapper layer namespace from baidu::xpu::api::plugin to fastdeploy::plugin
- Migrate kernel layer namespace from xpu3::plugin to fd_xpu3
- Add api:: prefix for types (Context, SUCCESS, XPUIndexType, ctx_guard)
- Remove XPU2 support, keep only XPU3
- Update ops/ directory to use new namespace
Total: 137 files changed
* [XPU] fix: add return value check and correct error messages
- Add PADDLE_ENFORCE_XDNN_SUCCESS check for speculate_get_logits and update_attn_mask_offsets
- Fix empty error message in draft_model_postprocess
- Correct function name in speculate_schedule_cache error message
- Update error messages from 'xpu::plugin::' to 'fastdeploy::plugin::'
2026-03-13 10:21:51 +08:00
cmcamdy
3543088d3e
[XPU] rm stop nums ( #6651 )
...
* rm stop nums
* fix conflict
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2026-03-12 14:05:58 +08:00
Jiajun Ji
88c4fbf8e1
[XPU] Add speculate_limit_thinking_content_length Op. ( #6627 )
...
* [XPU] Add speculate_limit_thinking_content_length OP for xpu.
* add unittest.
* format codes.
* format codes.
* format codes.
* Fix unused kernel launch return value.
---------
Co-authored-by: cmcamdy <1027740945@qq.com >
2026-03-11 17:30:17 +08:00
mayang002
ecc5032176
[XPU] Add return value checks for all XPU kernel launches ( #6666 )
...
* [XPU] Add return value checks for all XPU kernel launches
- Add -fxpu-launch-return compiler flag in CMakeLists.txt to enable
kernel launch return values
- Add KERNEL_ASSERT_SUCCESS(ctx, ret_xre) checks after every XPU
kernel launch across 45 wrapper files (55 launch sites total)
- Covers both main wrapper/ and mtp_wrapper/ directories
- Properly handles multiple kernel launches in the same function
scope by reusing the ret_xre variable
* [XPU] code style fix
2026-03-10 10:45:18 +08:00
gongweibao
ddb06ff83f
init ( #6642 )
...
Co-authored-by: gongweibao <gognweibao@baidu.com >
2026-03-04 21:55:31 +08:00
lizan1999
c637692427
[XPU] support MTP Step > 1 ( #6609 )
...
Co-authored-by: lizan1999 <lizan03@baidu.com >
2026-03-04 10:07:37 +08:00
Jiajun Ji
4ff3f4212f
[XPU] Add update_attn_mask_offsets op for xpu. ( #6556 )
...
* add update_attn_mask_offsets op for xpu.
* format code style.
* format codes with pre-commit.
2026-03-03 18:00:05 +08:00
ming1753
97eee75677
[Feature] GPU Memory Optimization and Retirement of V0 Scheduler ( #6407 )
...
* Optim GPU Mem Usage
---------
Co-authored-by: huzesen <huzesen@baidu.com >
2026-02-28 15:07:43 +08:00
cmcamdy
13447279aa
[XPU] Fix PD + MTP ( #6495 )
...
* fix pd + mtp
* fix code style
* fix PD + MTP, D get P's first token
* add annotation for gpu (speculate_update)
* update draft insertv1
* fix wrapper & kernel
* fix wrapper
* fix code style
2026-02-27 19:07:35 +08:00
lizan1999
72edd394d9
[XPU] support noaux_tc ( #6326 )
2026-02-05 12:04:16 +08:00
RuohengMa
976203cf60
[XPU] fix text_image_gather_scatter in cudagraph mode ( #6049 )
2026-01-23 19:48:43 +08:00
lizan1999
b3a48529ab
[XPU] add more types for recover batch sequence ( #6142 )
2026-01-23 15:16:05 +08:00
yinwei
51a8a2ed57
[XPU] Support CudaGraph(add block attn cuda_graph support) ( #6116 )
...
* add block attn cuda_graph support
2026-01-20 19:33:11 +08:00
zhupengyang
45ebb2efb4
[XPU] support plugin model ( #6092 )
2026-01-20 13:00:09 +08:00
cmcamdy
59d8ae0a25
[XPU] Speculate Decoding + PD, benchmark fix ( #6036 )
...
* fix mtp pd
* fix kernel
* fix code style
* fix kernel
* fix test / clear debug code
* fix test / clear debug code
* fix codestyle
* fix codestyle
* fix codestyle
2026-01-15 19:19:03 +08:00
Daci
e10b51b8c6
[Feature] get_output_kv_signal blocking read mode & send_first_token ( #5836 )
...
* get_output_kv_signal blocking read mode
* send first token before recycle
* xpu get_output_kv_signal blocking read mode
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-01-15 14:11:03 +08:00
chenjian
74d0f1c01f
[Optim] Robust sync status when preemption happens ( #5796 )
...
* [Bug fix] Sync status for caching output cache
* fix
* fix
* fix bug
* fix
* fix
* support xpu
* fix
* fix
* fix
* fix
* fix
* fix ci
* fix ci
* fix xpu
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2026-01-14 12:07:33 +08:00
zhupengyang
9db48ecb34
[XPU] fix dp4 ( #5946 )
2026-01-09 20:36:53 +08:00
ddchenhao66
733014bf32
[XPU] Support EP4TP1 in pd disaggregation ( #5860 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-06 15:25:36 +08:00
cmcamdy
690d4bcdb0
[XPU] Speculative Decoding with PD ( #5856 )
...
* [XPU] Speculative Decoding with PD
* fix post process
* share kv cache sender
* support speculate decoding step system cache
* support speculate decoding step system cache
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com >
2026-01-05 17:31:03 +08:00
ddchenhao66
9e45ef7ca9
[XPU] Align MAX_BSZ with GPU settings and disable prefix cache in OCR VL ( #5831 )
2025-12-31 09:49:12 +08:00
CSWYF3634076
9286403570
[Models] Add Qwen3-VL Model Support ( #5763 )
...
* support v1 loader
* remove useless code
* remove useless
* [Model] support Qwen3VL images success
* [Model] support Qwen3VL rope_3d
* [Model] support Qwen3VL remove log
* [Model] support Qwen3VL RL
* [Model] support Qwen3VL tp
* [Model] support Qwen3VL video
* [Model] support Qwen3VL fix ernievl
* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds
* [Model] support Qwen3VL fix multi card
* [Model] support Qwen3VL file close
* [Model] support Qwen3VL fix ce
* [Model] support Qwen3VL fix unittest
* [Model] support Qwen3VL add unittest
---------
Co-authored-by: Ayakouji <yuhongh@qq.com >
2025-12-29 17:39:33 +08:00
freeliuzc
9018ccf74e
[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes ( #5738 )
...
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split-mode
* add note
* fix xpu register
2025-12-25 01:54:59 -08:00
RuohengMa
e154c03416
[XPU] refine moe_expert_ffn ut ( #5743 )
2025-12-25 10:35:24 +08:00
RuohengMa
2c3c983b96
[XPU] modify speculate_verify ( #5522 )
2025-12-23 14:50:30 +08:00
lizan1999
ec6811f648
support token num = 0 ( #5635 )
...
Co-authored-by: lizan1999 <lizan03@baidu.com >
Co-authored-by: cmcamdy <1027740945@qq.com >
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-12-19 10:20:38 +08:00
lizan1999
e1a9b282eb
fix bug for EP+MTP ( #5605 )
...
Co-authored-by: lizan1999 <lizan03@baidu.com >
2025-12-18 14:34:54 +08:00
zhupengyang
8735cb5045
[XPU] refactor moe ffn ( #5501 )
...
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
ddchenhao66
9f70f4310e
[PD Disaggregation][XPU] update_inputs_v1 operator supports PD ( #5550 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-15 15:39:38 +08:00
RuohengMa
12c76f8137
[XPU] add speculate_get_logits ( #5497 )
...
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_get_logits
* delete context
* add ptr check
---------
Co-authored-by: cmcamdy <1027740945@qq.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-12 15:38:30 +08:00
Lucas
888c4b992d
[XPU] refactor of block_attn param 'pos_emb_type' ( #5511 )
2025-12-12 14:30:09 +08:00
Juncai
d67388a479
[PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill ( #5514 )
...
* Distinguish the pipelines for sending kv signal in different prefill
* up
2025-12-12 14:05:36 +08:00
cmcamdy
3c1f7b85a4
[XPU] support get hidden state for mix ( #5513 )
...
* fix get hidden states
* fix code style
* fix code style
2025-12-12 10:31:20 +08:00
RuohengMa
8178e3fc6a
[XPU] add speculate_step_system_cache ( #5397 )
...
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_step_system_cache
---------
Co-authored-by: cmcamdy <1027740945@qq.com >
2025-12-09 14:40:11 +08:00