FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-22 16:07:51 +08:00

Author	SHA1	Message	Date
RuohengMa	36d47aa23e	[XPU] get_infer_param use inplace copy, remove block_tables redundant d2h copy (#7431 ) * inplace_copy: encoder_batch_idx/decoder_batch_idx bs == 9 ok * inplace_copy: encoder_seq_lod/decoder_seq_lod bs == 9 ok * inplace_copy: all bs == 9 ok * inplace_copy: all cpu bs == 9 ok * inplace_copy: len_info_cpu bs == 9 ok * finished and rm unused code * prefix_block_tables reuse * refine * improve performance * remove block_table copy to cpu * fix unit test * fix * resolve conflict * refine code * fix * fix * fix * fix * fix * try fix unit tests * fix * tmp save * fix unit test * get_infer_param try less return values * add yinwei fix --------- Co-authored-by: yinwei <yinwei_hust@163.com>	2026-04-22 11:01:32 +08:00
RuohengMa	9d3551cfbb	[XPU] add support for rope3d (#7518 ) * [XPU] add support for rope3d * support decoder --------- Co-authored-by: yinwei <yinwei_hust@163.com>	2026-04-21 13:39:00 +08:00
RuohengMa	cf5bc5e510	[XPU] fix bug and temporary fix for rope 3d (#7465 )	2026-04-20 09:51:27 +08:00
Jiajun Ji	29495b2cf1	[XPU] Unify Spec and non-spec branch.(#6947 ) (#7180 ) * [XPU] cherry-pick PR-6947 * [XPU] use unified_update_model_status. * refactor xpu_model_runner. * refactor sampler. * fix codestyle. * Fix XPU speculative decoding: rename output tensors to cu_seqlens_q_output/batch_id_per_token_output, correct WRAPPER_CHECK_PTR types, and fix dynamic gather shape in verify_draft_tokens path. * fix codestyle. * replace output_padding_offset with is_speculative flag in gather_next_token. * rename hiddden_states. * unify cu_seqlens_q_output and batch_id_per_token_output init. --------- Co-authored-by: cmcamdy <1027740945@qq.com>	2026-04-16 14:58:38 +08:00
RuohengMa	de0c5e68fb	[XPU] Split the block_attn operator into smaller operators (#6798 ) * spliced block_attn * adapt to latest vllm * fix unit tests * delete mtp+cudagraph 4 cards test * fix vl model * fix mtp * fix slot mapping	2026-04-16 14:28:40 +08:00
cmcamdy	13b9fe7299	[XPU] add verify draft tokens (#6947 ) * [XPU] add verify draft tokens * fix test * fix code style * use sync cpy * fix code style * fix kernel check * fix random seed * fix test * fix check * fix eos set * fix verify * fix verify	2026-04-15 10:18:33 +08:00
Echo-Nie	8819a039c9	[Others] Fix typo (#7280 ) * typo * typo * typo * typo	2026-04-14 17:28:22 +08:00
zhupengyang	27b00cf385	[XPU] glm-4.5-air (#7071 )	2026-04-14 11:31:49 +08:00
Jiajun Ji	cb03958b52	[XPU] Refactor get_padding_offset to single kernel. (#7029 ) * [XPU] Refactor get_padding_offset to single kernel. * add unittest. * fix codestyle. * remove cum_offsets_now. * remove max_len.	2026-04-13 11:04:50 +08:00
Jiaxin Sui	6e5de2fd6d	[XPU][CI]Update xtdk version in download_dependencies.sh (#7320 )	2026-04-11 00:26:48 +08:00
Jiaxin Sui	80d5d9fd32	[XPU][CI] lock xvllm version for fix bug (#7264 ) * Remove duplicate NICs from environment variables * Update version for xvllm in download_dependencies.sh	2026-04-09 12:44:27 +08:00
cmcamdy	7a2e33098f	[XPU] Refactor pre process (#6993 ) * [XPU] support speculate_pre_process * merge develop * fix codestyle * fix mtp, support cu_seqlens_q_output * fix mtp, support cu_seqlens_q_output * fix test --------- Co-authored-by: lizan1999 <lizan03@baidu.com>	2026-04-01 20:29:55 +08:00
cmcamdy	bf8e9bf81d	[XPU] Fix speculate schedule (#7049 ) * [BugFix] xpu fix speculate schedule cache kernel * fix code style	2026-03-27 18:28:17 +08:00
zhupengyang	5780345646	[XPU] fix speculate_verify (#6985 )	2026-03-24 18:55:09 +08:00
lizan1999	148eee84c6	[XPU] use quant2d_per_token for weight quant int8 && fix some XPU Kernel check (#6869 )	2026-03-17 19:44:48 +08:00
mayang002	72ff7bf4cd	[XPU] Fix wrapper files (#6830 ) - Add WRAPPER_CHECK_PTR for pointer validity checks - Add WRAPPER_ASSERT_GT/GE/LE for parameter range validation - Simplify wrapper function calls to direct return pattern	2026-03-16 14:39:40 +08:00
Yonghua Li	7c8c0a3c02	[BugFix] replace ftok with custom_ftok in get_output/save_output ops (#6822 ) * [BugFix] replace ftok with custom_ftok in get_output/save_output ops * [Test] add unit test for custom_ftok * [Chore] create custom_ftok.h * [Chore] reorganize header file * [Fix] fix cache messager msg_queue_id+rank_id conflict	2026-03-16 14:22:18 +08:00
cmcamdy	7591e0d6bc	fix eb5 mtp(mix) (#6800 )	2026-03-13 17:36:57 +08:00
mayang002	1f9f889e37	[XPU] refactor: XPU plugin namespace migration (#6799 ) * [XPU] refactor: XPU plugin namespace migration - Migrate wrapper layer namespace from baidu::xpu::api::plugin to fastdeploy::plugin - Migrate kernel layer namespace from xpu3::plugin to fd_xpu3 - Add api:: prefix for types (Context, SUCCESS, XPUIndexType, ctx_guard) - Remove XPU2 support, keep only XPU3 - Update ops/ directory to use new namespace Total: 137 files changed * [XPU] fix: add return value check and correct error messages - Add PADDLE_ENFORCE_XDNN_SUCCESS check for speculate_get_logits and update_attn_mask_offsets - Fix empty error message in draft_model_postprocess - Correct function name in speculate_schedule_cache error message - Update error messages from 'xpu::plugin::' to 'fastdeploy::plugin::'	2026-03-13 10:21:51 +08:00
cmcamdy	3543088d3e	[XPU] rm stop nums (#6651 ) * rm stop nums * fix conflict --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2026-03-12 14:05:58 +08:00
Jiajun Ji	88c4fbf8e1	[XPU] Add speculate_limit_thinking_content_length Op. (#6627 ) * [XPU] Add speculate_limit_thinking_content_length OP for xpu. * add unittest. * format codes. * format codes. * format codes. * Fix unused kernel launch return value. --------- Co-authored-by: cmcamdy <1027740945@qq.com>	2026-03-11 17:30:17 +08:00
mayang002	ecc5032176	[XPU] Add return value checks for all XPU kernel launches (#6666 ) * [XPU] Add return value checks for all XPU kernel launches - Add -fxpu-launch-return compiler flag in CMakeLists.txt to enable kernel launch return values - Add KERNEL_ASSERT_SUCCESS(ctx, ret_xre) checks after every XPU kernel launch across 45 wrapper files (55 launch sites total) - Covers both main wrapper/ and mtp_wrapper/ directories - Properly handles multiple kernel launches in the same function scope by reusing the ret_xre variable * [XPU] code style fix	2026-03-10 10:45:18 +08:00
gongweibao	ddb06ff83f	init (#6642 ) Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-04 21:55:31 +08:00
lizan1999	c637692427	[XPU] support MTP Step > 1 (#6609 ) Co-authored-by: lizan1999 <lizan03@baidu.com>	2026-03-04 10:07:37 +08:00
Jiajun Ji	4ff3f4212f	[XPU] Add update_attn_mask_offsets op for xpu. (#6556 ) * add update_attn_mask_offsets op for xpu. * format code style. * format codes with pre-commit.	2026-03-03 18:00:05 +08:00
ming1753	97eee75677	[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407 ) * Optim GPU Mem Usage --------- Co-authored-by: huzesen <huzesen@baidu.com>	2026-02-28 15:07:43 +08:00
cmcamdy	13447279aa	[XPU] Fix PD + MTP (#6495 ) * fix pd + mtp * fix code style * fix PD + MTP, D get P's first token * add anno for gpu(speculate_update) * update draft insertv1 * fix wrapper & kernel * fix wrapper * fix code style	2026-02-27 19:07:35 +08:00
lizan1999	72edd394d9	[XPU] support noaux_tc (#6326 )	2026-02-05 12:04:16 +08:00
RuohengMa	976203cf60	[XPU] fix text_image_gather_scatter in cudagraph mode (#6049 )	2026-01-23 19:48:43 +08:00
lizan1999	b3a48529ab	[XPU] add more type for recover batch sequence (#6142 )	2026-01-23 15:16:05 +08:00
yinwei	51a8a2ed57	[XPU] Support CudaGraph(add block attn cuda_graph support) (#6116 ) * add block attn cuda_graph support	2026-01-20 19:33:11 +08:00
zhupengyang	45ebb2efb4	[XPU] support plugin model (#6092 )	2026-01-20 13:00:09 +08:00
cmcamdy	59d8ae0a25	[XPU] Speculate Decoding + PD, benchmark fix (#6036 ) * fix mtp pd * fix kernel * fix code style * fix kernel * fix test / clear debug code * fix test / clear debug code * fix codestyle * fix codestyle * fix codestyle	2026-01-15 19:19:03 +08:00
Daci	e10b51b8c6	[Feature] get_output_kv_signal blocking read mode & send_first_token (#5836 ) * get_output_kv_signal blocking read mode * send first token before recycle * xpu get_output_kv_signal blocking read mode --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-15 14:11:03 +08:00
chenjian	74d0f1c01f	[Optim] Robust sync status when preempted happens (#5796 ) * [Bug fix] Sync status for caching output cache * fix * fix * fix bug * fix * fix * support xpu * fix * fix * fix * fix * fix * fix ci * fix ci * fix xpu --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-14 12:07:33 +08:00
zhupengyang	9db48ecb34	[XPU] fix dp4 (#5946 )	2026-01-09 20:36:53 +08:00
ddchenhao66	733014bf32	[XPU] Support EP4TP1 in pd disaggregation (#5860 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2026-01-06 15:25:36 +08:00
cmcamdy	690d4bcdb0	[XPU] Speculative Decoding with PD (#5856 ) * [XPU] Speculative Decoding with PD * fix post process * share kv cache sender * support speculate decoding step system cache * support speculate decoding step system cache --------- Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>	2026-01-05 17:31:03 +08:00
ddchenhao66	9e45ef7ca9	[XPU]MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL (#5831 )	2025-12-31 09:49:12 +08:00
CSWYF3634076	9286403570	[Models] Add Qwen3-VL Model Support (#5763 ) * support v1 loader * remove useless code * remove useless * [Model] support Qwen3VL images success * [Model] support Qwen3VL rope_3d * [Model] support Qwen3VL remove log * [Model] support Qwen3VL RL * [Model] support Qwen3VL tp * [Model] support Qwen3VL video * [Model] support Qwen3VL fix ernievl * [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds * [Model] support Qwen3VL fix multi card * [Model] support Qwen3VL file close * [Model] support Qwen3VL fix ce * [Model] support Qwen3VL fix unittest * [Model] support Qwen3VL add unittest --------- Co-authored-by: Ayakouji <yuhongh@qq.com>	2025-12-29 17:39:33 +08:00
freeliuzc	9018ccf74e	[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 ) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operator register * update pmtp multi-step mtp strategy in d-split mode * add note * fix xpu register	2025-12-25 01:54:59 -08:00
RuohengMa	e154c03416	[XPU] refine moe_expert_ffn ut (#5743 )	2025-12-25 10:35:24 +08:00
RuohengMa	2c3c983b96	[XPU] modify speculate_verify (#5522 )	2025-12-23 14:50:30 +08:00
lizan1999	ec6811f648	support token num = 0 (#5635 ) Co-authored-by: lizan1999 <lizan03@baidu.com> Co-authored-by: cmcamdy <1027740945@qq.com> Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-12-19 10:20:38 +08:00
lizan1999	e1a9b282eb	fix bug for EP+MTP (#5605 ) Co-authored-by: lizan1999 <lizan03@baidu.com>	2025-12-18 14:34:54 +08:00
zhupengyang	8735cb5045	[XPU] refactor moe ffn (#5501 ) - remove BKCL_DISPATCH_ALL_GATHER - support sparse mode - support moe quant_method	2025-12-18 14:14:05 +08:00
ddchenhao66	9f70f4310e	[PD Disaggregation][XPU] update_inputs_v1 operator supports PD (#5550 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2025-12-15 15:39:38 +08:00
RuohengMa	12c76f8137	[XPU] add speculate_get_logits (#5497 ) * [XPU] add speculate_step_system_cache * [XPU] add speculate_step_system_cache * [XPU] add speculate_get_logits * delete context * add ptr check --------- Co-authored-by: cmcamdy <1027740945@qq.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-12 15:38:30 +08:00
Lucas	888c4b992d	[XPU] refactor of block_attn param 'pos_emb_type' (#5511 )	2025-12-12 14:30:09 +08:00
Juncai	d67388a479	[PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill (#5514 ) * Distinguish the pipelines for sending kv signal in different prefill * up	2025-12-12 14:05:36 +08:00


95 Commits