Commit Graph

448 Commits

Author SHA1 Message Date
周周周 e3957a5ebc [Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620) 2026-01-04 11:21:15 +08:00
MingkunZhang f732d7d2ad [Metax] adapt prefix caching & cpu swap (#5844)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-31 17:02:48 +08:00
ddchenhao66 9e45ef7ca9 [XPU]MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL (#5831) 2025-12-31 09:49:12 +08:00
Sunny-bot1 598d292a69 w4afp8 fix quant (#5830) 2025-12-30 21:16:13 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
Yonghua Li a8d3e3ba12 [BugFix] fix shm opened but not closed in set_data_ipc (#5826) 2025-12-29 23:35:07 +08:00
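The shm fix above (#5826) addresses the classic open-without-close leak in IPC code. The actual fix lives in FastDeploy's C++ `set_data_ipc` path; the sketch below only models the same pattern using Python's shared-memory API, with illustrative function names that are not from the repo.

```python
# Illustrative sketch only: models the open-without-close leak pattern
# fixed in set_data_ipc, using Python's shared-memory API. Function
# names are hypothetical, not FastDeploy's.
from multiprocessing import shared_memory


def set_data_leaky(name: str, payload: bytes) -> None:
    # Bug pattern: the segment handle is opened but never closed,
    # leaking one mapping/descriptor per call.
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    shm.buf[: len(payload)] = payload
    # missing shm.close()


def set_data_fixed(name: str, payload: bytes) -> bytes:
    # Fix pattern: always close the handle once the write is done,
    # even if the write raises.
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    try:
        shm.buf[: len(payload)] = payload
        return bytes(shm.buf[: len(payload)])
    finally:
        shm.close()   # release this process's mapping
        shm.unlink()  # remove the segment itself (demo cleanup)
```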
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
周周周 a3f0696e35 [BugFix] fix compile error in sm89 (#5809) 2025-12-29 16:55:52 +08:00
Longzhi Wang 11329ee35e [Model] support mode config for expert_dispatch (#5748) 2025-12-29 13:37:20 +08:00
Ryan 09229d8953 change count_tokens_per_expert_func declaration: Tensor -> vector<Tensor> (#5794) 2025-12-26 19:02:28 +08:00
Ryan 724045c426 add some op infershape&dtype (#5762) 2025-12-26 16:17:39 +08:00
周周周 03363cab4c make flash_mask attention pybind (#5783) 2025-12-26 14:31:35 +08:00
kevin 5538dda3c8 [Feature] pd support dy-c8 ipc (#5750)
* pd support dy-c8 ipc

* update code

* support v0

* update code
2025-12-25 21:22:34 +08:00
freeliuzc 9018ccf74e [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)
* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in pd-split mode

* add note

* fix xpu register
2025-12-25 01:54:59 -08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
RuohengMa e154c03416 [XPU] refine moe_expert_ffn ut (#5743) 2025-12-25 10:35:24 +08:00
chen c7ab32d154 check (#5736) 2025-12-24 16:49:20 +08:00
周周周 922a73ddd6 [Others] clean code (#5691) 2025-12-24 11:28:47 +08:00
RuohengMa 2c3c983b96 [XPU] modify speculate_verify (#5522) 2025-12-23 14:50:30 +08:00
lizexu123 6d323769dd fix w4afp8 (#5634) 2025-12-22 13:39:41 +08:00
chen a32cb54d0b [BugFix] Fix custom_all_reduce overflow (#5662)
* check

* check

* code style
2025-12-19 18:24:21 +08:00
lizan1999 ec6811f648 support token num = 0 (#5635)
Co-authored-by: lizan1999 <lizan03@baidu.com>
Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-19 10:20:38 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
lizan1999 e1a9b282eb fix bug for EP+MTP (#5605)
Co-authored-by: lizan1999 <lizan03@baidu.com>
2025-12-18 14:34:54 +08:00
zhupengyang 8735cb5045 [XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
Yuanle Liu cdc0004894 Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563)" (#5611)
This reverts commit 73e1d6aa90.
2025-12-17 13:59:06 +08:00
Yuanle Liu 867803ae10 [BugFix] fix speculate_limit_thinking_content_length (#5590)
* fix speculate_limit_thinking_content_length

* update
2025-12-16 04:31:45 -08:00
chen 27ef3610b5 support glm fa3 (#5586) 2025-12-16 19:33:27 +08:00
fxyfxy777 73e1d6aa90 [Feature] add ue8m0 for per_token_quant_fp8 (#5563)
* ue8m0

* add default arg

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-16 18:40:12 +08:00
Echo-Nie 50100f98d7 [Feature] Support fusedmoe on Blackwell (#5325)
* update sm100

* fix

* fix style
2025-12-16 11:58:50 +08:00
freeliuzc 532f9ba227 [BugFix][Speculative Decoding] (Spent many days to solve) Fix write qknorm cache bug in speculative decoding (#5491)
* [liuzichang spent 10 days] fix write qknorm cache bug

* fix 'fix cachekv bug'
2025-12-15 18:27:11 +08:00
ddchenhao66 9f70f4310e [PD Disaggregation][XPU] update_inputs_v1 operator supports PD (#5550)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-15 15:39:38 +08:00
chen a389bb7c5c [Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486) 2025-12-12 17:10:17 +08:00
RuohengMa 12c76f8137 [XPU] add speculate_get_logits (#5497)
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

* [XPU] add speculate_get_logits

* delete context

* add ptr check

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-12 15:38:30 +08:00
Lucas 888c4b992d [XPU] refactor of block_attn param 'pos_emb_type' (#5511) 2025-12-12 14:30:09 +08:00
Juncai d67388a479 [PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill (#5514)
* Distinguish the pipelines for sending kv signal in different prefill

* up
2025-12-12 14:05:36 +08:00
cmcamdy 3c1f7b85a4 [XPU] support get hidden state for mix (#5513)
* fix get hidden states

* fix code style

* fix code style
2025-12-12 10:31:20 +08:00
FocusLuo c3aaa7e441 [BugFix] Fixed build script issue on Intel HPU platforms (#5455)
* [INTEL HPU] Fixed build script issue for non-gpu platforms

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] PR CI HPU will not use fixed version of fastdeploy_intel_hpu

Signed-off-by: Luo, Focus <focus.luo@intel.com>

---------

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-11 16:36:37 +08:00
Neil Zhu 4403a21d4b [Metax] refactor cutlass moe and optimize flash attention (#5361)
* [Metax] refactor moe and flash attention backend
---------

Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>
2025-12-10 17:15:17 +08:00
Copilot e38709b499 [BugFix] Fix limit_thinking early return logic in CUDA kernels (#5471)
* Initial plan

* [BugFix] Fix limit_thinking bug - change AND to OR in condition checks

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* Update Chinese comments to reflect OR logic instead of AND

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2025-12-10 11:03:19 +08:00
lzy 99f607eef5 [Others] Maintain the mtp branch temporarily. (#5446) 2025-12-09 19:17:53 +08:00
lizexu123 95eab9f9ee [Feature] support stop_token_ids (#5399)
* support stop_token_ids

* fix

* delete chinese

* support both

* delete print
2025-12-09 17:49:12 +08:00
xiaozude df67379bc3 [Metax] modify wrapSize to WARP_SIZE (#5442) 2025-12-09 01:44:02 -08:00
周周周 31410415db FA3 support qwen3 (#5441) 2025-12-09 16:16:16 +08:00
RuohengMa 8178e3fc6a [XPU] add speculate_step_system_cache (#5397)
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
2025-12-09 14:40:11 +08:00
K11OntheBoat 8d99bac532 Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-12-09 14:17:30 +08:00
周周周 2aea8a3a60 [Others] Remove useless code (#5404) 2025-12-08 13:59:46 +08:00
GoldPancake 8545b705ed fix top_p_candidates (#5400)
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
2025-12-05 20:01:05 +08:00
Lucas 8f2b85362d [XPU] support moe_expert_ffn TGEMM selection (#5375) 2025-12-05 17:49:40 +08:00
Lucas 3aed8d257d [XPU] redirect xvllm/xtdk/xhpc downloading log (#5388) 2025-12-05 17:34:17 +08:00