FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 17:11:21 +08:00

Author	SHA1	Message	Date
GoldPancake	a1fc4e249e	[Bugfix] Fix mtp logprob hang problem when include stop_seq (#5927 ) * fix mtp logprob hang when include stop_seq	2026-01-08 14:21:24 +08:00
lizhenyun01	2be8656c29	[BugFix] fix mtp split kv attetion (#5920 ) * [BugFix] fix mtp split kv attetion * clean code * clean code	2026-01-07 04:07:31 -08:00
kevin	a76e8ae40c	[Feature] support rdma pd dy-c8 (#5788 ) * add rdma pd dy-c8 * update code	2026-01-07 14:55:25 +08:00
周周周	f15df1ec89	Revert cuda check (#5915 ) * commit * commit	2026-01-07 14:40:18 +08:00
yangjianfengo1	59523b27de	opt w4afp8 (#5853 )	2026-01-07 12:22:35 +08:00
MingkunZhang	7ad5737560	[Metax] adapt to gemm interface on different versions of maca (#5905 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-07 10:02:24 +08:00
周周周	83ae59431e	[BugFix] fix BatchMLAWithPagedKVCacheKernel O_tmp (#5895 )	2026-01-06 15:39:06 +08:00
ddchenhao66	733014bf32	[XPU] Support EP4TP1 in pd disaggregation (#5860 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2026-01-06 15:25:36 +08:00
lizexu123	acdf0cd1d9	fix hadamard_block_size (#5888 )	2026-01-06 14:12:14 +08:00
Yuanle Liu	5e729bc2ba	[OPs] ep_moe_expert_dispatch.cu dispatch num_experts_per_rank 5 (#5890 )	2026-01-06 10:39:35 +08:00
Neil Zhu	272a371635	[Metax] optimize flash attention backend (#5876 )	2026-01-06 09:52:09 +08:00
周周周	ab553b3b8b	revert cuda_check (#5883 )	2026-01-05 20:51:31 +08:00
lizexu123	1d3ae7c024	[BugFix] fix w4afp8 tp=8 (#5868 ) * fix w4afp8 tp=8 * fix	2026-01-05 18:59:02 +08:00
cmcamdy	690d4bcdb0	[XPU] Speculative Decoding with PD (#5856 ) * [XPU] Speculative Decoding with PD * fix post process * share kv cache sender * support speculate decoding step system cache * support speculate decoding step system cache --------- Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>	2026-01-05 17:31:03 +08:00
chen	ac39c0f887	support fa3 qwen-vl rope (#5869 )	2026-01-05 15:29:34 +08:00
sunxin	adb91dcacc	[BugFix] Fix wint4 ep issue caused by empty run (#5870 )	2026-01-05 14:24:37 +08:00
周周周	e3957a5ebc	[Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620 )	2026-01-04 11:21:15 +08:00
MingkunZhang	f732d7d2ad	[Metax] adapt prefix caching & cpu swap (#5844 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2025-12-31 17:02:48 +08:00
ddchenhao66	9e45ef7ca9	[XPU]MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL (#5831 )	2025-12-31 09:49:12 +08:00
Sunny-bot1	598d292a69	w4afp8 fix quant (#5830 )	2025-12-30 21:16:13 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
Yonghua Li	a8d3e3ba12	[BugFix] fix shm opened but not closed in set_data_ipc (#5826 )	2025-12-29 23:35:07 +08:00
CSWYF3634076	9286403570	[Models] Add Qwen3-VL Model Support (#5763 ) * support v1 loader * remove useless code * remove useless * [Model] support Qwen3VL images success * [Model] support Qwen3VL rope_3d * [Model] support Qwen3VL remove log * [Model] support Qwen3VL RL * [Model] support Qwen3VL tp * [Model] support Qwen3VL video * [Model] support Qwen3VL fix ernievl * [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds * [Model] support Qwen3VL fix multi card * [Model] support Qwen3VL file close * [Model] support Qwen3VL fix ce * [Model] support Qwen3VL fix unittest * [Model] support Qwen3VL add unittest --------- Co-authored-by: Ayakouji <yuhongh@qq.com>	2025-12-29 17:39:33 +08:00
周周周	a3f0696e35	[BugFix] fix compile error in sm89 (#5809 )	2025-12-29 16:55:52 +08:00
Longzhi Wang	11329ee35e	[Model] support mode config for expert_dispatch (#5748 )	2025-12-29 13:37:20 +08:00
Ryan	09229d8953	change `count_tokens_per_expert_func` declaration: `Tensor` -> `vector<Tensor>` (#5794 )	2025-12-26 19:02:28 +08:00
Ryan	724045c426	add some op infershape&dtype (#5762 )	2025-12-26 16:17:39 +08:00
周周周	03363cab4c	make flash_mask attention pybind (#5783 )	2025-12-26 14:31:35 +08:00
kevin	5538dda3c8	[Feature] pd support dy-c8 ipc (#5750 ) * pd support dy-c8 ipc * update code * support v0 * update code	2025-12-25 21:22:34 +08:00
freeliuzc	9018ccf74e	[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 ) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operater register * update pmtp multi-step mtp strategy in d-split -mode * add note * fix xpu register	2025-12-25 01:54:59 -08:00
Juncai	412867fd99	[Feature] Support KV Cache Storage (#5571 ) * Support Mooncake Store * up * up * add op * fix conflict * fix error * up for comments * avoid thread lock * up * fix unittest * fix unittest * remove debug info * consider tp_size > 1 * add default rdma_nics * add utils * up * fix error --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-25 16:30:35 +08:00
RuohengMa	e154c03416	[XPU] refine moe_expert_ffn ut (#5743 )	2025-12-25 10:35:24 +08:00
chen	c7ab32d154	check (#5736 )	2025-12-24 16:49:20 +08:00
周周周	922a73ddd6	[Others] clean code (#5691 )	2025-12-24 11:28:47 +08:00
RuohengMa	2c3c983b96	[XPU] modify speculate_verify (#5522 )	2025-12-23 14:50:30 +08:00
lizexu123	6d323769dd	fix w4afp8 (#5634 )	2025-12-22 13:39:41 +08:00
chen	a32cb54d0b	[BugFix] Fix custom_all_reduce overflow (#5662 ) * check * check * code style	2025-12-19 18:24:21 +08:00
lizan1999	ec6811f648	support token num = 0 (#5635 ) Co-authored-by: lizan1999 <lizan03@baidu.com> Co-authored-by: cmcamdy <1027740945@qq.com> Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-12-19 10:20:38 +08:00
yzwu	ac013803f3	[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555 )	2025-12-18 02:14:25 -08:00
lizan1999	e1a9b282eb	fix bug for EP+MTP (#5605 ) Co-authored-by: lizan1999 <lizan03@baidu.com>	2025-12-18 14:34:54 +08:00
zhupengyang	8735cb5045	[XPU] refactor moe ffn (#5501 ) - remove BKCL_DISPATCH_ALL_GATHER - support sparse mode - support moe quant_method	2025-12-18 14:14:05 +08:00
Yuanle Liu	cdc0004894	Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563 )" (#5611 ) This reverts commit `73e1d6aa90`.	2025-12-17 13:59:06 +08:00
Yuanle Liu	867803ae10	[BugFix] fix speculate_limit_thinking_content_length (#5590 ) * fix speculate_limit_thinking_content_length * update	2025-12-16 04:31:45 -08:00
chen	27ef3610b5	support glm fa3 (#5586 )	2025-12-16 19:33:27 +08:00
fxyfxy777	73e1d6aa90	[Feature] add ue8m0 for per_token_quant_fp8 (#5563 ) * ue8m0 * add default arg --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-16 18:40:12 +08:00
Echo-Nie	50100f98d7	[Feature] Support fusedmoe on Blackwell (#5325 ) * update sm100 * fix * fix style	2025-12-16 11:58:50 +08:00
freeliuzc	532f9ba227	[BugFix][Speculative Decoding](Spend many dyas to solve)Fix write qknorm cache bug in speculative decoding (#5491 ) * [liuzichang spend 10 dyas]fix write qknorm cache bug * fix 'fix cachekv bug''	2025-12-15 18:27:11 +08:00
ddchenhao66	9f70f4310e	[PD Disaggregation][XPU] update_inputs_v1 operator supports PD (#5550 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2025-12-15 15:39:38 +08:00
chen	a389bb7c5c	[Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486 )	2025-12-12 17:10:17 +08:00
RuohengMa	12c76f8137	[XPU] add speculate_get_logits (#5497 ) * [XPU] add speculate_step_system_cache * [XPU] add speculate_step_system_cache * [XPU] add speculate_get_logits * delete context * add ptr check --------- Co-authored-by: cmcamdy <1027740945@qq.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-12 15:38:30 +08:00


314 Commits