Commit Graph

5089 Commits

Author SHA1 Message Date
yzwu e4a4573080 [Iluvatar] Fix cannot import name mtp_save_first_token (#7495)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-04-21 14:09:08 +08:00
RuohengMa 9d3551cfbb [XPU] add support for rope3d (#7518)
* [XPU] add support for rope3d

* support decoder

---------

Co-authored-by: yinwei <yinwei_hust@163.com>
2026-04-21 13:39:00 +08:00
周周周 609f649dd7 [OP] Add flashmla baseline implementation and precision test (#7477) 2026-04-21 13:37:52 +08:00
YuBaoku 3c8c82d5d4 [CI] Remove flashinfer cache cleanup to reduce unit test runtime (#7476) 2026-04-21 11:38:30 +08:00
YuBaoku 5e866e3e21 [CI] Add --workers=1 to keep test behavior consistent with default change 2026-04-20 22:31:42 +08:00
Zhang Yulong 30db3e9d8f [benchmark] update tools (#7512) 2026-04-20 19:40:17 +08:00
YuBaoku c9783a84a6 [CI] Temporarily pin paddlepaddle-gpu to 3.5.0.dev20260417 (#7486) 2026-04-20 19:35:34 +08:00
K11OntheBoat b79b094dcc Change default workers and max-concurrency when launch api-server (#7457)
Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>
2026-04-20 15:55:06 +08:00
ZhijunLStudio a0c39cc9af [Typo] Fix parameter name typo in slice_fn: paramter -> parameter (#7462)
Fix the typo in the internal parameter name of slice_fn():
weight_or_paramter -> weight_or_parameter.

No functional changes.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 10:06:02 +08:00
RuohengMa cf5bc5e510 [XPU] fix bug and temporary fix for rope 3d (#7465) 2026-04-20 09:51:27 +08:00
YuBaoku b2aca6c550 [CI] Improve logging check accuracy and unify error log cleanup (#7473) 2026-04-18 19:41:21 +08:00
freeliuzc 22a4f6019d [Speculative Decoding][BugFix] Fix apply repeat times penalty kernel and change spec default verify strategy (#7467)
* fix repeat_time kernel and change default spec verify strategy

* fix unit_test
2026-04-18 00:38:01 +08:00
GoldPancake df3b4e12f4 [Speculative Decoding] Add MTP logprob support for PD disaggregation (#7442)
* support mtp logprob in pd

* fix

* fix

* fix

* fix xpu bugs
2026-04-17 21:37:38 +08:00
yzwu 3b9d6c60d3 [Iluvatar] fix ci error and update readme (#7453) 2026-04-17 20:42:56 +08:00
jackyYang6 a729e0f729 [Bugfix][RL] fix control request timeout in async update weights pipeline (#7430) 2026-04-17 16:45:33 +08:00
freeliuzc 43685a98a7 [BugFix] Fix real token exceeding max_batched_tokens limit (#7438)
* fix max_num_batched_tokens error compute

* add temporary solution

* fix bug
2026-04-17 16:18:07 +08:00
jc 6847891241 Mooncake storage register local buffer by chunk (#7416) 2026-04-17 10:39:34 +08:00
YuBaoku 91b8bf20f0 [CI] Add pytest failure log collection and persistence (#7405) 2026-04-16 22:56:17 +08:00
AIbin 6ce4854714 [Feature] Support MOE Cutlass backend for latent MOE (#7428)
* support moe cutlass backend for latent moe
2026-04-16 22:11:49 +08:00
ShaneGZhu 2d8338f9e4 [Optimization][DeepSeekV3.2]Reducing slot_mapping compute frequency from twice per layer to a single pre-processing step. (#7367) 2026-04-16 19:54:12 +08:00
RichardWooSJTU d2d633b05c allow parallel dp starting (#7426) 2026-04-16 18:43:09 +08:00
RichardWooSJTU 420a8c1af5 fix deep gemm import (#7425) 2026-04-16 17:56:56 +08:00
ddchenhao66 e9527208d9 [BugFix][XPU] Fix kv_cache management bug (#7420) 2026-04-16 15:45:45 +08:00
zhouchong 6e16438a57 [Feature] implement log channel separation and request log level system (#7190)
* feat: implement log channel separation and request log level system

* fix: log system improvements based on review

* add request_id to error logs, use RequestLogLevel enum, and unify logger implementation from utils to logger module
2026-04-16 15:13:05 +08:00
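The request log level system described in the commit above might look like the following minimal sketch. Only the `RequestLogLevel` enum name and the idea of adding `request_id` to error logs come from the commit message; the helper name, values, and signature are assumptions for illustration.

```python
import logging
from enum import IntEnum


class RequestLogLevel(IntEnum):
    # Per-request verbosity, independent of the global logger level.
    # The enum name comes from the commit; the values are illustrative.
    DEBUG = logging.DEBUG
    INFO = logging.INFO
    ERROR = logging.ERROR


def log_request_error(logger, request_id, message, level=RequestLogLevel.ERROR):
    # The commit adds request_id to error logs; tagging every line with it
    # lets a separated error channel be grepped or filtered per request.
    logger.log(int(level), "[request_id=%s] %s", request_id, message)
```

Routing such tagged records to a dedicated error-log handler is one way to realize the "log channel separation" the commit title refers to.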
Jiajun Ji 29495b2cf1 [XPU] Unify Spec and non-spec branch.(#6947) (#7180)
* [XPU] cherry-pick PR-6947

* [XPU] use unified_update_model_status.

* refactor xpu_model_runner.

* refactor sampler.

* fix codestyle.

* Fix XPU speculative decoding: rename output tensors to cu_seqlens_q_output/batch_id_per_token_output, correct
  WRAPPER_CHECK_PTR types, and fix dynamic gather shape in verify_draft_tokens path.

* fix codestyle.

* replace output_padding_offset with is_speculative flag in gather_next_token.

* rename hiddden_states.

* unify cu_seqlens_q_output and batch_id_per_token_output init.

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
2026-04-16 14:58:38 +08:00
YuBaoku 17002edc47 [CI] Add approval check for logging-related modifications (#7429) 2026-04-16 14:50:22 +08:00
RuohengMa de0c5e68fb [XPU] Split the block_attn operator into smaller operators (#6798)
* split block_attn

* adapt to latest vllm

* fix unit tests

* delete mtp+cudagraph 4 cards test

* fix vl model

* fix mtp

* fix slot mapping
2026-04-16 14:28:40 +08:00
Bingoo 6b891da02b [Optimization] enable trtllm_all_reduce fusion kernel in glm model (#6660)
* enable trtllm_all_reduce fusion kernel in glm model

* fix conflict

* format update

* fix a bug

* modify test

* modify test

* support empty tensor and modify test

* fix test_linear config issues

* modify test name

* add edge test case

* modify format

* fix conflict

* modify default max token num in trtllm_allreduce_fusion

* add max token num branch for trtllm_allreduce_fusion

* fix format

* fix rmsnorm config issue

* modify 2025 to 2026

* use compat guard

* Lazily import flashinfer.comm and fix test config issue

* fix test issues

* add flashinfer cache dir clean mechanism

* fix some issues
2026-04-16 14:10:19 +08:00
jc e53f5184ac PD deployment support without router (#7412) 2026-04-15 20:13:07 +08:00
GoldPancake a498720a75 [RL] Add clear_graph_opt_backend for glm4_mtp (#7378)
* add clear_grpah func

* fix spell
2026-04-15 19:44:15 +08:00
RichardWooSJTU dec0b060fc [Optimization] Auto set num_max_dispatch_tokens_per_rank (#7237)
* auto set num_max_dispatch_tokens_per_rank

* fix ci

* fix ci

* fix ci
2026-04-15 19:13:38 +08:00
luukunn 3f84d8d893 [DataProcessor] Refactor multimodal processor: extract encoding strategies and unify MM processing pipeline (#7298)
* merge mm processor
2026-04-15 19:01:06 +08:00
Bingoo a218d29488 modify flash_mask version (#7413) 2026-04-15 18:16:58 +08:00
luukunn 14d556692b [BugFix] fix tool call parser (#7369)
* fix tool call parser

* add unit test

* fix unit test

* add unit test
2026-04-15 16:21:46 +08:00
AIbin 8eebbcaf15 [BugFix][Scheduler]Fix FD_DISABLE_CHUNKED_PREFILL max_num_batched_tokens limit (#7407)
* fix FD_DISABLE_CHUNKED_PREFILL max_num_batched_tokens=max_model_len

* fix FD_DISABLE_CHUNKED_PREFILL max_num_batched_tokens=max_model_len
2026-04-15 15:55:11 +08:00
周周周 5e54770b2e [Feature] Add latent mode support for MoE layers (#7382) 2026-04-15 13:57:07 +08:00
lonelygsh f7a2418ce2 [Speculate Decoding] Fix reasoning_phase_token_constraint call args in SpeculativeSampler (#7402) 2026-04-15 12:45:23 +08:00
AIbin 8995a38fa4 fix dsa indexer norm to layernorm (#7398) 2026-04-15 11:42:45 +08:00
AIbin bb30f88f1a [Models] support MLA gate attention (#7404)
* support mla gate attn

* support mla gate attn
2026-04-15 11:42:34 +08:00
chen 616b29ce08 check init_flash_attn_version log (#7399) 2026-04-15 11:05:10 +08:00
cmcamdy 13b9fe7299 [XPU] add verify draft tokens (#6947)
* [XPU] add verify draft tokens

* fix test

* fix code style

* use sync cpy

* fix code style

* fix kernel check

* fix random seed

* fix test

* fix check

* fix eos set

* fix verify

* fix verify
2026-04-15 10:18:33 +08:00
lonelygsh e0a1653b26 [Speculate Decoding] Fix bug of reasoning_phase_token_constraint kernel (#7349)
Co-authored-by: guanshihui <guanshihui@baidu.com>
2026-04-14 20:57:11 +08:00
sunxin 7b0baced17 fix rl moe gate type (#7393) 2026-04-14 20:04:04 +08:00
Echo-Nie 8819a039c9 [Others] Fix typo (#7280)
* typo

* typo

* typo

* typo
2026-04-14 17:28:22 +08:00
luukunn 9d9d79c457 [DataProcessor] add strict (#7307)
* add strict

* fix
2026-04-14 17:25:38 +08:00
kevin ff47701f31 [BugFix][PD Disaggregation][KVCache] Fix low cache hit rate in PD split scenario (#7364)
## Motivation

In the PD-disaggregated scenario, the decode node did not promptly update cache-block hit information after receiving requests forwarded by the prefill node, resulting in a low prefix-cache hit rate and degraded inference performance.

## Modifications

1. In `_free_blocks_when_stop`, additionally exclude prefill nodes (`splitwise_role == "prefill"`)
   from the cache-block update, preventing prefill nodes from corrupting the state with duplicate updates.
2. After the decode node successfully allocates a request (`_alloc_requests_with_cache`), proactively call
   `update_cache_blocks` with `need_prefill_tokens` to update the cache-block information,
   ensuring the decode node correctly tracks already-hit prefix cache.
2026-04-14 16:15:43 +08:00
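The two fixes described in the commit above can be sketched as follows. The method names `_free_blocks_when_stop`, `_alloc_requests_with_cache`, `update_cache_blocks`, and the `splitwise_role` / `need_prefill_tokens` names come from the commit message; the surrounding classes, fields, and signatures are invented for illustration and are not FastDeploy's actual implementation.

```python
class CacheManager:
    """Toy prefix-cache bookkeeping: tracks how many leading tokens of each
    request are known to be resident in cache blocks."""

    def __init__(self):
        self.cached_tokens = {}  # request_id -> cached prefix token count

    def update_cache_blocks(self, request_id, num_tokens):
        # Record that this request's first `num_tokens` tokens are cached,
        # so later requests sharing the prefix count as hits.
        self.cached_tokens[request_id] = max(
            self.cached_tokens.get(request_id, 0), num_tokens)


class Scheduler:
    def __init__(self, splitwise_role, cache):
        self.splitwise_role = splitwise_role  # "prefill" or "decode"
        self.cache = cache

    def _alloc_requests_with_cache(self, request_id, need_prefill_tokens):
        # ... actual block allocation elided ...
        allocated = True
        if allocated and self.splitwise_role == "decode":
            # Fix 2: after a forwarded request is allocated on the decode
            # node, record the hit info so prefix-cache state stays accurate.
            self.cache.update_cache_blocks(request_id, need_prefill_tokens)
        return allocated

    def _free_blocks_when_stop(self, request_id, num_tokens):
        # Fix 1: prefill nodes are excluded from the cache-block update here,
        # so they do not overwrite the decode node's bookkeeping.
        if self.splitwise_role != "prefill":
            self.cache.update_cache_blocks(request_id, num_tokens)
        # ... actual block freeing elided ...
```

The key point is the asymmetry by role: only the decode node records hit information on allocation, and prefill nodes skip the update on stop.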
Bingoo 9c23e6154c [Others] replace tool_helpers with fast_dataindex (#7353)
* replace tool_helpers with fast_dataindex

* modify others requirement
2026-04-14 15:13:54 +08:00
xiaoxiaohehe001 abba29b348 [BugFix] fix mm rope (#7274) 2026-04-14 11:36:08 +08:00
Yuanle Liu 8f21c9caa6 [BugFix] fix gitignore claude (#7381) 2026-04-13 20:32:45 -07:00
zhupengyang 27b00cf385 [XPU] glm-4.5-air (#7071) 2026-04-14 11:31:49 +08:00