FastDeploy

Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
google-labs-jules[bot]	69c7dd0a19	⚡ Bolt: Optimize single-element list appends. Replaced instances of `.extend([item])` with `.append(item)` in multiple files. `.extend([item])` allocates a temporary single-element list on every call and is slower than calling `.append(item)` directly. Files updated: - fastdeploy/input/encodings/ernie_encoding.py - fastdeploy/input/ernie4_5_vl_processor/process.py - fastdeploy/output/token_processor.py - fastdeploy/worker/gpu_model_runner.py - fastdeploy/worker/metax_model_runner.py	2026-04-15 16:45:13 +00:00
GoldPancake	a498720a75	[RL] Add clear_graph_opt_backend for glm4_mtp (#7378 ) * add clear_graph func * fix spell	2026-04-15 19:44:15 +08:00
luukunn	3f84d8d893	[DataProcessor] Refactor multimodal processor: extract encoding strategies and unify MM processing pipeline (#7298 ) * merge mm processor	2026-04-15 19:01:06 +08:00
Echo-Nie	8819a039c9	[Others] Fix typo (#7280 ) * typo * typo * typo * typo	2026-04-14 17:28:22 +08:00
xiaoxiaohehe001	abba29b348	[BugFix] fix mm rope (#7274 )	2026-04-14 11:36:08 +08:00
freeliuzc	31e2a8bbad	[Speculative Decoding] Support mtp super ultra overlap in pd-split mode with insert_task overlap (#7323 ) * support mtp overlap in pd-split mode with insert_task overlap	2026-04-13 19:41:17 +08:00
sunxin	00005c92e0	[BugFix] Fix mtp empty run issue in overlap schedule and EP model (#7300 )	2026-04-10 03:29:45 -07:00
chenjian	427efadaee	[Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 (#7159 ) * [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 * [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 * fix	2026-04-08 19:30:54 +08:00
RichardWooSJTU	771d42c90b	[TBO] Apply tbo to gpu_model_runner (#7165 ) * apply tbo in gpu_model_runner * fix	2026-04-08 16:55:17 +08:00
K11OntheBoat	bb48bcbaa2	Split enable_mm (#7183 ) Co-authored-by: liuruian <liuruian@MacBook-Pro.local>	2026-04-08 11:25:41 +08:00
GoldPancake	9d4fd19c3f	[Speculative Decoding] Auto-scale CUDA graph capture sizes for speculative decoding (#7215 )	2026-04-07 20:22:28 +08:00
Nana	367d37b523	fix typo (#7147 )	2026-04-07 16:30:32 +08:00
huicongyao	095a11d932	fix MTP bugs in TP and overlap (#7172 ) * fix MTP bugs in TP and overlap * fix	2026-04-03 14:19:11 +08:00
sunxin	c29e86fc9d	[Feature] Support mtp overlap schedule (#7001 )	2026-04-01 14:24:26 +08:00
jackyYang6	05f2d95729	[RL] Adapt async rollout checkpoint update flow (#7042 ) * update checkpoint-transfer flow and control update_weights params * test: add update_weights route validation	2026-03-30 19:19:34 +08:00
GoldPancake	6693bcd0e4	[BugFix] fix clear_parameters in draft cudagraph (#7035 )	2026-03-27 15:28:50 +08:00
freeliuzc	4fd877ed43	[Speculative Decoding] Support mtp expert-parallel and support different modality deploy (#7018 ) * support mtp ep and support different modality * fix default arg	2026-03-26 13:52:16 +08:00
Yonghua Li	a7f52c300d	[Feature] support v1 update/clear api for RL (#6761 ) * [Feature] support v1 update/clear api for RL * [fix] fix execute_model and add sleep/wakeup api * [fix] fix mtp and key_prefix * [chore] move _update_key_prefix to resume method * [fix] make the interface safe to call multiple times * [fix] fix some tiny bugs * [chore] make small changes against pr review * [docs] add docs for weight update * [test] add some tests and update docs * [style] fix code style check * [test] fix ci * [fix] fix stale control responses when control method timed out * [chore] remove unused code * [chore] fix code style * [chore] optimize tags and key_prefix * [test] fix ci * [chore] fix code style * [test] fix ci * [fix] fix ep control * [fix] fix ep control for engine cache queue	2026-03-25 19:18:46 +08:00
freeliuzc	e87ce4b8cd	[Speculative Decoding] refactor MTP and optimize spec-decoding postprocess (#6973 ) * support new mtp * refactor(speculate_decoding and mtp): optimize mtp structure logic. Update spec-branch status-process * fix cuda-graph for spec-decoding * fix xpu mtp and fix some note * fix unittest and optimize note * fix model status update in eos-branch	2026-03-24 10:19:01 +08:00
bukejiyu	c62f6b4ea5	[Others] Fix PD reorder for MTP (#6792 ) * fix pd reorder in mtp * add ut * update * fix mtp	2026-03-23 21:10:22 +08:00
sunxin	7a78001be2	fix execute_model_normal in empty run (#6968 )	2026-03-23 14:07:46 +08:00
周周周	1c38da2118	Make seq_lens_this_time/decoder/encoder equal shape (#6942 )	2026-03-20 15:31:52 +08:00
qwes5s5	3b7507a4c2	test_abort (#6743 )	2026-03-17 14:06:40 +08:00
huicongyao	eab429d05e	fix performance drop while no spec (#6866 )	2026-03-17 13:06:36 +08:00
gongweibao	a6351dea0b	[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533 ) * init * init * fix format * add * add files * add ut * fix some * add ut * add more * add * fix pre-commit * fix pre-commit * fix cover * skip long seq * add * add * fix * remove not need * fix set attr * fix comments * fix comments * fix failed tests --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-16 21:32:43 +08:00
ming1753	bb925c605f	[Other] Adjust GPUModelRunner to enhance compatibility (#6851 )	2026-03-16 14:49:19 +08:00
huicongyao	2e63d88f7a	[Optimization][Speculative Decoding]Fuse padding sampling params (#6765 ) * optimize speculate pre process unit test * Add CUDA kernel for building sampling params in speculative decoding * init infer seed in device * format code * add unittest & fix * fix * format-code * format-code * fix rebase * . * fix unittest	2026-03-12 05:05:15 -07:00
RAM	cdaf6dd400	[RL][Cherry-Pick] Support Fully Async and PrefixCache (#6599 ) * cherry-pick Support Fully Async and PrefixCache step 1 * copy routing_indices_cache.py from 2.4 * cherry-pick [RL] R3 Fix the bug for determining the end of a request (#6388) * cherry-pick [RL] Clear Requests status of R3 (#6569) * delete code * fix rename bug * fix status shape bug * fix ci	2026-03-12 01:13:30 -07:00
Yonghua Li	7811eeccaa	[fix] resolve get_save_output_v1 socket name conflicts between multiple instances (#6758 )	2026-03-11 15:02:32 +08:00
freeliuzc	cf7934a4b2	[Speculative Decoding] Unify Spec and non-spec branch (#6685 ) * optimize spec-inference architecture * delete debug log * optimize spec_method usage && fix unit_test * add claude unit-test skill * fix some ugly bug * enhance robustness and bounds check * unify method & spec_method to method to avoid bug * activate CI * fix unit test * Unify logprobs computation for naive and speculative decoding, fix CUDA kernel * fix logprob bug && optimize verify kernel * fix exist_decode() judge	2026-03-10 23:58:44 -07:00
sunxin	812657beee	fix pd overlap (#6753 )	2026-03-10 20:29:54 +08:00
AIbin	c3aceb6bdc	[Models][OP][Optimization] Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM (#6689 ) * Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM	2026-03-10 15:05:14 +08:00
sunxin	28f7727a3d	[Feature] Set overlap schedule as default (#6668 ) * overlap default	2026-03-09 22:34:54 +08:00
jc	b0fd242add	[BugFix] Fix error in dynamic c8 cache (#6544 ) * [BugFix] Fix error in dynamic c8 cache * fix device id	2026-03-06 10:11:23 +08:00
sunxin	0dc7034ce0	[Model Runner] Deprecate not_need_stop (#6356 ) * Deprecate not_need_stop	2026-03-05 10:55:42 +08:00
sunxin	aee97e3aae	fix exist_prefill_flag when preempted task (#6629 )	2026-03-04 11:11:40 +08:00
huicongyao	0f718baaf2	[Speculative Decoding]Reformat input preprocess for spec decode (#6501 ) * add speculate_pre_process kernel * reduce one slice * make d2h async && fix mtp bug for new pre_process * fix * add unittest * fix: code style formatting * fix * fix: thread race in speculate_preprocess && rename d2h event	2026-03-03 10:22:07 +08:00
ming1753	97eee75677	[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407 ) * Optim GPU Mem Usage --------- Co-authored-by: huzesen <huzesen@baidu.com>	2026-02-28 15:07:43 +08:00
cmcamdy	13447279aa	[XPU] Fix PD + MTP (#6495 ) * fix pd + mtp * fix code style * fix PD + MTP, D get P's first token * add anno for gpu(speculate_update) * update draft insertv1 * fix wrapper & kernel * fix wrapper * fix code style	2026-02-27 19:07:35 +08:00
gongweibao	edd31e8849	[Feature] Add Deterministic Inference Support (#6476 ) * add * [tests] Add Paddle attention determinism tests and refactor resource manager Add comprehensive determinism tests for Paddle attention layer and refactor resource manager for deterministic mode support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * add * add * add * add * add more * add more * fixsome * fixsome * fix bugs * fix bugs * only in gpu * add docs * fix comments * fix some * fix some * fix comments * add more * fix potential problem * remove not need * remove not need * remove no need * fix bug * fix bugs * fix comments * fix comments * Update tests/ce/deterministic/test_determinism_verification.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/inter_communicator/test_ipc_signal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/engine/test_sampling_params_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism_standalone.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix comments * fix import error * fix a bug * fix bugs * fix bugs * fix coverage * refine codes * refine code * fix comments * fix comments * fix comments * rm not need * fix allreduce large tensor bug * mv log files * mv log files * add files --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-26 19:31:51 -08:00
GoldPancake	2178f2829b	[Speculative Decoding] Support suffix decoding (#6403 ) * support suffix decoding	2026-02-26 11:42:05 +08:00
Yuanle Liu	6d3fede240	[OP][Feature] Unify the limit_thinking_content_length CUDA operator to support response-length limits and injected sequences (#6493 ) * Initial plan * Migrate PRs #6311, #6129, #6305 to develop and merge unit tests Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix * update * fix * fix ci * fix ci * Initial plan * test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add disable-thinking case to test_chat_with_response_max_tokens Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add both reasoning_max_tokens and response_max_tokens case Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix ci * fix ci * fix ci --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>	2026-02-25 21:36:50 +08:00
Yonghua Li	e2332a1112	[BugFix] fix num_cpu_blocks computation (#6438 ) * [BugFix] fix num_cpu_blocks computation * [fix] fix syntax and log * [fix] pre-commit * [fix] use getattr * [fix] ci test	2026-02-13 11:05:14 +08:00
yzwu	60e75ea8e8	[Iluvatar][CI] Fix cannot import get_stop (#6165 )	2026-02-10 16:57:23 +08:00
kevin	3ce842b55b	[BugFix] add reset shared inputs when update weight dummy run (#6331 ) * fix dummy run input bug * update code * update code * update code * update code	2026-02-10 10:29:03 +08:00
kevin	d60daca4a8	[Feature] consider multimodal model when dummy run (#6045 ) * add mm do profile * update code * update code * update code * update code * update test case * update code * update code * fix xpu bug * update code * add mm do profile * update test case * update code	2026-02-09 17:49:55 +08:00
sunxin	783d56e28a	[Optimization] Support logprob async copy (#6362 ) * support logprob async copy * fix prompt logprob * fix xpu	2026-02-09 17:32:12 +08:00
周周周	2b4748de4f	[MTP] refactor MTP pre_process (#6358 )	2026-02-09 10:47:15 +08:00
GoldPancake	183b8d325a	[RL] Support GLM MTP RL Model (#6267 )	2026-02-04 20:14:35 +08:00
sunxin	9b0a82cfa9	[Model Runner] Support overlap schedule (#6259 )	2026-02-04 10:49:44 +08:00
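
Commit 69c7dd0a19 above swaps `.extend([item])` for `.append(item)`. The sketch below illustrates why that swap is behavior-preserving; the helper names `build_with_extend` and `build_with_append` are hypothetical and do not come from the FastDeploy codebase. Both calls add `item` as exactly one element, even when `item` is itself a list, but `.extend([item])` first builds a throwaway one-element list. The timing printout is illustrative only; actual numbers depend on the interpreter and workload.

```python
import timeit


def build_with_extend(items):
    """Hypothetical helper: the pattern the commit removed."""
    out = []
    for item in items:
        out.extend([item])  # allocates a temporary one-element list per call
    return out


def build_with_append(items):
    """Hypothetical helper: the pattern the commit switched to."""
    out = []
    for item in items:
        out.append(item)  # adds the element directly, no temporary list
    return out


if __name__ == "__main__":
    # Nested lists stay nested with both approaches: same resulting list.
    mixed = [1, [2, 3], "a"]
    assert build_with_extend(mixed) == build_with_append(mixed) == mixed

    data = list(range(10_000))
    t_extend = timeit.timeit(lambda: build_with_extend(data), number=100)
    t_append = timeit.timeit(lambda: build_with_append(data), number=100)
    print(f"extend([item]): {t_extend:.3f}s  append(item): {t_append:.3f}s")
```

On CPython, `append` is typically the faster of the two because it skips the temporary list and the iteration that `extend` performs over it.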


238 Commits