FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
CSWYF3634076	97a4b3631e	[Processor]add qwen3vl prompt_token_ids support (#6764 ) * [Processor]add qwen3vl prompt_token_ids support * [Processor]add qwen3vl prompt_token_ids support unittest * [Processor]add qwen3vl prompt_token_ids support precommit	2026-03-11 15:08:56 +08:00
bukejiyu	cffa8c246c	[Others]update paddleformer 1.0.0 (#6496 ) * update paddleformer 1.0.0 * update	2026-03-11 15:06:29 +08:00
Yonghua Li	7811eeccaa	[fix] resolve get_save_output_v1 socket name conflicts between multiple instances (#6758 )	2026-03-11 15:02:32 +08:00
freeliuzc	cf7934a4b2	[Speculative Decoding] Unify Spec and non-spec branch (#6685 ) * optimize spec-inference architecture * delete debug log * optimize spec_method usage && fix unit_test * add claude unit-test skill * fix some ugly bug * enhance robustness and bounds check * unify method & spec_method to method to avoid bug * activate CI * fix unit test * Unify logprobs computation for naive and speculative decoding, fix CUDA kernel * fix logprob bug && optimize verify kernel * fix exist_decode() judge	2026-03-10 23:58:44 -07:00
ddchenhao66	a502dda1fe	[BugFix] fix multi-step mtp bug (#6754 )	2026-03-11 10:16:04 +08:00
Jiang-Jia-Jun	b05a6c4206	[BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP (#6724 ) * [BugFix] Support to fix NaN bug in EP * Optimze notion for all the funs * Fix potential lock contention failure issues * Update fastdeploy/inter_communicator/ipc_signal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update envs.py * Update default value for USE_KVCACHE_LOCK Change default value of USE_KVCACHE_LOCK from 1 to 0. * Update worker_process.py * Fix suffix wrong * Update test_prefix_cache_manager.py --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-10 21:55:32 +08:00
Yonghua Li	6520ae807c	[BugFix] fix grpc failure when tracing init before workers forked (#6732 ) * [fix] fix grpc failure when tracing init before workers forked * [fix] change default exporter to http * [fix] fix test_trace	2026-03-10 21:24:10 +08:00
yzwu	67388ce2f3	[Iluvatar][CI] Replace ci in ernie-300B-4layer with ernie-21b. (#6747 )	2026-03-10 17:25:52 +08:00
YuBaoku	596519831c	[CI] Temporarily disable test_determinism_offline.py	2026-03-10 16:54:30 +08:00
YuBaoku	73de8b9795	[CI] Update test_determinism_long.py to reduce execution time	2026-03-10 11:34:36 +08:00
周周周	3897a0b4fc	nvfp4 clean code (#6671 )	2026-03-09 18:00:34 +08:00
0Ayachi0	0c69cdf56e	[CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py (#6208 ) * [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py * update test_fused_moe_triton_backend.py * fix: apply code style formatting * Merge branch 'develop' into develop * Merge branch 'develop' into develop * Merge branch 'develop' into develop * Merge branch 'develop' into develop --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-03-09 14:24:08 +08:00
gongweibao	30f9f33f34	[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM (#6610 ) * add fa deter * add ut * add long sentence * fix basic * fix bugs * fix adn * fix first * fix single * fix single * fix single test * refine * add more test * refine comments * add comments of bmm * fix ci * remove probe * add * remove not need * refine tests * fix comments and refine code * refine code * refine test * refine test * mv 4cards tests * fix tests * add * fix comments * fix cover * fix cover --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-09 10:27:53 +08:00
ddchenhao66	3c0ff20328	[BugFix] fix incorrect function parameters of start_data_parallel_service (#6674 )	2026-03-09 10:15:50 +08:00
YuBaoku	cbfdf42628	[CI] Add test_dynamic_c8_cache.py and latest FastDeploy.tar.gz upload (#6708 )	2026-03-08 16:01:12 +08:00
gongweibao	1e49855b0f	[BugFix][DataProcessor] Add validate_model_path to fail fast on bad model path or unreachable network (#6713 ) * fix * add more endpoint * fix some --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-08 12:36:32 +08:00
luukunn	aac1484b0d	[Feature]add arguments string in tool (#6704 ) * add arguments string	2026-03-06 20:45:09 +08:00
luukunn	caf73e8131	[Feature]add reasoning effort (#6656 ) * add reasoning_effort * fix log * fix reasoning_effort * add reasoning_effort level * fix valid_parameters * fix valid_parameters * fix * fix unit test * add unit test * add unit test	2026-03-06 14:16:02 +08:00
yzwu	81acdb62bd	[Iluvatar][CI] Do not specify FD_LOG_DIR (#6665 )	2026-03-06 11:54:44 +08:00
YuBaoku	16a393e90e	[CI] Fix non-deterministic test and skip failed_tests.log in log print (#6672 )	2026-03-05 18:47:18 +08:00
sunxin	0dc7034ce0	[Model Runner] Deprecate not_need_stop (#6356 ) * Deprecate not_need_stop	2026-03-05 10:55:42 +08:00
ddchenhao66	fa4815b93a	[BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server (#6598 ) * [BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server * [BugFix] modify request_queue and result_queue of dp scheduler	2026-03-05 10:04:12 +08:00
YuBaoku	56ceeda80c	[CI] Adjust model-specific diff threshold and include iluvatar XPU paths in coverage (#6663 )	2026-03-05 10:02:54 +08:00
ming1753	02d32eea3b	Revert "[Bug Fix] Fix MM mtp incorrect rope emb (#6581 )" (#6631 ) This reverts commit `c5eb6b65e7`.	2026-03-04 11:23:28 +08:00
kesmeey	3d3221e24e	[CI] [Hackathon 10th Spring No.31] Add unit tests for module fastdeploy/model_executor/layers/sample/sampler.py (#6200 ) * Format code with black * Format sampler tests * update * update	2026-03-04 10:57:37 +08:00
YuBaoku	c3d6d706d5	[CI] Add nightly workflow for golang_router tests and improve log handling (#6608 ) * [CI] Add nightly workflow for Golang router tests * [CI] Improve pytest script stability and log handling	2026-03-03 19:36:57 +08:00
ming1753	c5eb6b65e7	[Bug Fix] Fix MM mtp incorrect rope emb (#6581 ) * [Bug Fix] Fix MM mtp incorrect rope emb	2026-03-03 19:28:59 +08:00
qwes5s5	375b5b7b21	[Feature]Log Format Normalization and Trace Log Optimization (#6370 ) * log refactor * log refactor 2 * log refactor 3	2026-03-03 11:31:45 +08:00
huicongyao	0f718baaf2	[Speculative Decoding]Reformat input preprocess for spec decode (#6501 ) * add speculate_pre_process kernel * reduce one slice * make d2h async && fix mtp bug for new pre_process * fix * add unitest * fix: code stype formatting * fix * fix: thread race in speculate_preprocess && rename d2h event	2026-03-03 10:22:07 +08:00
kesmeey	aae87e6ae2	[CI] [Hackathon 10th Spring No.27] Add unit tests for module fastdeploy/cache_manager/prefix_cache_manager.py (#6297 ) * test: update prefix cache manager tests * test: refine prefix cache manager coverage helpers * style: apply black formatting to test_prefix_cache_manager.py Co-authored-by: Cursor <cursoragent@cursor.com> * tests: update test_prefix_cache_manager Co-authored-by: Cursor <cursoragent@cursor.com> * update --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-03-02 20:04:12 +08:00
kesmeey	758770bc43	[CI] [Hackathon 10th Spring No.28] Add unit tests for module fastdeploy/entrypoints/engine_client.py (#6158 ) * fix codestyle and update unit test coverage workflow * fix test_engine_client.py: add main_process_metrics mock to prevent KeyError * fix test_engine_client.py: comprehensive test improvements * feat: enhance test_engine_client.py with comprehensive test improvements * fix: resolve test failures in test_engine_client.py * test: enhance EngineClient test coverage with comprehensive test suite * test: add comprehensive EngineClient test suite (codestyle checked)	2026-03-02 14:29:23 +08:00
yzwu	6674131b0b	[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553 )	2026-03-02 14:07:17 +08:00
YuBaoku	481d0e385f	[CI] Skip long-sequence case due to potential non-determinism (#6587 )	2026-03-02 11:34:15 +08:00
周周周	d957ccd46d	seq_lens related tensor shape -> [max_num_seqs] (#6535 )	2026-03-02 11:18:30 +08:00
AIbin	59b578c337	[Feature]Supports SWA based on appendattn (#6547 )	2026-03-01 19:02:08 +08:00
Yonghua Li	7cf5e64c7a	[BugFix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend (#6516 ) * [fix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend * [fix] fix test_cache_transfer_manager * [fix] fix test_cache_transfer_manager again --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-01 13:43:31 +08:00
YuBaoku	bb51829bd5	[CI] Fix tests and docs to resolve failure (#6572 )	2026-03-01 12:33:01 +08:00
0Ayachi0	977e2cc202	[CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py (#6209 ) * [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py * [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py * [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py * Merge branch 'develop' into 23 * Merge branch 'develop' into 23 * Merge branch 'develop' into 23 * Merge branch 'develop' into 23 --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2026-02-28 19:29:02 +08:00
zccjjj	a2072fe20c	[XPU] support warmup with ep & remove apply_tp_fused_op (#6289 )	2026-02-28 15:40:36 +08:00
ming1753	97eee75677	[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407 ) * Optim GPU Mem Usage --------- Co-authored-by: huzesen <huzesen@baidu.com>	2026-02-28 15:07:43 +08:00
YuBaoku	54f7d9f621	[CI] Sync mm_batch_invariant with paddle.mm update (#6557 )	2026-02-28 14:56:42 +08:00
YuBaoku	8e67fb422c	[CI] disable test_batch_invariance_op_mm.py in unit_test (#6548 )	2026-02-28 10:16:14 +08:00
xunyoyo	12f754ef38	[CI] [Hackathon 10th Spring No.42] Add unit tests for module fastdeploy/entrypoints/openai/serving_chat.py (#6112 ) * test: expand OpenAI serving chat coverage * Import RequestOutput in test_serving_chat.py * Reorder import statements in test_serving_chat.py * test: fix tool_calls finish_reason case * test: refine serving_chat coverage * test: format serving_chat tests --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:32:46 +08:00
ZeLong Li	81ea3674b0	[CI] [Hackathon 10th Spring No.26] Add unit tests for module fastdeploy/utils.py (#6146 ) test (#6146) Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:28:40 +08:00
xunyoyo	ff61a7f5a1	[CI] [Hackathon 10th Spring No.40] Add unit tests for module fastdeploy/model_executor/layers/linear.py (#6107 ) * Add linear layer tests for model executor * Refine linear layer tests for uncovered branches * Refactor and enhance tests for linear layers Refactor test_linear.py by removing unused imports and redundant code, and updating model configuration parameters. Add new tests for linear layers and their loading mechanisms. * test: patch row-parallel alltoall in unit test * test: avoid alltoall reshape failure in row-parallel * test: expand linear coverage targets * Refine linear tests per review feedback * Fix linear tests for pre-sharded config and qkv fixture --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:25:23 +08:00
sunxin	53aaac69da	[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457 ) * gate bf16 * add gate-fp32 * fix * update baseline * update * update * fix	2026-02-26 21:08:46 -08:00
gongweibao	edd31e8849	[Feature] Add Deterministic Inference Support (#6476 ) * add * [tests] Add Paddle attention determinism tests and refactor resource manager Add comprehensive determinism tests for Paddle attention layer and refactor resource manager for deterministic mode support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * add * add * add * add * add more * add more * fixsome * fixsome * fix bugs * fix bugs * only in gpu * add docs * fix comments * fix some * fix some * fix comments * add more * fix potential problem * remove not need * remove not need * remove no need * fix bug * fix bugs * fix comments * fix comments * Update tests/ce/deterministic/test_determinism_verification.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/inter_communicator/test_ipc_signal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/engine/test_sampling_params_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism_standalone.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix comments * fix import error * fix a bug * fix bugs * fix bugs * fix coverage * refine codes * refine code * fix comments * fix comments * fix comments * rm not need * fix allreduce large tensor bug * mv log files * mv log files * add files --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-26 19:31:51 -08:00
zccjjj	c34cb2a8c2	[XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337 )	2026-02-27 09:55:41 +08:00
kesmeey	bf14ea18aa	tests: fix cache_transfer_manager threading and init mocks (#6502 ) tests: fix cache_transfer_manager threading and init mocks	2026-02-26 17:32:51 +08:00
yinwei	256651e9de	Add PD Cudagraph CI Case	2026-02-26 17:01:20 +08:00

879 Commits