FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
YuBaoku	596519831c	[CI] Temporarily disable test_determinism_offline.py	2026-03-10 16:54:30 +08:00
YuBaoku	73de8b9795	[CI] Update test_determinism_long.py to reduce execution time	2026-03-10 11:34:36 +08:00
周周周	3897a0b4fc	nvfp4 clean code (#6671 )	2026-03-09 18:00:34 +08:00
0Ayachi0	0c69cdf56e	[CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py (#6208 ) * [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py * update test_fused_moe_triton_backend.py * fix: apply code style formatting * Merge branch 'develop' into develop * Merge branch 'develop' into develop * Merge branch 'develop' into develop * Merge branch 'develop' into develop --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-03-09 14:24:08 +08:00
gongweibao	30f9f33f34	[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM (#6610 ) * add fa deter * add ut * add long sentence * fix basic * fix bugs * fix adn * fix first * fix single * fix single * fix single test * refine * add more test * refine comments * add comments of bmm * fix ci * remove probe * add * remove not need * refine tests * fix comments and refine code * refine code * refine test * refine test * mv 4cards tests * fix tests * add * fix comments * fix cover * fix cover --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-09 10:27:53 +08:00
ddchenhao66	3c0ff20328	[BugFix] fix incorrect function parameters of start_data_parallel_service (#6674 )	2026-03-09 10:15:50 +08:00
YuBaoku	cbfdf42628	[CI] Add test_dynamic_c8_cache.py and latest FastDeploy.tar.gz upload (#6708 )	2026-03-08 16:01:12 +08:00
gongweibao	1e49855b0f	[BugFix][DataProcessor] Add validate_model_path to fail fast on bad model path or unreachable network (#6713 ) * fix * add more endpoint * fix some --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-08 12:36:32 +08:00
luukunn	aac1484b0d	[Feature]add arguments string in tool (#6704 ) * add arguments string	2026-03-06 20:45:09 +08:00
luukunn	caf73e8131	[Feature]add reasoning effort (#6656 ) * add reasoning_effort * fix log * fix reasoning_effort * add reasoning_effort level * fix valid_parameters * fix valid_parameters * fix * fix unit test * add unit test * add unit test	2026-03-06 14:16:02 +08:00
yzwu	81acdb62bd	[Iluvatar][CI] Do not specify FD_LOG_DIR (#6665 )	2026-03-06 11:54:44 +08:00
YuBaoku	16a393e90e	[CI] Fix non-deterministic test and skip failed_tests.log in log print (#6672 )	2026-03-05 18:47:18 +08:00
sunxin	0dc7034ce0	[Model Runner] Deprecate not_need_stop (#6356 ) * Deprecate not_need_stop	2026-03-05 10:55:42 +08:00
ddchenhao66	fa4815b93a	[BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server (#6598 ) * [BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server * [BugFix] modify request_queue and result_queue of dp scheduler	2026-03-05 10:04:12 +08:00
YuBaoku	56ceeda80c	[CI] Adjust model-specific diff threshold and include iluvatar XPU paths in coverage (#6663 )	2026-03-05 10:02:54 +08:00
ming1753	02d32eea3b	Revert "[Bug Fix] Fix MM mtp incorrect rope emb (#6581 )" (#6631 ) This reverts commit `c5eb6b65e7`.	2026-03-04 11:23:28 +08:00
kesmeey	3d3221e24e	[CI] [Hackathon 10th Spring No.31] Add unit tests for module fastdeploy/model_executor/layers/sample/sampler.py (#6200 ) * Format code with black * Format sampler tests * update * update	2026-03-04 10:57:37 +08:00
YuBaoku	c3d6d706d5	[CI] Add nightly workflow for golang_router tests and improve log handling (#6608 ) * [CI] Add nightly workflow for Golang router tests * [CI] Improve pytest script stability and log handling	2026-03-03 19:36:57 +08:00
ming1753	c5eb6b65e7	[Bug Fix] Fix MM mtp incorrect rope emb (#6581 ) * [Bug Fix] Fix MM mtp incorrect rope emb	2026-03-03 19:28:59 +08:00
qwes5s5	375b5b7b21	[Feature]Log Format Normalization and Trace Log Optimization (#6370 ) * log refactor * log refactor 2 * log refactor 3	2026-03-03 11:31:45 +08:00
huicongyao	0f718baaf2	[Speculative Decoding]Reformat input preprocess for spec decode (#6501 ) * add speculate_pre_process kernel * reduce one slice * make d2h async && fix mtp bug for new pre_process * fix * add unitest * fix: code stype formatting * fix * fix: thread race in speculate_preprocess && rename d2h event	2026-03-03 10:22:07 +08:00
kesmeey	aae87e6ae2	[CI] [Hackathon 10th Spring No.27] Add unit tests for module fastdeploy/cache_manager/prefix_cache_manager.py (#6297 ) * test: update prefix cache manager tests * test: refine prefix cache manager coverage helpers * style: apply black formatting to test_prefix_cache_manager.py Co-authored-by: Cursor <cursoragent@cursor.com> * tests: update test_prefix_cache_manager Co-authored-by: Cursor <cursoragent@cursor.com> * update --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-03-02 20:04:12 +08:00
kesmeey	758770bc43	[CI] [Hackathon 10th Spring No.28] Add unit tests for module fastdeploy/entrypoints/engine_client.py (#6158 ) * fix codestyle and update unit test coverage workflow * fix test_engine_client.py: add main_process_metrics mock to prevent KeyError * fix test_engine_client.py: comprehensive test improvements * feat: enhance test_engine_client.py with comprehensive test improvements * fix: resolve test failures in test_engine_client.py * test: enhance EngineClient test coverage with comprehensive test suite * test: add comprehensive EngineClient test suite (codestyle checked)	2026-03-02 14:29:23 +08:00
yzwu	6674131b0b	[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553 )	2026-03-02 14:07:17 +08:00
YuBaoku	481d0e385f	[CI] Skip long-sequence case due to potential non-determinism (#6587 )	2026-03-02 11:34:15 +08:00
周周周	d957ccd46d	seq_lens related tensor shape -> [max_num_seqs] (#6535 )	2026-03-02 11:18:30 +08:00
AIbin	59b578c337	[Feature]Supports SWA based on appendattn (#6547 )	2026-03-01 19:02:08 +08:00
Yonghua Li	7cf5e64c7a	[BugFix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend (#6516 ) * [fix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend * [fix] fix test_cache_transfer_manager * [fix] fix test_cache_transfer_manager again --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-01 13:43:31 +08:00
YuBaoku	bb51829bd5	[CI] Fix tests and docs to resolve failure (#6572 )	2026-03-01 12:33:01 +08:00
0Ayachi0	977e2cc202	[CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py (#6209 ) * [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py * [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py * [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py * Merge branch 'develop' into 23 * Merge branch 'develop' into 23 * Merge branch 'develop' into 23 * Merge branch 'develop' into 23 --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2026-02-28 19:29:02 +08:00
zccjjj	a2072fe20c	[XPU] support warmup with ep & remove apply_tp_fused_op (#6289 )	2026-02-28 15:40:36 +08:00
ming1753	97eee75677	[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407 ) * Optim GPU Mem Usage --------- Co-authored-by: huzesen <huzesen@baidu.com>	2026-02-28 15:07:43 +08:00
YuBaoku	54f7d9f621	[CI] Sync mm_batch_invariant with paddle.mm update (#6557 )	2026-02-28 14:56:42 +08:00
YuBaoku	8e67fb422c	[CI] disable test_batch_invariance_op_mm.py in unit_test (#6548 )	2026-02-28 10:16:14 +08:00
xunyoyo	12f754ef38	[CI] [Hackathon 10th Spring No.42] Add unit tests for module fastdeploy/entrypoints/openai/serving_chat.py (#6112 ) * test: expand OpenAI serving chat coverage * Import RequestOutput in test_serving_chat.py * Reorder import statements in test_serving_chat.py * test: fix tool_calls finish_reason case * test: refine serving_chat coverage * test: format serving_chat tests --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:32:46 +08:00
ZeLong Li	81ea3674b0	[CI] [Hackathon 10th Spring No.26] Add unit tests for module fastdeploy/utils.py (#6146 ) test (#6146) Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:28:40 +08:00
xunyoyo	ff61a7f5a1	[CI] [Hackathon 10th Spring No.40] Add unit tests for module fastdeploy/model_executor/layers/linear.py (#6107 ) * Add linear layer tests for model executor * Refine linear layer tests for uncovered branches * Refactor and enhance tests for linear layers Refactor test_linear.py by removing unused imports and redundant code, and updating model configuration parameters. Add new tests for linear layers and their loading mechanisms. * test: patch row-parallel alltoall in unit test * test: avoid alltoall reshape failure in row-parallel * test: expand linear coverage targets * Refine linear tests per review feedback * Fix linear tests for pre-sharded config and qkv fixture --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:25:23 +08:00
sunxin	53aaac69da	[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457 ) * gate bf16 * add gate-fp32 * fix * update baseline * update * update * fix	2026-02-26 21:08:46 -08:00
gongweibao	edd31e8849	[Feature] Add Deterministic Inference Support (#6476 ) * add * [tests] Add Paddle attention determinism tests and refactor resource manager Add comprehensive determinism tests for Paddle attention layer and refactor resource manager for deterministic mode support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * add * add * add * add * add more * add more * fixsome * fixsome * fix bugs * fix bugs * only in gpu * add docs * fix comments * fix some * fix some * fix comments * add more * fix potential problem * remove not need * remove not need * remove no need * fix bug * fix bugs * fix comments * fix comments * Update tests/ce/deterministic/test_determinism_verification.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/inter_communicator/test_ipc_signal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/engine/test_sampling_params_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism_standalone.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix comments * fix import error * fix a bug * fix bugs * fix bugs * fix coverage * refine codes * refine code * fix comments * fix comments * fix comments * rm not need * fix allreduce large tensor bug * mv log files * mv log files * add files --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-26 19:31:51 -08:00
zccjjj	c34cb2a8c2	[XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337 )	2026-02-27 09:55:41 +08:00
kesmeey	bf14ea18aa	tests: fix cache_transfer_manager threading and init mocks (#6502 ) tests: fix cache_transfer_manager threading and init mocks	2026-02-26 17:32:51 +08:00
yinwei	256651e9de	Add PD Cudagraph CI Case	2026-02-26 17:01:20 +08:00
GoldPancake	2178f2829b	[Speculative Decoding] Support suffix decoding (#6403 ) * support suffix decoding	2026-02-26 11:42:05 +08:00
Yuanle Liu	6d3fede240	[OP][Feature] Unify the limit_thinking_content_length CUDA operator to support response length limits and injected sequences (#6493 ) * Initial plan * Migrate PRs #6311, #6129, #6305 to develop and merge unit tests Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix * update * fix * fix ci * fix ci * Initial plan * test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add disable-thinking case to test_chat_with_response_max_tokens Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add both reasoning_max_tokens and response_max_tokens case Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix ci * fix ci * fix ci --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>	2026-02-25 21:36:50 +08:00
YuBaoku	fa8a2e32c8	[CI] Add test for prefix caching L2 swap (#6507 )	2026-02-25 19:56:01 +08:00
jackyYang6	a29ee57e15	[Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367 ) * feat: add thinking budget logits processor * add unittest * fix pre-commit * add unittest * docs: clarify operator-level vs logits processor usage and conflict guidance --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-02-25 14:17:09 +08:00
Longzhi Wang	22566168c3	[Feature] support qkv&gate linear fusion (#6455 ) * [Feature] support qkv&gate linear fusion * add test	2026-02-24 15:20:29 +08:00
jackyYang6	38c3e02470	fix paddleformers fallback (#6465 )	2026-02-23 15:29:13 +08:00
Yonghua Li	e2332a1112	[BugFix] fix num_cpu_blocks computation (#6438 ) * [BugFix] fix num_cpu_blocks computation * [fix] fix syntax and log * [fix] pre-commit * [fix] use getattr * [fix] ci test	2026-02-13 11:05:14 +08:00
YuBaoku	9d72332aca	[CI] Optimize unittest and fix title format (#6464 ) * [CI] Optimize unit test duration and fix PR title format	2026-02-11 20:48:56 +08:00


871 Commits