FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
mouxin	049c807d86	[Docs] Update the document (#6539) Co-authored-by: mouxin <mouxin@baidu.com>	2026-02-27 19:21:10 +08:00
cmcamdy	13447279aa	[XPU] Fix PD + MTP (#6495) * fix pd + mtp * fix code style * fix PD + MTP, D get P's first token * add anno for gpu(speculate_update) * update draft insertv1 * fix wapper & kernel * fix wapper * fix code stype	2026-02-27 19:07:35 +08:00
xunyoyo	12f754ef38	[CI] [Hackathon 10th Spring No.42] Add unit tests for the fastdeploy/entrypoints/openai/serving_chat.py module (#6112) * test: expand OpenAI serving chat coverage * Import RequestOutput in test_serving_chat.py * Reorder import statements in test_serving_chat.py * test: fix tool_calls finish_reason case * test: refine serving_chat coverage * test: format serving_chat tests --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:32:46 +08:00
ZeLong Li	81ea3674b0	[CI] [Hackathon 10th Spring No.26] Add unit tests for the fastdeploy/utils.py module (#6146) test (#6146) Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:28:40 +08:00
xunyoyo	ff61a7f5a1	[CI] [Hackathon 10th Spring No.40] Add unit tests for the fastdeploy/model_executor/layers/linear.py module (#6107) * Add linear layer tests for model executor * Refine linear layer tests for uncovered branches * Refactor and enhance tests for linear layers Refactor test_linear.py by removing unused imports and redundant code, and updating model configuration parameters. Add new tests for linear layers and their loading mechanisms. * test: patch row-parallel alltoall in unit test * test: avoid alltoall reshape failure in row-parallel * test: expand linear coverage targets * Refine linear tests per review feedback * Fix linear tests for pre-sharded config and qkv fixture --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-27 16:25:23 +08:00
JYChen	c6d8fbe526	[BugFix] fix log with paddlefleet.ops (#6528)	2026-02-27 14:34:29 +08:00
周周周	1503443871	add dsv3 mixed deploy as EP16 TP8 (#6525)	2026-02-27 14:08:25 +08:00
luukunn	16de778343	update FD_USAGE_STATS_SERVER (#6524)	2026-02-27 13:28:57 +08:00
sunxin	53aaac69da	[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457) * gate bf16 * add gate-fp32 * fix * update baseline * update * update * fix	2026-02-26 21:08:46 -08:00
gongweibao	edd31e8849	[Feature] Add Deterministic Inference Support (#6476) * add * [tests] Add Paddle attention determinism tests and refactor resource manager Add comprehensive determinism tests for Paddle attention layer and refactor resource manager for deterministic mode support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * add * add * add * add * add more * add more * fixsome * fixsome * fix bugs * fix bugs * only in gpu * add docs * fix comments * fix some * fix some * fix comments * add more * fix potential problem * remove not need * remove not need * remove no need * fix bug * fix bugs * fix comments * fix comments * Update tests/ce/deterministic/test_determinism_verification.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/inter_communicator/test_ipc_signal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/engine/test_sampling_params_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/layers/test_paddle_attention_determinism_standalone.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix comments * fix import error * fix a bug * fix bugs * fix bugs * fix coverage * refine codes * refine code * fix comments * fix comments * fix comments * rm not need * fix allreduce large tensor bug * mv log files * mv log files * add files --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-26 19:31:51 -08:00
zccjjj	c34cb2a8c2	[XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337)	2026-02-27 09:55:41 +08:00
jc	7b1d787b4b	[BugFix] Fix storage_backend_type comparison bug in cache_transfer_manager.py (#6514) Co-authored-by: root <root@tjzj-inf-sci-k8s-hzz1-h62ni7-2178.tjzj.baidu.com>	2026-02-26 19:32:24 +08:00
MingkunZhang	c369f7139f	[Metax][Fix] fix error based pr #6493 (#6521)	2026-02-26 18:41:35 +08:00
chen	2d1531f3cb	dev opensource model support fa4/flashmasV2/V3 (#6518)	2026-02-26 17:46:05 +08:00
kesmeey	bf14ea18aa	tests: fix cache_transfer_manager threading and init mocks (#6502) tests: fix cache_transfer_manager threading and init mocks	2026-02-26 17:32:51 +08:00
Zhang Yulong	ff20a3cc02	[benchmark] update tool call (#6519)	2026-02-26 17:06:54 +08:00
yinwei	256651e9de	Add PD Cudagraph CI Case	2026-02-26 17:01:20 +08:00
gongweibao	2541462f7e	[Feature][Docs] Add Python-only quick install mode (BUILD_WHEEL=2) to build.sh (#6503) * add pythononly func * add * add more feature * add safe check * add rsync check * add * add * refine docs * add installation * add installation	2026-02-26 16:17:41 +08:00
AIbin	47bfd45bb6	[Docs]add deepseek model doc (#6513) * add deepseek model doc	2026-02-26 14:08:19 +08:00
MingkunZhang	b56a4099c0	[Metax][Docs] update metax guidance documents (#6515)	2026-02-26 14:04:23 +08:00
GoldPancake	2178f2829b	[Speculative Decoding] Support suffix decoding (#6403) * support suffix decoding	2026-02-26 11:42:05 +08:00
Yuanle Liu	6d3fede240	[OP][Feature] Unify the limit_thinking_content_length CUDA operator, supporting response length limits and injected sequences (#6493) * Initial plan * Migrate PRs #6311, #6129, #6305 to develop and merge unit tests Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix * update * fix * fix ci * fix ci * Initial plan * test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add disable-thinking case to test_chat_with_response_max_tokens Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add both reasoning_max_tokens and response_max_tokens case Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix ci * fix ci * fix ci --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>	2026-02-25 21:36:50 +08:00
YuBaoku	e18397134a	[Others] Update FASTDEPLOY_VERSION to 2.5.0-dev	2026-02-25 20:12:09 +08:00
YuBaoku	fa8a2e32c8	[CI] Add test for prefix caching L2 swap (#6507)	2026-02-25 19:56:01 +08:00
zhupengyang	a303eacf62	[XPU] support norm before rope (#6475)	2026-02-25 18:43:44 +08:00
Yuqiang Ge	1f931e05cd	[CI] Add retry logic for pip install in iluvatar CI script (#6500)	2026-02-25 16:01:41 +08:00
Wanglongzhi2001	14ea7243e1	[Feature] support mm_processor_kwargs for flexible model	2026-02-25 14:34:33 +08:00
jackyYang6	a29ee57e15	[Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367) * feat: add thinking budget logits processor * add unittest * fix pre-commit * add unittest * docs: clarify operator-level vs logits processor usage and conflict guidance --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-02-25 14:17:09 +08:00
YuBaoku	1405d7d5d7	[CI] Pin gunicorn version to 25.0.3 (#6497)	2026-02-25 09:52:22 +08:00
Longzhi Wang	22566168c3	[Feature] support qkv&gate linear fusion (#6455) * [Feature] support qkv&gate linear fusion * add test	2026-02-24 15:20:29 +08:00
jackyYang6	38c3e02470	fix paddleformers fallback (#6465)	2026-02-23 15:29:13 +08:00
Yonghua Li	e2332a1112	[BugFix] fix num_cpu_blocks computation (#6438) * [BugFix] fix num_cpu_blocks computation * [fix] fix syntax and log * [fix] pre-commit * [fix] use getattr * [fix] ci test	2026-02-13 11:05:14 +08:00
kevin	52edf5e9b3	fix mtp acceptance rate decline (#6470)	2026-02-12 19:56:10 +08:00
sunxin	51f812aaa4	fix empty get_padding_offset (#6462)	2026-02-12 12:34:23 +08:00
AIbin	0eb87467f8	[BugFix]fix RL bug about blockwisefp8 (#6466) * fix RL bug about blockwisefp8 * fix moe same bug * fix RL FP8 bug	2026-02-12 09:15:29 +08:00
YuBaoku	9d72332aca	[CI] Optimize unittest and fix title format (#6464) * [CI] Optimize unit test duration and fix PR title format	2026-02-11 20:48:56 +08:00
Divano	ba3b142ff7	[Others] add objgraph to test out of memory (#6456)	2026-02-11 20:17:20 +08:00
Zhang Yulong	96bfa0d5b9	[benchmark] Update benchmark_serving.py (#6467)	2026-02-11 20:10:46 +08:00
JYChen	40c952e7b5	fix deepgemm import (#6451) Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2026-02-11 20:10:01 +08:00
Jiaxin Sui	e40fb16912	Revert "[XPU] change base XPU docker image (#6411)" (#6427) This reverts commit `32bd40a192`.	2026-02-11 16:31:54 +08:00
zhupengyang	4a8c54926b	[XPU] topk_method=noaux_tc (#6355)	2026-02-11 16:12:20 +08:00
kesmeey	e4e3a71e7b	[CI] [Hackathon 10th Spring No.22] Add unit tests for the fastdeploy/cache_manager/cache_transfer_manager.py module (#6157) * Add comprehensive test coverage for cache_transfer_manager.py * Fix code style: add newline at end of file * fix: update cache transfer manager tests for branch 22 interface changes * fix: resolve test errors for cache transfer manager * fix: update cache transfer manager tests for branch 22 interface changes * style: apply pre-commit formatting to tests/cache_manager/test_cache_transfer_manager.py * Run codestyle: format tests/cache_manager/test_cache_transfer_manager.py and related fixes * Update test_cache_transfer_manager.py * Format cache transfer manager tests * Update cache transfer manager tests * Update unit test coverage workflow --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-11 11:23:57 +08:00
CSWYF3634076	7380bfb476	[BugFix]fix console log metrics waitting queue count (#6432) * [BugFix]fix console log metrics waitting queue count * [BugFix]fix console log metrics waitting queue count unittest	2026-02-11 10:51:49 +08:00
YuBaoku	390d0f2d77	[CI] Fix cherry-pick automation (#6448) * [CI] Fix cherry-pick automation	2026-02-10 22:45:29 +08:00
YuBaoku	a918738b8f	[CI] Optimize cherry-pick automation (#6445)	2026-02-10 21:48:13 +08:00
Jiang-Jia-Jun	19849a0e9b	Fix formatting in README_EN.md for v2.3 release	2026-02-10 20:32:15 +08:00
Jiang-Jia-Jun	3f9fcec8bd	Update FastDeploy release notes in README_CN.md	2026-02-10 20:32:03 +08:00
Jiang-Jia-Jun	a54b92448b	Update README for version 2.4	2026-02-10 20:28:17 +08:00
Jiang-Jia-Jun	9d1fb17dc8	Update README_EN.md	2026-02-10 20:19:06 +08:00
Jiang-Jia-Jun	f7e1b9355e	Update README_EN.md	2026-02-10 20:18:04 +08:00


4670 Commits