Commit Graph

4670 Commits

Author SHA1 Message Date
mouxin 049c807d86 [Docs] Update the document (#6539)
Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-27 19:21:10 +08:00
cmcamdy 13447279aa [XPU] Fix PD + MTP (#6495)
* fix pd + mtp

* fix code style

* fix PD + MTP, D get P's first token

* add anno for gpu(speculate_update)

* update draft insertv1

* fix wrapper & kernel

* fix wrapper

* fix code style
2026-02-27 19:07:35 +08:00
xunyoyo 12f754ef38 [CI] 【Hackathon 10th Spring No.42】Add unit tests for module fastdeploy/entrypoints/openai/serving_chat.py (#6112)
* test: expand OpenAI serving chat coverage

* Import RequestOutput in test_serving_chat.py

* Reorder import statements in test_serving_chat.py

* test: fix tool_calls finish_reason case

* test: refine serving_chat coverage

* test: format serving_chat tests

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:32:46 +08:00
ZeLong Li 81ea3674b0 [CI] 【Hackathon 10th Spring No.26】Add unit tests for module fastdeploy/utils.py (#6146)
test (#6146)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:28:40 +08:00
xunyoyo ff61a7f5a1 [CI] 【Hackathon 10th Spring No.40】Add unit tests for module fastdeploy/model_executor/layers/linear.py (#6107)
* Add linear layer tests for model executor

* Refine linear layer tests for uncovered branches

* Refactor and enhance tests for linear layers

Refactor test_linear.py by removing unused imports and redundant code, and updating model configuration parameters. Add new tests for linear layers and their loading mechanisms.

* test: patch row-parallel alltoall in unit test

* test: avoid alltoall reshape failure in row-parallel

* test: expand linear coverage targets

* Refine linear tests per review feedback

* Fix linear tests for pre-sharded config and qkv fixture

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:25:23 +08:00
JYChen c6d8fbe526 [BugFix] fix log with paddlefleet.ops (#6528) 2026-02-27 14:34:29 +08:00
周周周 1503443871 add dsv3 mixed deploy as EP16 TP8 (#6525) 2026-02-27 14:08:25 +08:00
luukunn 16de778343 update FD_USAGE_STATS_SERVER (#6524) 2026-02-27 13:28:57 +08:00
sunxin 53aaac69da [Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)
* gate bf16

* add gate-fp32

* fix

* update baseline

* update

* update

* fix
2026-02-26 21:08:46 -08:00
gongweibao edd31e8849 [Feature] Add Deterministic Inference Support (#6476)
* add

* [tests] Add Paddle attention determinism tests and refactor resource manager

Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* add

* add

* add

* add

* add more

* add more

* fix some

* fix some

* fix bugs

* fix bugs

* only in gpu

* add docs

* fix comments

* fix some

* fix some

* fix comments

* add more

* fix potential problem

* remove not need

* remove not need

* remove no need

* fix bug

* fix bugs

* fix comments

* fix comments

* Update tests/ce/deterministic/test_determinism_verification.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/inter_communicator/test_ipc_signal.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/engine/test_sampling_params_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism_standalone.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix comments

* fix import error

* fix a bug

* fix bugs

* fix bugs

* fix coverage

* refine codes

* refine code

* fix comments

* fix comments

* fix comments

* rm not need

* fix allreduce large tensor bug

* mv log files

* mv log files

* add files

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-26 19:31:51 -08:00
zccjjj c34cb2a8c2 [XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337) 2026-02-27 09:55:41 +08:00
jc 7b1d787b4b [BugFix] Fix storage_backend_type comparison bug in cache_transfer_manager.py (#6514)
Co-authored-by: root <root@tjzj-inf-sci-k8s-hzz1-h62ni7-2178.tjzj.baidu.com>
2026-02-26 19:32:24 +08:00
MingkunZhang c369f7139f [Metax][Fix] fix error based on PR #6493 (#6521) 2026-02-26 18:41:35 +08:00
chen 2d1531f3cb dev opensource model support fa4/flashmasV2/V3 (#6518) 2026-02-26 17:46:05 +08:00
kesmeey bf14ea18aa tests: fix cache_transfer_manager threading and init mocks (#6502)
tests: fix cache_transfer_manager threading and init mocks
2026-02-26 17:32:51 +08:00
Zhang Yulong ff20a3cc02 [benchmark] update tool call (#6519) 2026-02-26 17:06:54 +08:00
yinwei 256651e9de Add PD Cudagraph CI Case 2026-02-26 17:01:20 +08:00
gongweibao 2541462f7e [Feature][Docs] Add Python-only quick install mode (BUILD_WHEEL=2) to build.sh (#6503)
* add pythononly func

* add

* add more feature

* add safe check

* add rsync check

* add

* add

* refine docs

* add installation

* add installation
2026-02-26 16:17:41 +08:00
AIbin 47bfd45bb6 [Docs]add deepseek model doc (#6513)
* add deepseek model doc
2026-02-26 14:08:19 +08:00
MingkunZhang b56a4099c0 [Metax][Docs] update metax guidance documents (#6515) 2026-02-26 14:04:23 +08:00
GoldPancake 2178f2829b [Speculative Decoding] Support suffix decoding (#6403)
* support suffix decoding
2026-02-26 11:42:05 +08:00
Yuanle Liu 6d3fede240 [OP][Feature] Unify the limit_thinking_content_length CUDA operator; support response length limits and injected sequences (#6493)
* Initial plan

* Migrate PRs #6311, #6129, #6305 to develop and merge unit tests

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix

* update

* fix

* fix ci

* fix ci

* Initial plan

* test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add disable-thinking case to test_chat_with_response_max_tokens

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add both reasoning_max_tokens and response_max_tokens case

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix ci

* fix ci

* fix ci

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2026-02-25 21:36:50 +08:00
YuBaoku e18397134a [Others] Update FASTDEPLOY_VERSION to 2.5.0-dev 2026-02-25 20:12:09 +08:00
YuBaoku fa8a2e32c8 [CI] Add test for prefix caching L2 swap (#6507) 2026-02-25 19:56:01 +08:00
zhupengyang a303eacf62 [XPU] support norm before rope (#6475) 2026-02-25 18:43:44 +08:00
Yuqiang Ge 1f931e05cd [CI] Add retry logic for pip install in iluvatar CI script (#6500) 2026-02-25 16:01:41 +08:00
Wanglongzhi2001 14ea7243e1 [Feature] support mm_processor_kwargs for flexible model 2026-02-25 14:34:33 +08:00
jackyYang6 a29ee57e15 [Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367)
* feat: add thinking budget logits processor

* add unittest

* fix pre-commit

* add unittest

* docs: clarify operator-level vs logits processor usage and conflict guidance

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-25 14:17:09 +08:00
YuBaoku 1405d7d5d7 [CI] Pin gunicorn version to 25.0.3 (#6497) 2026-02-25 09:52:22 +08:00
Longzhi Wang 22566168c3 [Feature] support qkv&gate linear fusion (#6455)
* [Feature] support qkv&gate linear fusion

* add test
2026-02-24 15:20:29 +08:00
jackyYang6 38c3e02470 fix paddleformers fallback (#6465) 2026-02-23 15:29:13 +08:00
Yonghua Li e2332a1112 [BugFix] fix num_cpu_blocks computation (#6438)
* [BugFix] fix num_cpu_blocks computation

* [fix] fix syntax and log

* [fix] pre-commit

* [fix] use getattr

* [fix] ci test
2026-02-13 11:05:14 +08:00
kevin 52edf5e9b3 fix mtp acceptance rate decline (#6470) 2026-02-12 19:56:10 +08:00
sunxin 51f812aaa4 fix empty get_padding_offset (#6462) 2026-02-12 12:34:23 +08:00
AIbin 0eb87467f8 [BugFix]fix RL bug about blockwisefp8 (#6466)
* fix RL bug about blockwisefp8

* fix same bug in moe

* fix RL FP8 bug
2026-02-12 09:15:29 +08:00
YuBaoku 9d72332aca [CI] Optimize unittest and fix title format (#6464)
* [CI] Optimize unit test duration and fix PR title format
2026-02-11 20:48:56 +08:00
Divano ba3b142ff7 [Others] add objgraph to test out of memory (#6456) 2026-02-11 20:17:20 +08:00
Zhang Yulong 96bfa0d5b9 [benchmark] Update benchmark_serving.py (#6467) 2026-02-11 20:10:46 +08:00
JYChen 40c952e7b5 fix deepgemm import (#6451)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-02-11 20:10:01 +08:00
Jiaxin Sui e40fb16912 Revert "[XPU] change base XPU docker image (#6411)" (#6427)
This reverts commit 32bd40a192.
2026-02-11 16:31:54 +08:00
zhupengyang 4a8c54926b [XPU] topk_method=noaux_tc (#6355) 2026-02-11 16:12:20 +08:00
kesmeey e4e3a71e7b [CI] 【Hackathon 10th Spring No.22】Add unit tests for module fastdeploy/cache_manager/cache_transfer_manager.py (#6157)
* Add comprehensive test coverage for cache_transfer_manager.py

* Fix code style: add newline at end of file

* fix: update cache transfer manager tests for branch 22 interface changes

* fix: resolve test errors for cache transfer manager

* fix: update cache transfer manager tests for branch 22 interface changes

* style: apply pre-commit formatting to tests/cache_manager/test_cache_transfer_manager.py

* Run codestyle: format tests/cache_manager/test_cache_transfer_manager.py and related fixes

* Update test_cache_transfer_manager.py

* Format cache transfer manager tests

* Update cache transfer manager tests

* Update unit test coverage workflow

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-11 11:23:57 +08:00
CSWYF3634076 7380bfb476 [BugFix] fix console log metrics waiting queue count (#6432)
* [BugFix] fix console log metrics waiting queue count

* [BugFix] fix console log metrics waiting queue count unittest
2026-02-11 10:51:49 +08:00
YuBaoku 390d0f2d77 [CI] Fix cherry-pick automation (#6448)
* [CI] Fix cherry-pick automation
2026-02-10 22:45:29 +08:00
YuBaoku a918738b8f [CI] Optimize cherry-pick automation (#6445) 2026-02-10 21:48:13 +08:00
Jiang-Jia-Jun 19849a0e9b Fix formatting in README_EN.md for v2.3 release 2026-02-10 20:32:15 +08:00
Jiang-Jia-Jun 3f9fcec8bd Update FastDeploy release notes in README_CN.md 2026-02-10 20:32:03 +08:00
Jiang-Jia-Jun a54b92448b Update README for version 2.4 2026-02-10 20:28:17 +08:00
Jiang-Jia-Jun 9d1fb17dc8 Update README_EN.md 2026-02-10 20:19:06 +08:00
Jiang-Jia-Jun f7e1b9355e Update README_EN.md 2026-02-10 20:18:04 +08:00