mouxin
049c807d86
[Docs] Update the document (#6539)
Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-27 19:21:10 +08:00
cmcamdy
13447279aa
[XPU] Fix PD + MTP (#6495)
* fix pd + mtp
* fix code style
* fix PD + MTP: D gets P's first token
* add annotation for GPU (speculate_update)
* update draft insertv1
* fix wrapper & kernel
* fix wrapper
* fix code style
2026-02-27 19:07:35 +08:00
xunyoyo
12f754ef38
[CI] [Hackathon 10th Spring No.42] Add unit tests for module fastdeploy/entrypoints/openai/serving_chat.py (#6112)
* test: expand OpenAI serving chat coverage
* Import RequestOutput in test_serving_chat.py
* Reorder import statements in test_serving_chat.py
* test: fix tool_calls finish_reason case
* test: refine serving_chat coverage
* test: format serving_chat tests
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:32:46 +08:00
ZeLong Li
81ea3674b0
[CI] [Hackathon 10th Spring No.26] Add unit tests for module fastdeploy/utils.py (#6146)
test (#6146)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:28:40 +08:00
xunyoyo
ff61a7f5a1
[CI] [Hackathon 10th Spring No.40] Add unit tests for module fastdeploy/model_executor/layers/linear.py (#6107)
* Add linear layer tests for model executor
* Refine linear layer tests for uncovered branches
* Refactor and enhance tests for linear layers
Refactor test_linear.py by removing unused imports and redundant code, and updating model configuration parameters. Add new tests for linear layers and their loading mechanisms.
* test: patch row-parallel alltoall in unit test
* test: avoid alltoall reshape failure in row-parallel
* test: expand linear coverage targets
* Refine linear tests per review feedback
* Fix linear tests for pre-sharded config and qkv fixture
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:25:23 +08:00
JYChen
c6d8fbe526
[BugFix] fix log with paddlefleet.ops (#6528)
2026-02-27 14:34:29 +08:00
周周周
1503443871
add dsv3 mixed deploy as EP16 TP8 (#6525)
2026-02-27 14:08:25 +08:00
luukunn
16de778343
update FD_USAGE_STATS_SERVER (#6524)
2026-02-27 13:28:57 +08:00
sunxin
53aaac69da
[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)
* gate bf16
* add gate-fp32
* fix
* update baseline
* update
* update
* fix
2026-02-26 21:08:46 -08:00
gongweibao
edd31e8849
[Feature] Add Deterministic Inference Support (#6476)
* add
* [tests] Add Paddle attention determinism tests and refactor resource manager
Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* add
* add
* add
* add
* add more
* add more
* fix some
* fix some
* fix bugs
* fix bugs
* only in gpu
* add docs
* fix comments
* fix some
* fix some
* fix comments
* add more
* fix potential problem
* remove unneeded code
* remove unneeded code
* remove unneeded code
* fix bug
* fix bugs
* fix comments
* fix comments
* Update tests/ce/deterministic/test_determinism_verification.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/inter_communicator/test_ipc_signal.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/engine/test_sampling_params_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism_standalone.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix comments
* fix import error
* fix a bug
* fix bugs
* fix bugs
* fix coverage
* refine codes
* refine code
* fix comments
* fix comments
* fix comments
* remove unneeded code
* fix allreduce large tensor bug
* mv log files
* mv log files
* add files
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-26 19:31:51 -08:00
zccjjj
c34cb2a8c2
[XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337)
2026-02-27 09:55:41 +08:00
jc
7b1d787b4b
[BugFix] Fix storage_backend_type comparison bug in cache_transfer_manager.py (#6514)
Co-authored-by: root <root@tjzj-inf-sci-k8s-hzz1-h62ni7-2178.tjzj.baidu.com>
2026-02-26 19:32:24 +08:00
MingkunZhang
c369f7139f
[Metax][Fix] fix error based on PR #6493 (#6521)
2026-02-26 18:41:35 +08:00
chen
2d1531f3cb
dev opensource model support fa4/flashmasV2/V3 (#6518)
2026-02-26 17:46:05 +08:00
kesmeey
bf14ea18aa
tests: fix cache_transfer_manager threading and init mocks (#6502)
2026-02-26 17:32:51 +08:00
Zhang Yulong
ff20a3cc02
[benchmark] update tool call (#6519)
2026-02-26 17:06:54 +08:00
yinwei
256651e9de
Add PD Cudagraph CI Case
2026-02-26 17:01:20 +08:00
gongweibao
2541462f7e
[Feature][Docs] Add Python-only quick install mode (BUILD_WHEEL=2) to build.sh (#6503)
* add pythononly func
* add
* add more feature
* add safe check
* add rsync check
* add
* add
* refine docs
* add installation
* add installation
2026-02-26 16:17:41 +08:00
AIbin
47bfd45bb6
[Docs] add deepseek model doc (#6513)
* add deepseek model doc
2026-02-26 14:08:19 +08:00
MingkunZhang
b56a4099c0
[Metax][Docs] update metax guidance documents (#6515)
2026-02-26 14:04:23 +08:00
GoldPancake
2178f2829b
[Speculative Decoding] Support suffix decoding (#6403)
* support suffix decoding
2026-02-26 11:42:05 +08:00
Yuanle Liu
6d3fede240
[OP][Feature] Unify the limit_thinking_content_length CUDA operators, supporting response length limits and injected sequences (#6493)
* Initial plan
* Migrate PRs #6311, #6129, #6305 to develop and merge unit tests
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* fix
* update
* fix
* fix ci
* fix ci
* Initial plan
* test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* test: add disable-thinking case to test_chat_with_response_max_tokens
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* test: add both reasoning_max_tokens and response_max_tokens case
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* fix ci
* fix ci
* fix ci
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2026-02-25 21:36:50 +08:00
YuBaoku
e18397134a
[Others] Update FASTDEPLOY_VERSION to 2.5.0-dev
2026-02-25 20:12:09 +08:00
YuBaoku
fa8a2e32c8
[CI] Add test for prefix caching L2 swap (#6507)
2026-02-25 19:56:01 +08:00
zhupengyang
a303eacf62
[XPU] support norm before rope (#6475)
2026-02-25 18:43:44 +08:00
Yuqiang Ge
1f931e05cd
[CI] Add retry logic for pip install in iluvatar CI script (#6500)
2026-02-25 16:01:41 +08:00
Wanglongzhi2001
14ea7243e1
[Feature] support mm_processor_kwargs for flexible model
2026-02-25 14:34:33 +08:00
jackyYang6
a29ee57e15
[Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367)
* feat: add thinking budget logits processor
* add unittest
* fix pre-commit
* add unittest
* docs: clarify operator-level vs logits processor usage and conflict guidance
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-25 14:17:09 +08:00
YuBaoku
1405d7d5d7
[CI] Pin gunicorn version to 25.0.3 (#6497)
2026-02-25 09:52:22 +08:00
Longzhi Wang
22566168c3
[Feature] support qkv&gate linear fusion (#6455)
* [Feature] support qkv&gate linear fusion
* add test
2026-02-24 15:20:29 +08:00
jackyYang6
38c3e02470
fix paddleformers fallback (#6465)
2026-02-23 15:29:13 +08:00
Yonghua Li
e2332a1112
[BugFix] fix num_cpu_blocks computation (#6438)
* [BugFix] fix num_cpu_blocks computation
* [fix] fix syntax and log
* [fix] pre-commit
* [fix] use getattr
* [fix] ci test
2026-02-13 11:05:14 +08:00
kevin
52edf5e9b3
fix mtp acceptance rate decline (#6470)
2026-02-12 19:56:10 +08:00
sunxin
51f812aaa4
fix empty get_padding_offset (#6462)
2026-02-12 12:34:23 +08:00
AIbin
0eb87467f8
[BugFix] fix RL bug about blockwisefp8 (#6466)
* fix RL bug about blockwisefp8
* fix moe same bug
* fix RL FP8 bug
2026-02-12 09:15:29 +08:00
YuBaoku
9d72332aca
[CI] Optimize unittest and fix title format (#6464)
* [CI] Optimize unit test duration and fix PR title format
2026-02-11 20:48:56 +08:00
Divano
ba3b142ff7
[Others] add objgraph to test out of memory (#6456)
2026-02-11 20:17:20 +08:00
Zhang Yulong
96bfa0d5b9
[benchmark] Update benchmark_serving.py (#6467)
2026-02-11 20:10:46 +08:00
JYChen
40c952e7b5
fix deepgemm import (#6451)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-02-11 20:10:01 +08:00
Jiaxin Sui
e40fb16912
Revert "[XPU] change base XPU docker image (#6411)" (#6427)
This reverts commit 32bd40a192.
2026-02-11 16:31:54 +08:00
zhupengyang
4a8c54926b
[XPU] topk_method=noaux_tc (#6355)
2026-02-11 16:12:20 +08:00
kesmeey
e4e3a71e7b
[CI] [Hackathon 10th Spring No.22] Add unit tests for module fastdeploy/cache_manager/cache_transfer_manager.py (#6157)
* Add comprehensive test coverage for cache_transfer_manager.py
* Fix code style: add newline at end of file
* fix: update cache transfer manager tests for branch 22 interface changes
* fix: resolve test errors for cache transfer manager
* fix: update cache transfer manager tests for branch 22 interface changes
* style: apply pre-commit formatting to tests/cache_manager/test_cache_transfer_manager.py
* Run codestyle: format tests/cache_manager/test_cache_transfer_manager.py and related fixes
* Update test_cache_transfer_manager.py
* Format cache transfer manager tests
* Update cache transfer manager tests
* Update unit test coverage workflow
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-11 11:23:57 +08:00
CSWYF3634076
7380bfb476
[BugFix] fix console log metrics waiting queue count (#6432)
* [BugFix] fix console log metrics waiting queue count
* [BugFix] fix console log metrics waiting queue count unittest
2026-02-11 10:51:49 +08:00
YuBaoku
390d0f2d77
[CI] Fix cherry-pick automation (#6448)
* [CI] Fix cherry-pick automation
2026-02-10 22:45:29 +08:00
YuBaoku
a918738b8f
[CI] Optimize cherry-pick automation (#6445)
2026-02-10 21:48:13 +08:00
Jiang-Jia-Jun
19849a0e9b
Fix formatting in README_EN.md for v2.3 release
2026-02-10 20:32:15 +08:00
Jiang-Jia-Jun
3f9fcec8bd
Update FastDeploy release notes in README_CN.md
2026-02-10 20:32:03 +08:00
Jiang-Jia-Jun
a54b92448b
Update README for version 2.4
2026-02-10 20:28:17 +08:00
Jiang-Jia-Jun
9d1fb17dc8
Update README_EN.md
2026-02-10 20:19:06 +08:00
Jiang-Jia-Jun
f7e1b9355e
Update README_EN.md
2026-02-10 20:18:04 +08:00