YuBaoku
596519831c
[CI] Temporarily disable test_determinism_offline.py
2026-03-10 16:54:30 +08:00
YuBaoku
73de8b9795
[CI] Update test_determinism_long.py to reduce execution time
2026-03-10 11:34:36 +08:00
周周周
3897a0b4fc
nvfp4 clean code ( #6671 )
2026-03-09 18:00:34 +08:00
0Ayachi0
0c69cdf56e
[CI] [Hackathon 10th Spring No.24] Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py ( #6208 )
...
* [CI] [Hackathon 10th Spring No.24] Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py
* update test_fused_moe_triton_backend.py
* fix: apply code style formatting
* Merge branch 'develop' into develop
* Merge branch 'develop' into develop
* Merge branch 'develop' into develop
* Merge branch 'develop' into develop
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2026-03-09 14:24:08 +08:00
gongweibao
30f9f33f34
[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM ( #6610 )
...
* add fa deter
* add ut
* add long sentence
* fix basic
* fix bugs
* fix adn
* fix first
* fix single
* fix single
* fix single test
* refine
* add more test
* refine comments
* add comments of bmm
* fix ci
* remove probe
* add
* remove not need
* refine tests
* fix comments and refine code
* refine code
* refine test
* refine test
* mv 4cards tests
* fix tests
* add
* fix comments
* fix cover
* fix cover
---------
Co-authored-by: gongweibao <gognweibao@baidu.com >
2026-03-09 10:27:53 +08:00
ddchenhao66
3c0ff20328
[BugFix] fix incorrect function parameters of start_data_parallel_service ( #6674 )
2026-03-09 10:15:50 +08:00
YuBaoku
cbfdf42628
[CI] Add test_dynamic_c8_cache.py and latest FastDeploy.tar.gz upload ( #6708 )
2026-03-08 16:01:12 +08:00
gongweibao
1e49855b0f
[BugFix][DataProcessor] Add validate_model_path to fail fast on bad model path or unreachable network ( #6713 )
...
* fix
* add more endpoint
* fix some
---------
Co-authored-by: gongweibao <gognweibao@baidu.com >
2026-03-08 12:36:32 +08:00
luukunn
aac1484b0d
[Feature]add arguments string in tool ( #6704 )
...
* add arguments string
2026-03-06 20:45:09 +08:00
luukunn
caf73e8131
[Feature]add reasoning effort ( #6656 )
...
* add reasoning_effort
* fix log
* fix reasoning_effort
* add reasoning_effort level
* fix valid_parameters
* fix valid_parameters
* fix
* fix unit test
* add unit test
* add unit test
2026-03-06 14:16:02 +08:00
yzwu
81acdb62bd
[Iluvatar][CI] Do not specify FD_LOG_DIR ( #6665 )
2026-03-06 11:54:44 +08:00
YuBaoku
16a393e90e
[CI] Fix non-deterministic test and skip failed_tests.log in log print ( #6672 )
2026-03-05 18:47:18 +08:00
sunxin
0dc7034ce0
[Model Runner] Deprecate not_need_stop ( #6356 )
...
* Deprecate not_need_stop
2026-03-05 10:55:42 +08:00
ddchenhao66
fa4815b93a
[BugFix] fix dp scheduler bug in ep4tp1 when started via multi_api_server ( #6598 )
...
* [BugFix] fix dp scheduler bug in ep4tp1 when started via multi_api_server
* [BugFix] modify request_queue and result_queue of dp scheduler
2026-03-05 10:04:12 +08:00
YuBaoku
56ceeda80c
[CI] Adjust model-specific diff threshold and include iluvatar XPU paths in coverage ( #6663 )
2026-03-05 10:02:54 +08:00
ming1753
02d32eea3b
Revert "[Bug Fix] Fix MM mtp incorrect rope emb ( #6581 )" ( #6631 )
...
This reverts commit c5eb6b65e7 .
2026-03-04 11:23:28 +08:00
kesmeey
3d3221e24e
[CI] [Hackathon 10th Spring No.31] Add unit tests for functional module fastdeploy/model_executor/layers/sample/sampler.py ( #6200 )
...
* Format code with black
* Format sampler tests
* update
* update
2026-03-04 10:57:37 +08:00
YuBaoku
c3d6d706d5
[CI] Add nightly workflow for golang_router tests and improve log handling ( #6608 )
...
* [CI] Add nightly workflow for Golang router tests
* [CI] Improve pytest script stability and log handling
2026-03-03 19:36:57 +08:00
ming1753
c5eb6b65e7
[Bug Fix] Fix MM mtp incorrect rope emb ( #6581 )
...
* [Bug Fix] Fix MM mtp incorrect rope emb
2026-03-03 19:28:59 +08:00
qwes5s5
375b5b7b21
[Feature]Log Format Normalization and Trace Log Optimization ( #6370 )
...
* log refactor
* log refactor 2
* log refactor 3
2026-03-03 11:31:45 +08:00
huicongyao
0f718baaf2
[Speculative Decoding]Reformat input preprocess for spec decode ( #6501 )
...
* add speculate_pre_process kernel
* reduce one slice
* make d2h async && fix mtp bug for new pre_process
* fix
* add unittest
* fix: code style formatting
* fix
* fix: thread race in speculate_preprocess && rename d2h event
2026-03-03 10:22:07 +08:00
kesmeey
aae87e6ae2
[CI] [Hackathon 10th Spring No.27] Add unit tests for functional module fastdeploy/cache_manager/prefix_cache_manager.py ( #6297 )
...
* test: update prefix cache manager tests
* test: refine prefix cache manager coverage helpers
* style: apply black formatting to test_prefix_cache_manager.py
Co-authored-by: Cursor <cursoragent@cursor.com >
* tests: update test_prefix_cache_manager
Co-authored-by: Cursor <cursoragent@cursor.com >
* update
---------
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-03-02 20:04:12 +08:00
kesmeey
758770bc43
[CI] [Hackathon 10th Spring No.28] Add unit tests for functional module fastdeploy/entrypoints/engine_client.py ( #6158 )
...
* fix codestyle and update unit test coverage workflow
* fix test_engine_client.py: add main_process_metrics mock to prevent KeyError
* fix test_engine_client.py: comprehensive test improvements
* feat: enhance test_engine_client.py with comprehensive test improvements
* fix: resolve test failures in test_engine_client.py
* test: enhance EngineClient test coverage with comprehensive test suite
* test: add comprehensive EngineClient test suite (codestyle checked)
2026-03-02 14:29:23 +08:00
yzwu
6674131b0b
[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding ( #6553 )
2026-03-02 14:07:17 +08:00
YuBaoku
481d0e385f
[CI] Skip long-sequence case due to potential non-determinism ( #6587 )
2026-03-02 11:34:15 +08:00
周周周
d957ccd46d
seq_lens related tensor shape -> [max_num_seqs] ( #6535 )
2026-03-02 11:18:30 +08:00
AIbin
59b578c337
[Feature]Supports SWA based on appendattn ( #6547 )
2026-03-01 19:02:08 +08:00
Yonghua Li
7cf5e64c7a
[BugFix] fix cache transfer manager init failure when using block_wise_fp8 and no storage backend ( #6516 )
...
* [fix] fix cache transfer manager init failure when using block_wise_fp8 and no storage backend
* [fix] fix test_cache_transfer_manager
* [fix] fix test_cache_transfer_manager again
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-03-01 13:43:31 +08:00
YuBaoku
bb51829bd5
[CI] Fix tests and docs to resolve failure ( #6572 )
2026-03-01 12:33:01 +08:00
0Ayachi0
977e2cc202
[CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py ( #6209 )
...
* [CI] [Hackathon 10th Spring No.24] Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py
* [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py
* [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py
* Merge branch 'develop' into 23
* Merge branch 'develop' into 23
* Merge branch 'develop' into 23
* Merge branch 'develop' into 23
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2026-02-28 19:29:02 +08:00
zccjjj
a2072fe20c
[XPU] support warmup with ep & remove apply_tp_fused_op ( #6289 )
2026-02-28 15:40:36 +08:00
ming1753
97eee75677
[Feature] GPU Memory Optimization and Retirement of V0 Scheduler ( #6407 )
...
* Optim GPU Mem Usage
---------
Co-authored-by: huzesen <huzesen@baidu.com >
2026-02-28 15:07:43 +08:00
YuBaoku
54f7d9f621
[CI] Sync mm_batch_invariant with paddle.mm update ( #6557 )
2026-02-28 14:56:42 +08:00
YuBaoku
8e67fb422c
[CI] disable test_batch_invariance_op_mm.py in unit_test ( #6548 )
2026-02-28 10:16:14 +08:00
xunyoyo
12f754ef38
[CI] [Hackathon 10th Spring No.42] Add unit tests for functional module fastdeploy/entrypoints/openai/serving_chat.py ( #6112 )
...
* test: expand OpenAI serving chat coverage
* Import RequestOutput in test_serving_chat.py
* Reorder import statements in test_serving_chat.py
* test: fix tool_calls finish_reason case
* test: refine serving_chat coverage
* test: format serving_chat tests
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2026-02-27 16:32:46 +08:00
ZeLong Li
81ea3674b0
[CI] [Hackathon 10th Spring No.26] Add unit tests for functional module fastdeploy/utils.py ( #6146 )
...
test (#6146 )
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2026-02-27 16:28:40 +08:00
xunyoyo
ff61a7f5a1
[CI] [Hackathon 10th Spring No.40] Add unit tests for functional module fastdeploy/model_executor/layers/linear.py ( #6107 )
...
* Add linear layer tests for model executor
* Refine linear layer tests for uncovered branches
* Refactor and enhance tests for linear layers
Refactor test_linear.py by removing unused imports and redundant code, and updating model configuration parameters. Add new tests for linear layers and their loading mechanisms.
* test: patch row-parallel alltoall in unit test
* test: avoid alltoall reshape failure in row-parallel
* test: expand linear coverage targets
* Refine linear tests per review feedback
* Fix linear tests for pre-sharded config and qkv fixture
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2026-02-27 16:25:23 +08:00
sunxin
53aaac69da
[Optimization] Enable BF16 gate computation for GLM and Qwen ( #6457 )
...
* gate bf16
* add gate-fp32
* fix
* update baseline
* update
* update
* fix
2026-02-26 21:08:46 -08:00
gongweibao
edd31e8849
[Feature] Add Deterministic Inference Support ( #6476 )
...
* add
* [tests] Add Paddle attention determinism tests and refactor resource manager
Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
* add
* add
* add
* add
* add more
* add more
* fix some
* fix some
* fix bugs
* fix bugs
* only in gpu
* add docs
* fix comments
* fix some
* fix some
* fix comments
* add more
* fix potential problem
* remove not need
* remove not need
* remove no need
* fix bug
* fix bugs
* fix comments
* fix comments
* Update tests/ce/deterministic/test_determinism_verification.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/inter_communicator/test_ipc_signal.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/layers/test_paddle_attention_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/engine/test_sampling_params_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/layers/test_paddle_attention_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/layers/test_paddle_attention_determinism_standalone.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix comments
* fix import error
* fix a bug
* fix bugs
* fix bugs
* fix coverage
* refine codes
* refine code
* fix comments
* fix comments
* fix comments
* rm not need
* fix allreduce large tensor bug
* mv log files
* mv log files
* add files
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2026-02-26 19:31:51 -08:00
zccjjj
c34cb2a8c2
[XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape ( #6337 )
2026-02-27 09:55:41 +08:00
kesmeey
bf14ea18aa
tests: fix cache_transfer_manager threading and init mocks ( #6502 )
...
tests: fix cache_transfer_manager threading and init mocks
2026-02-26 17:32:51 +08:00
yinwei
256651e9de
Add PD Cudagraph CI Case
2026-02-26 17:01:20 +08:00
GoldPancake
2178f2829b
[Speculative Decoding] Support suffix decoding ( #6403 )
...
* support suffix decoding
2026-02-26 11:42:05 +08:00
Yuanle Liu
6d3fede240
[OP][Feature] Unify the limit_thinking_content_length CUDA operators, with support for response length limits and injected sequences ( #6493 )
...
* Initial plan
* Migrate PRs #6311 , #6129 , #6305 to develop and merge unit tests
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com >
* fix
* update
* fix
* fix ci
* fix ci
* Initial plan
* test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com >
* test: add disable-thinking case to test_chat_with_response_max_tokens
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com >
* test: add both reasoning_max_tokens and response_max_tokens case
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com >
* fix ci
* fix ci
* fix ci
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com >
2026-02-25 21:36:50 +08:00
YuBaoku
fa8a2e32c8
[CI] Add test for prefix caching L2 swap ( #6507 )
2026-02-25 19:56:01 +08:00
jackyYang6
a29ee57e15
[Feature] Support ThinkingBudget Logits processor to control thinking content length ( #6367 )
...
* feat: add thinking budget logits processor
* add unittest
* fix pre-commit
* add unittest
* docs: clarify operator-level vs logits processor usage and conflict guidance
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2026-02-25 14:17:09 +08:00
Longzhi Wang
22566168c3
[Feature] support qkv&gate linear fusion ( #6455 )
...
* [Feature] support qkv&gate linear fusion
* add test
2026-02-24 15:20:29 +08:00
jackyYang6
38c3e02470
fix paddleformers fallback ( #6465 )
2026-02-23 15:29:13 +08:00
Yonghua Li
e2332a1112
[BugFix] fix num_cpu_blocks computation ( #6438 )
...
* [BugFix] fix num_cpu_blocks computation
* [fix] fix syntax and log
* [fix] pre-commit
* [fix] use getattr
* [fix] ci test
2026-02-13 11:05:14 +08:00
YuBaoku
9d72332aca
[CI] Optimize unittest and fix title format ( #6464 )
...
* [CI] Optimize unit test duration and fix PR title format
2026-02-11 20:48:56 +08:00