CSWYF3634076
97a4b3631e
[Processor]add qwen3vl prompt_token_ids support ( #6764 )
...
* [Processor]add qwen3vl prompt_token_ids support
* [Processor]add qwen3vl prompt_token_ids support unittest
* [Processor]add qwen3vl prompt_token_ids support precommit
2026-03-11 15:08:56 +08:00
bukejiyu
cffa8c246c
[Others]update paddleformer 1.0.0 ( #6496 )
...
* update paddleformer 1.0.0
* update
2026-03-11 15:06:29 +08:00
Yonghua Li
7811eeccaa
[fix] resolve get_save_output_v1 socket name conflicts between multiple instances ( #6758 )
2026-03-11 15:02:32 +08:00
freeliuzc
cf7934a4b2
[Speculative Decoding] Unify Spec and non-spec branch ( #6685 )
...
* optimize spec-inference architecture
* delete debug log
* optimize spec_method usage && fix unit_test
* add claude unit-test skill
* fix some ugly bug
* enhance robustness and bounds check
* unify method & spec_method to method to avoid bug
* activate CI
* fix unit test
* Unify logprobs computation for naive and speculative decoding, fix CUDA kernel
* fix logprob bug && optimize verify kernel
* fix exist_decode() judge
2026-03-10 23:58:44 -07:00
ddchenhao66
a502dda1fe
[BugFix] fix multi-step mtp bug ( #6754 )
2026-03-11 10:16:04 +08:00
Jiang-Jia-Jun
b05a6c4206
[BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP ( #6724 )
...
* [BugFix] Support to fix NaN bug in EP
* Optimize notion for all the funs
* Fix potential lock contention failure issues
* Update fastdeploy/inter_communicator/ipc_signal.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update envs.py
* Update default value for USE_KVCACHE_LOCK
Change default value of USE_KVCACHE_LOCK from 1 to 0.
* Update worker_process.py
* Fix wrong suffix
* Update test_prefix_cache_manager.py
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-10 21:55:32 +08:00
Yonghua Li
6520ae807c
[BugFix] fix grpc failure when tracing init before workers forked ( #6732 )
...
* [fix] fix grpc failure when tracing init before workers forked
* [fix] change default exporter to http
* [fix] fix test_trace
2026-03-10 21:24:10 +08:00
yzwu
67388ce2f3
[Iluvatar][CI] Replace ci in ernie-300B-4layer with ernie-21b. ( #6747 )
2026-03-10 17:25:52 +08:00
YuBaoku
596519831c
[CI] Temporarily disable test_determinism_offline.py
2026-03-10 16:54:30 +08:00
YuBaoku
73de8b9795
[CI] Update test_determinism_long.py to reduce execution time
2026-03-10 11:34:36 +08:00
周周周
3897a0b4fc
nvfp4 clean code ( #6671 )
2026-03-09 18:00:34 +08:00
0Ayachi0
0c69cdf56e
[CI] 【Hackathon 10th Spring No.24】Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py ( #6208 )
...
* [CI] 【Hackathon 10th Spring No.24】Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py
* update test_fused_moe_triton_backend.py
* fix: apply code style formatting
* Merge branch 'develop' into develop
* Merge branch 'develop' into develop
* Merge branch 'develop' into develop
* Merge branch 'develop' into develop
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-03-09 14:24:08 +08:00
gongweibao
30f9f33f34
[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM ( #6610 )
...
* add fa deter
* add ut
* add long sentence
* fix basic
* fix bugs
* fix adn
* fix first
* fix single
* fix single
* fix single test
* refine
* add more test
* refine comments
* add comments of bmm
* fix ci
* remove probe
* add
* remove not need
* refine tests
* fix comments and refine code
* refine code
* refine test
* refine test
* mv 4cards tests
* fix tests
* add
* fix comments
* fix cover
* fix cover
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-09 10:27:53 +08:00
ddchenhao66
3c0ff20328
[BugFix] fix incorrect function parameters of start_data_parallel_service ( #6674 )
2026-03-09 10:15:50 +08:00
YuBaoku
cbfdf42628
[CI] Add test_dynamic_c8_cache.py and latest FastDeploy.tar.gz upload ( #6708 )
2026-03-08 16:01:12 +08:00
gongweibao
1e49855b0f
[BugFix][DataProcessor] Add validate_model_path to fail fast on bad model path or unreachable network ( #6713 )
...
* fix
* add more endpoint
* fix some
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-08 12:36:32 +08:00
luukunn
aac1484b0d
[Feature]add arguments string in tool ( #6704 )
...
* add arguments string
2026-03-06 20:45:09 +08:00
luukunn
caf73e8131
[Feature]add reasoning effort ( #6656 )
...
* add reasoning_effort
* fix log
* fix reasoning_effort
* add reasoning_effort level
* fix valid_parameters
* fix valid_parameters
* fix
* fix unit test
* add unit test
* add unit test
2026-03-06 14:16:02 +08:00
yzwu
81acdb62bd
[Iluvatar][CI] Do not specify FD_LOG_DIR ( #6665 )
2026-03-06 11:54:44 +08:00
YuBaoku
16a393e90e
[CI] Fix non-deterministic test and skip failed_tests.log in log print ( #6672 )
2026-03-05 18:47:18 +08:00
sunxin
0dc7034ce0
[Model Runner] Deprecate not_need_stop ( #6356 )
...
* Deprecate not_need_stop
2026-03-05 10:55:42 +08:00
ddchenhao66
fa4815b93a
[BugFix] fix dp scheduler bug in ep4tp1 when started using multi_api_server ( #6598 )
...
* [BugFix] fix dp scheduler bug in ep4tp1 when started using multi_api_server
* [BugFix] modify request_queue and result_queue of dp scheduler
2026-03-05 10:04:12 +08:00
YuBaoku
56ceeda80c
[CI] Adjust model-specific diff threshold and include iluvatar XPU paths in coverage ( #6663 )
2026-03-05 10:02:54 +08:00
ming1753
02d32eea3b
Revert "[Bug Fix] Fix MM mtp incorrect rope emb ( #6581 )" ( #6631 )
...
This reverts commit c5eb6b65e7.
2026-03-04 11:23:28 +08:00
kesmeey
3d3221e24e
[CI] 【Hackathon 10th Spring No.31】Add unit tests for functional module fastdeploy/model_executor/layers/sample/sampler.py ( #6200 )
...
* Format code with black
* Format sampler tests
* update
* update
2026-03-04 10:57:37 +08:00
YuBaoku
c3d6d706d5
[CI] Add nightly workflow for golang_router tests and improve log handling ( #6608 )
...
* [CI] Add nightly workflow for Golang router tests
* [CI] Improve pytest script stability and log handling
2026-03-03 19:36:57 +08:00
ming1753
c5eb6b65e7
[Bug Fix] Fix MM mtp incorrect rope emb ( #6581 )
...
* [Bug Fix] Fix MM mtp incorrect rope emb
2026-03-03 19:28:59 +08:00
qwes5s5
375b5b7b21
[Feature]Log Format Normalization and Trace Log Optimization ( #6370 )
...
* log refactor
* log refactor 2
* log refactor 3
2026-03-03 11:31:45 +08:00
huicongyao
0f718baaf2
[Speculative Decoding]Reformat input preprocess for spec decode ( #6501 )
...
* add speculate_pre_process kernel
* reduce one slice
* make d2h async && fix mtp bug for new pre_process
* fix
* add unittest
* fix: code style formatting
* fix
* fix: thread race in speculate_preprocess && rename d2h event
2026-03-03 10:22:07 +08:00
kesmeey
aae87e6ae2
[CI] 【Hackathon 10th Spring No.27】Add unit tests for functional module fastdeploy/cache_manager/prefix_cache_manager.py ( #6297 )
...
* test: update prefix cache manager tests
* test: refine prefix cache manager coverage helpers
* style: apply black formatting to test_prefix_cache_manager.py
Co-authored-by: Cursor <cursoragent@cursor.com>
* tests: update test_prefix_cache_manager
Co-authored-by: Cursor <cursoragent@cursor.com>
* update
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-03-02 20:04:12 +08:00
kesmeey
758770bc43
[CI] 【Hackathon 10th Spring No.28】Add unit tests for functional module fastdeploy/entrypoints/engine_client.py ( #6158 )
...
* fix codestyle and update unit test coverage workflow
* fix test_engine_client.py: add main_process_metrics mock to prevent KeyError
* fix test_engine_client.py: comprehensive test improvements
* feat: enhance test_engine_client.py with comprehensive test improvements
* fix: resolve test failures in test_engine_client.py
* test: enhance EngineClient test coverage with comprehensive test suite
* test: add comprehensive EngineClient test suite (codestyle checked)
2026-03-02 14:29:23 +08:00
yzwu
6674131b0b
[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding ( #6553 )
2026-03-02 14:07:17 +08:00
YuBaoku
481d0e385f
[CI] Skip long-sequence case due to potential non-determinism ( #6587 )
2026-03-02 11:34:15 +08:00
周周周
d957ccd46d
seq_lens related tensor shape -> [max_num_seqs] ( #6535 )
2026-03-02 11:18:30 +08:00
AIbin
59b578c337
[Feature]Supports SWA based on appendattn ( #6547 )
2026-03-01 19:02:08 +08:00
Yonghua Li
7cf5e64c7a
[BugFix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend ( #6516 )
...
* [fix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend
* [fix] fix test_cache_transfer_manager
* [fix] fix test_cache_transfer_manager again
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-03-01 13:43:31 +08:00
YuBaoku
bb51829bd5
[CI] Fix tests and docs to resolve failure ( #6572 )
2026-03-01 12:33:01 +08:00
0Ayachi0
977e2cc202
[CI] 【Hackathon 10th Spring No.23】Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py ( #6209 )
...
* [CI] 【Hackathon 10th Spring No.24】Add unit tests for functional module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py
* [CI] 【Hackathon 10th Spring No.23】Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py
* [CI] 【Hackathon 10th Spring No.23】Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py
* Merge branch 'develop' into 23
* Merge branch 'develop' into 23
* Merge branch 'develop' into 23
* Merge branch 'develop' into 23
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-02-28 19:29:02 +08:00
zccjjj
a2072fe20c
[XPU] support warmup with ep & remove apply_tp_fused_op ( #6289 )
2026-02-28 15:40:36 +08:00
ming1753
97eee75677
[Feature] GPU Memory Optimization and Retirement of V0 Scheduler ( #6407 )
...
* Optim GPU Mem Usage
---------
Co-authored-by: huzesen <huzesen@baidu.com>
2026-02-28 15:07:43 +08:00
YuBaoku
54f7d9f621
[CI] Sync mm_batch_invariant with paddle.mm update ( #6557 )
2026-02-28 14:56:42 +08:00
YuBaoku
8e67fb422c
[CI] disable test_batch_invariance_op_mm.py in unit_test ( #6548 )
2026-02-28 10:16:14 +08:00
xunyoyo
12f754ef38
[CI] 【Hackathon 10th Spring No.42】Add unit tests for functional module fastdeploy/entrypoints/openai/serving_chat.py ( #6112 )
...
* test: expand OpenAI serving chat coverage
* Import RequestOutput in test_serving_chat.py
* Reorder import statements in test_serving_chat.py
* test: fix tool_calls finish_reason case
* test: refine serving_chat coverage
* test: format serving_chat tests
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:32:46 +08:00
ZeLong Li
81ea3674b0
[CI] 【Hackathon 10th Spring No.26】Add unit tests for functional module fastdeploy/utils.py ( #6146 )
...
test (#6146)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:28:40 +08:00
xunyoyo
ff61a7f5a1
[CI] 【Hackathon 10th Spring No.40】Add unit tests for functional module fastdeploy/model_executor/layers/linear.py ( #6107 )
...
* Add linear layer tests for model executor
* Refine linear layer tests for uncovered branches
* Refactor and enhance tests for linear layers
Refactor test_linear.py by removing unused imports and redundant code, and updating model configuration parameters. Add new tests for linear layers and their loading mechanisms.
* test: patch row-parallel alltoall in unit test
* test: avoid alltoall reshape failure in row-parallel
* test: expand linear coverage targets
* Refine linear tests per review feedback
* Fix linear tests for pre-sharded config and qkv fixture
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:25:23 +08:00
sunxin
53aaac69da
[Optimization] Enable BF16 gate computation for GLM and Qwen ( #6457 )
...
* gate bf16
* add gate-fp32
* fix
* update baseline
* update
* update
* fix
2026-02-26 21:08:46 -08:00
gongweibao
edd31e8849
[Feature] Add Deterministic Inference Support ( #6476 )
...
* add
* [tests] Add Paddle attention determinism tests and refactor resource manager
Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* add
* add
* add
* add
* add more
* add more
* fix some
* fix some
* fix bugs
* fix bugs
* only in gpu
* add docs
* fix comments
* fix some
* fix some
* fix comments
* add more
* fix potential problem
* remove not need
* remove not need
* remove no need
* fix bug
* fix bugs
* fix comments
* fix comments
* Update tests/ce/deterministic/test_determinism_verification.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/inter_communicator/test_ipc_signal.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/engine/test_sampling_params_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism_standalone.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix comments
* fix import error
* fix a bug
* fix bugs
* fix bugs
* fix coverage
* refine codes
* refine code
* fix comments
* fix comments
* fix comments
* rm not need
* fix allreduce large tensor bug
* mv log files
* mv log files
* add files
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-26 19:31:51 -08:00
zccjjj
c34cb2a8c2
[XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape ( #6337 )
2026-02-27 09:55:41 +08:00
kesmeey
bf14ea18aa
tests: fix cache_transfer_manager threading and init mocks ( #6502 )
...
tests: fix cache_transfer_manager threading and init mocks
2026-02-26 17:32:51 +08:00
yinwei
256651e9de
Add PD Cudagraph CI Case
2026-02-26 17:01:20 +08:00