879 Commits

Author SHA1 Message Date
CSWYF3634076 97a4b3631e [Processor]add qwen3vl prompt_token_ids support (#6764)
* [Processor]add qwen3vl prompt_token_ids support

* [Processor]add qwen3vl prompt_token_ids support unittest

* [Processor]add qwen3vl prompt_token_ids support precommit
2026-03-11 15:08:56 +08:00
bukejiyu cffa8c246c [Others]update paddleformer 1.0.0 (#6496)
* update paddleformer 1.0.0

* update
2026-03-11 15:06:29 +08:00
Yonghua Li 7811eeccaa [fix] resolve get_save_output_v1 socket name conflicts between multiple instances (#6758) 2026-03-11 15:02:32 +08:00
freeliuzc cf7934a4b2 [Speculative Decoding] Unify Spec and non-spec branch (#6685)
* optimize spec-inference architecture

* delete debug log

* optimize spec_method usage && fix unit_test

* add claude unit-test skill

* fix some ugly bug

* enhance robustness and bounds check

* unify method & spec_method to method to avoid bug

* activate CI

* fix unit test

* Unify logprobs computation for naive and speculative decoding, fix CUDA kernel

* fix logprob bug && optimize verify kernel

* fix exist_decode() judge
2026-03-10 23:58:44 -07:00
ddchenhao66 a502dda1fe [BugFix] fix multi-step mtp bug (#6754) 2026-03-11 10:16:04 +08:00
Jiang-Jia-Jun b05a6c4206 [BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP (#6724)
* [BugFix] Support  to fix NaN bug in EP

* Optimize notion for all the functions

* Fix potential lock contention failure issues

* Update fastdeploy/inter_communicator/ipc_signal.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update envs.py

* Update default value for USE_KVCACHE_LOCK

Change default value of USE_KVCACHE_LOCK from 1 to 0.

* Update worker_process.py

* Fix wrong suffix

* Update test_prefix_cache_manager.py

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-10 21:55:32 +08:00
Yonghua Li 6520ae807c [BugFix] fix grpc failure when tracing init before workers forked (#6732)
* [fix] fix grpc failure when tracing init before workers forked

* [fix] change default exporter to http

* [fix] fix test_trace
2026-03-10 21:24:10 +08:00
yzwu 67388ce2f3 [Iluvatar][CI] Replace ci in ernie-300B-4layer with ernie-21b. (#6747) 2026-03-10 17:25:52 +08:00
YuBaoku 596519831c [CI] Temporarily disable test_determinism_offline.py 2026-03-10 16:54:30 +08:00
YuBaoku 73de8b9795 [CI] Update test_determinism_long.py to reduce execution time 2026-03-10 11:34:36 +08:00
周周周 3897a0b4fc nvfp4 clean code (#6671) 2026-03-09 18:00:34 +08:00
0Ayachi0 0c69cdf56e [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py (#6208)
* [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py

* update test_fused_moe_triton_backend.py

* fix: apply code style formatting

* Merge branch 'develop' into develop

* Merge branch 'develop' into develop

* Merge branch 'develop' into develop

* Merge branch 'develop' into develop

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-03-09 14:24:08 +08:00
gongweibao 30f9f33f34 [Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM (#6610)
* add fa deter

* add ut

* add long sentence

* fix basic

* fix bugs

* fix adn

* fix first

* fix single

* fix single

* fix single test

* refine

* add more test

* refine comments

* add comments of bmm

* fix ci

* remove probe

* add

* remove not need

* refine tests

* fix comments and refine code

* refine code

* refine test

* refine test

* mv 4cards tests

* fix tests

* add

* fix comments

* fix cover

* fix cover

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-09 10:27:53 +08:00
ddchenhao66 3c0ff20328 [BugFix] fix incorrect function parameters of start_data_parallel_service (#6674) 2026-03-09 10:15:50 +08:00
YuBaoku cbfdf42628 [CI] Add test_dynamic_c8_cache.py and latest FastDeploy.tar.gz upload (#6708) 2026-03-08 16:01:12 +08:00
gongweibao 1e49855b0f [BugFix][DataProcessor] Add validate_model_path to fail fast on bad model path or unreachable network (#6713)
* fix

* add more endpoint

* fix some

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-08 12:36:32 +08:00
luukunn aac1484b0d [Feature]add arguments string in tool (#6704)
* add arguments string
2026-03-06 20:45:09 +08:00
luukunn caf73e8131 [Feature]add reasoning effort (#6656)
* add reasoning_effort

* fix log

* fix reasoning_effort

* add reasoning_effort level

* fix valid_parameters

* fix valid_parameters

* fix

* fix unit test

* add unit test

* add unit test
2026-03-06 14:16:02 +08:00
yzwu 81acdb62bd [Iluvatar][CI] Do not specify FD_LOG_DIR (#6665) 2026-03-06 11:54:44 +08:00
YuBaoku 16a393e90e [CI] Fix non-deterministic test and skip failed_tests.log in log print (#6672) 2026-03-05 18:47:18 +08:00
sunxin 0dc7034ce0 [Model Runner] Deprecate not_need_stop (#6356)
* Deprecate not_need_stop
2026-03-05 10:55:42 +08:00
ddchenhao66 fa4815b93a [BugFix] fix dp scheduler bug in ep4tp1 when start by using multi_api_server (#6598)
* [BugFix] fix dp scheduler bug in ep4tp1 when start by using multi_api_server

* [BugFix] modify request_queue and result_queue of dp scheduler
2026-03-05 10:04:12 +08:00
YuBaoku 56ceeda80c [CI] Adjust model-specific diff threshold and include iluvatar XPU paths in coverage (#6663) 2026-03-05 10:02:54 +08:00
ming1753 02d32eea3b Revert "[Bug Fix] Fix MM mtp incorrect rope emb (#6581)" (#6631)
This reverts commit c5eb6b65e7.
2026-03-04 11:23:28 +08:00
kesmeey 3d3221e24e [CI] [Hackathon 10th Spring No.31] Add unit tests for module fastdeploy/model_executor/layers/sample/sampler.py (#6200)
* Format code with black

* Format sampler tests

* update

* update
2026-03-04 10:57:37 +08:00
YuBaoku c3d6d706d5 [CI] Add nightly workflow for golang_router tests and improve log handling (#6608)
* [CI] Add nightly workflow for Golang router tests
* [CI] Improve pytest script stability and log handling
2026-03-03 19:36:57 +08:00
ming1753 c5eb6b65e7 [Bug Fix] Fix MM mtp incorrect rope emb (#6581)
* [Bug Fix] Fix MM mtp incorrect rope emb
2026-03-03 19:28:59 +08:00
qwes5s5 375b5b7b21 [Feature]Log Format Normalization and Trace Log Optimization (#6370)
* log refactor

* log refactor 2

* log refactor 3
2026-03-03 11:31:45 +08:00
huicongyao 0f718baaf2 [Speculative Decoding]Reformat input preprocess for spec decode (#6501)
* add speculate_pre_process kernel

* reduce one slice

* make d2h async && fix mtp bug for new pre_process

* fix

* add unit test

* fix: code style formatting

* fix

* fix: thread race in speculate_preprocess && rename d2h event
2026-03-03 10:22:07 +08:00
kesmeey aae87e6ae2 [CI] [Hackathon 10th Spring No.27] Add unit tests for module fastdeploy/cache_manager/prefix_cache_manager.py (#6297)
* test: update prefix cache manager tests

* test: refine prefix cache manager coverage helpers

* style: apply black formatting to test_prefix_cache_manager.py

Co-authored-by: Cursor <cursoragent@cursor.com>

* tests: update test_prefix_cache_manager

Co-authored-by: Cursor <cursoragent@cursor.com>

* update

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-03-02 20:04:12 +08:00
kesmeey 758770bc43 [CI] [Hackathon 10th Spring No.28] Add unit tests for module fastdeploy/entrypoints/engine_client.py (#6158)
* fix codestyle and update unit test coverage workflow

* fix test_engine_client.py: add main_process_metrics mock to prevent KeyError

* fix test_engine_client.py: comprehensive test improvements

* feat: enhance test_engine_client.py with comprehensive test improvements

* fix: resolve test failures in test_engine_client.py

* test: enhance EngineClient test coverage with comprehensive test suite

* test: add comprehensive EngineClient test suite (codestyle checked)
2026-03-02 14:29:23 +08:00
yzwu 6674131b0b [Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553) 2026-03-02 14:07:17 +08:00
YuBaoku 481d0e385f [CI] Skip long-sequence case due to potential non-determinism (#6587) 2026-03-02 11:34:15 +08:00
周周周 d957ccd46d seq_lens related tensor shape -> [max_num_seqs] (#6535) 2026-03-02 11:18:30 +08:00
AIbin 59b578c337 [Feature]Supports SWA based on appendattn (#6547) 2026-03-01 19:02:08 +08:00
Yonghua Li 7cf5e64c7a [BugFix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend (#6516)
* [fix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend

* [fix] fix test_cache_transfer_manager

* [fix] fix test_cache_transfer_manager again

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-03-01 13:43:31 +08:00
YuBaoku bb51829bd5 [CI] Fix tests and docs to resolve failure (#6572) 2026-03-01 12:33:01 +08:00
0Ayachi0 977e2cc202 [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py (#6209)
* [CI] [Hackathon 10th Spring No.24] Add unit tests for module fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py

* [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py

* [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py

* Merge branch 'develop' into 23

* Merge branch 'develop' into 23

* Merge branch 'develop' into 23

* Merge branch 'develop' into 23

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-02-28 19:29:02 +08:00
zccjjj a2072fe20c [XPU] support warmup with ep & remove apply_tp_fused_op (#6289) 2026-02-28 15:40:36 +08:00
ming1753 97eee75677 [Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407)
* Optim GPU Mem Usage

---------

Co-authored-by: huzesen <huzesen@baidu.com>
2026-02-28 15:07:43 +08:00
YuBaoku 54f7d9f621 [CI] Sync mm_batch_invariant with paddle.mm update (#6557) 2026-02-28 14:56:42 +08:00
YuBaoku 8e67fb422c [CI] disable test_batch_invariance_op_mm.py in unit_test (#6548) 2026-02-28 10:16:14 +08:00
xunyoyo 12f754ef38 [CI] [Hackathon 10th Spring No.42] Add unit tests for module fastdeploy/entrypoints/openai/serving_chat.py (#6112)
* test: expand OpenAI serving chat coverage

* Import RequestOutput in test_serving_chat.py

* Reorder import statements in test_serving_chat.py

* test: fix tool_calls finish_reason case

* test: refine serving_chat coverage

* test: format serving_chat tests

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:32:46 +08:00
ZeLong Li 81ea3674b0 [CI] [Hackathon 10th Spring No.26] Add unit tests for module fastdeploy/utils.py (#6146)
test (#6146)
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:28:40 +08:00
xunyoyo ff61a7f5a1 [CI] [Hackathon 10th Spring No.40] Add unit tests for module fastdeploy/model_executor/layers/linear.py (#6107)
* Add linear layer tests for model executor

* Refine linear layer tests for uncovered branches

* Refactor and enhance tests for linear layers

Refactor test_linear.py by removing unused imports and redundant code, and updating model configuration parameters. Add new tests for linear layers and their loading mechanisms.

* test: patch row-parallel alltoall in unit test

* test: avoid alltoall reshape failure in row-parallel

* test: expand linear coverage targets

* Refine linear tests per review feedback

* Fix linear tests for pre-sharded config and qkv fixture

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-02-27 16:25:23 +08:00
sunxin 53aaac69da [Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)
* gate bf16

* add gate-fp32

* fix

* update baseline

* update

* update

* fix
2026-02-26 21:08:46 -08:00
gongweibao edd31e8849 [Feature] Add Deterministic Inference Support (#6476)
* add

* [tests] Add Paddle attention determinism tests and refactor resource manager

Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* add

* add

* add

* add

* add more

* add more

* fix some

* fix some

* fix bugs

* fix bugs

* only in gpu

* add docs

* fix comments

* fix some

* fix some

* fix comments

* add more

* fix potential problem

* remove not need

* remove not need

* remove no need

* fix bug

* fix bugs

* fix comments

* fix comments

* Update tests/ce/deterministic/test_determinism_verification.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/inter_communicator/test_ipc_signal.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/engine/test_sampling_params_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism_standalone.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix comments

* fix import error

* fix a bug

* fix bugs

* fix bugs

* fix coverage

* refine codes

* refine code

* fix comments

* fix comments

* fix comments

* rm not need

* fix allreduce large tensor bug

* mv log files

* mv log files

* add files

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-26 19:31:51 -08:00
zccjjj c34cb2a8c2 [XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337) 2026-02-27 09:55:41 +08:00
kesmeey bf14ea18aa tests: fix cache_transfer_manager threading and init mocks (#6502)
tests: fix cache_transfer_manager threading and init mocks
2026-02-26 17:32:51 +08:00
yinwei 256651e9de Add PD Cudagraph CI Case 2026-02-26 17:01:20 +08:00