Commit Graph

593 Commits

Author SHA1 Message Date
周周周 7a0744f05a [UT]support attention test tp (#5887) 2026-01-06 11:15:01 +08:00
Jiaxin Sui 2785b820c8 [XPU][CI] Add XPU logprobs case (#5874)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command

* Remove max_tokens from model response configuration

Removed max_tokens parameter from the model response call.

* add xpu logprobs case

* Fix formatting and improve setup_logprobs_env

Add newline at end of file and update setup_logprobs_env function.

* Refactor test_logprobs_21b_tp4.py for clarity

* Change top_p value from 1.0 to 0

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
jc 8d384f9fd8 [PD Disaggregation] Update usage of pd disaggregation and data parallel (#5742)
* Update usage of pd disaggregation

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up dp docs

* up

* up

* up

* fix unittest
2026-01-05 17:51:29 +08:00
jc e911ac2ce7 [BugFix] Refine the preparation of cpu and storage cache (#5777)
* Refine the preparation of cpu and storage cache

* fix error

* fix error

* up

* fix

* up docs

* fix unittest

* remove debug info
2026-01-05 10:13:30 +08:00
kevin 52dc9a7b85 [BugFix] skip mm revert (#5848)
* skip mm revert

* update code

* update test
2026-01-04 14:25:45 +08:00
周周周 e3957a5ebc [Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620) 2026-01-04 11:21:15 +08:00
GoldPancake 4e10ae5d99 [Speculative Decoding] Optimize draft logprob (#5842)
* optimize draft logprob

* fix ut
2025-12-31 13:35:56 +08:00
xjkmfa ed60b4da32 [CI case]Prompt logprob (#5835)
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
essos b03a4f3e3d [CI] [Hackathon 9th Sprint No.46] NO.46 Add unit tests for functional module fastdeploy/model_executor/guided_decoding/xgrammar_backend.py (#5042)
* test

* rename ut

* remove test max_rollback_tokens

* update

* simplify code

* fix: torch use mock

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-30 17:05:26 +08:00
chen 0bcf924e10 [Optimization] Optimization for gather_logprob by 10GB (#5817)
* optimize logprobs gather_logprob, reducing device memory usage by 10GB when token_num=8k
2025-12-30 15:33:34 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
GoldPancake e78e22ebd5 [BugFix] Fix entropy bugs (#5818)
* fix entropy bugs

* fix ut

* fix
2025-12-29 20:44:29 -08:00
周周周 7ae13b2326 [PD Disaggregation] remove unused para in RDMACommManager (#5814) 2025-12-30 11:38:30 +08:00
Yonghua Li a8d3e3ba12 [BugFix] fix shm opened but not closed in set_data_ipc (#5826) 2025-12-29 23:35:07 +08:00
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
essos ffb3ccff74 [CI] [Hackathon 9th Sprint No.52] NO.52 Add unit tests for functional module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py (#5047)
* add test

* update test

* simplify code

* remove mock

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:44:56 +08:00
xunyoyo 7e39560a42 [CI] [Hackathon 9th Sprint No.33] NO.33 Add unit tests for functional module -new (#5726)
* Add cache messager coverage tests

* Add default_dtype parameter to test cache manager

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:42:27 +08:00
essos 8ee055aafc [CI] [Hackathon 9th Sprint No.55] NO.55 Add unit tests for functional module fastdeploy/scheduler/local_scheduler.py (#5050)
* Add comprehensive unit tests for data type conversion functionality

* fix

* Fix unit test failures in test_local_scheduler.py

* update

* fix code

* update mock

* add ut

* rm file

* update test

* remove test cases that are already covered

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 12:41:50 +08:00
ddchenhao66 56a9ecccb2 [XPU] xpu support ep4tp4 (#5773)
* [XPU] xpu support ep4tp4

* Add commands to check multiprocessing and fastdeploy processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
YuBaoku c3ccfa974c [CI] Fix path error and port conflict (#5803) 2025-12-27 12:50:58 +08:00
kxz2002 cad2932990 [BugFix] Fix process_response_dict to support async in serving_completion (#5758)
* support process_response_dict async initial commit

* fixbug

* add unit test

* optimize
2025-12-26 17:40:58 +08:00
kevin 894f4e312b [FDConfig] disable chunked_mm_input in ernie5 (#5774)
* disable chunked_mm_input in ernie5

* update code

* update code

* update test case

* update testcase

* update case
2025-12-26 15:31:27 +08:00
yzwu 7b6cc11952 [Iluvatar] Fix FD launch error when specifying CUDA_VISIBLE_DEVICES (#5735) 2025-12-26 14:01:27 +08:00
YuBaoku 4c22a5afb8 [CI] Disable GPU cleanup due to CI machine limitations (#5781) 2025-12-26 00:11:06 +08:00
kevin 4fa76296d9 [BugFix] fix mm splitwise scheduler bug (#5604)
* fix mm splitwise scheduler bug

* fix test case bug

* update code

* update code
2025-12-25 04:08:11 -08:00
Copilot 1cbf448178 [Feature] Add startup version check mechanism for Paddle (#5769)
* Initial plan

* Implement version check mechanism: add the get_version_info function and check the Paddle version at startup

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Address code review feedback: improve error handling and logging

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Change comments and warning messages from Chinese to English

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Update fastdeploy/__init__.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-25 19:29:04 +08:00
freeliuzc 9018ccf74e [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)
* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in pd-split-mode

* add note

* fix xpu register
2025-12-25 01:54:59 -08:00
YuBaoku 7247dc5f3a [CI] Add retry and robust cleanup for removal (#5725)
* [CI] Add retry and robust cleanup for removal

* [CI] Ensure clean GPU memory by killing leftover processes
2025-12-25 17:08:27 +08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
memoryCoderC be3be4913a [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM (#5195)
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM

* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
Jiaxin Sui 8fc789bb3f [iluvatar][CI] refactor iluvatar_ci (#5588)
* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* Update Docker image tag in iluvatar_test workflow

* Update default Docker image version in workflow

* Update iluvatar_test.yml

* Update default Docker image in workflow config

* Update model path in run_ernie300B_4layer.py

* Update model path in offline inference check

* Add model_data directory and copy model files

Create model_data directory and copy necessary files.

* Update run_ernie_vl_28B.py

* Update run_ernie300B_4layer.py

* Update paddlepaddle installation method in script

* Change wget command to include proxy option

* Modify paddle package installation in CI script

Updated installation commands for paddle packages.

* Update paddlepaddle and paddle-iluvatar-gpu versions

* Delete .github/workflows/ci_iluvatar.yml

* Rename workflow from ILUVATAR Test to ILUVATAR-CI

* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
YuBaoku 0410c42a9a [CI] Refactor RL tests to reuse stable_test (#5516)
* [CI] Refactor RL tests to reuse stable_test
2025-12-24 19:18:00 +08:00
YuBaoku e75f93d302 [CI] Refactor RL tests to reuse test_metrics (#5741) 2025-12-24 17:08:40 +08:00
Divano 6b0fba8294 Update run.sh 2025-12-24 15:35:17 +08:00
Nyakku Shigure 11227e00bb [GraphOptimization] Wrap deep gemm and triton as python op (#5673)
* [GraphOptimization] Wrap deep gemm and triton as python op

* add unittest to _base_test && compatibility

* paddle.static.MetaTensor -> "paddle.static.MetaTensor"

* mv register_custom_python_op

* rename yaml

---------

Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>
2025-12-24 15:23:46 +08:00
bukejiyu ba4b7afb3a [Others] Rename tensor_parallel_degree to tensor_model_parallel_size for paddleformers 0.4.1 (#5727) 2025-12-23 23:19:11 -08:00
xunyoyo 8acdd9f156 [CI] [Hackathon 9th Sprint No.41] NO.41 Add unit tests for functional module -new
Add splitwise connector tests
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-24 14:05:32 +08:00
YuBaoku 672620cdfe Revert "[CI] Adapt vl_model baseline changes due to Paddle update (#5576)" (#5732)
This reverts commit 63fff8df70.
2025-12-24 11:59:27 +08:00
GoldPancake 23d488c488 [Feature] Entropy calculation support (#5692)
* support entropy

* fix bug

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-23 21:19:47 +08:00
bukejiyu d1c6e57341 [Others] upgrade paddleformer to 0.4.0 (#5599) 2025-12-23 05:08:01 -08:00
ophilia-lee 99258e19c8 [Benchmark] Support the Completions API (#5700)
* benchmark tool supports specifying response_format in constrained decoding scenarios

* Update backend_request_func.py

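Make the output.success check compatible with the case where the reply content is empty because over-long thinking content was truncated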

* Update benchmark_serving.py

Update benchmark_metrics

* Support the Completions API

* Support the Completions API

* Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] Support the Completions API

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-23 19:46:23 +08:00
kesmeey f15edbb6ef [CI] [Hackathon 9th Sprint No.40] Add unit tests for functional module fastdeploy/entrypoints/openai/api_server.py (#5567)
* Add tests for openai api_server coverage

* update

* Update tests for openai api_server

* fix bugs

* test: disable some api_server lifespan/controller tests for local env

* Format test_api_server with black

* update

* update

* test: narrow envs patch in api_server tests to avoid side effects

* fix: separate MagicMock creation to avoid missing req argument

* fix: patch TRACES_ENABLE env var in api_server tests

* fix: use os.environ patch for TRACES_ENABLE

* test: use fake fastdeploy.envs in api_server tests

* test: pass fake Request into chat/completion routes

* test: increase coverage for tracing and scheduler control

* fix: set dynamic_load_weight in tracing headers test

* ci: add retry and validation for FastDeploy.tar.gz download

* ci: fix indentation in _base_test.yml

* refactor: simplify test_api_server.py (807->480 lines, ~40% reduction)

* fix: restore missing args attributes (revision, etc.) in _build_args

* fix: patch sys.argv to prevent SystemExit: 2 in api_server tests

* improve coverage

* Remove docstring from test_api_server.py

Removed unnecessary docstring from test_api_server.py

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-23 18:06:43 +08:00
Divano c1aa66df02 Revert "[Optim] Remove limitation of number of kvcache blocks (#5612)" (#5702)
This reverts commit 9da89a374b.
2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun 9da89a374b [Optim] Remove limitation of number of kvcache blocks (#5612)
* [Optim] Remove limitation of number of kvcache blocks

* Update fastdeploy/envs.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/worker/iluvatar_worker.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add docs

* Update fastdeploy/worker/worker_process.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix ci case

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-23 11:18:29 +08:00
xunyoyo 3aee5c4bf5 [CI] [Hackathon 9th Sprint No.37] NO.37 Add unit tests for functional module (#5059)
* Add unit tests for TokenProcessor functionality

* Add trace stubs for token processor tests

* Increase token processor test coverage

* Clean up imports in test_token_processor.py

Remove unnecessary path manipulation in test file.

* Cleanup: Remove unused imports in test_token_processor

Removed unused imports from the test file.

* Add trace_carrier to task in test cases

Added trace_carrier attribute to task in multiple test cases to ensure proper handling of trace information.

* Refine token processor tests for safe coverage

* Expand postprocess coverage

* Add ZMQ logprob parsing test

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: Tao Luo <luotao02@baidu.com>
2025-12-23 10:35:16 +08:00
Jiaxin Sui f16077a939 [XPU][CI] Xpu ci update (#5690)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command

* Remove max_tokens from model response configuration

Removed max_tokens parameter from the model response call.
2025-12-23 10:19:39 +08:00
xiaolei373 dfe8ea941c [log]console log to llm log (#5680) 2025-12-23 10:05:45 +08:00
ddchenhao66 a1535c7e7e [XPU][CI] xpu add ci test for pd + TP2 (#5653)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-22 19:27:10 +08:00
lizexu123 6d323769dd fix w4afp8 (#5634) 2025-12-22 13:39:41 +08:00
YuBaoku fe55baae47 [CI] Fix unit_test error of unstable execution (#5660)
* [CI] Fix unit_test error of unstable execution
2025-12-19 22:59:53 +08:00