Commit Graph

4304 Commits

Author SHA1 Message Date
YuBaoku 98519ee2e9 [CI] Fix archive URL injection in tag image build (#5828) 2025-12-30 14:28:17 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader (tp > 1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
GoldPancake e78e22ebd5 [BugFix] Fix entropy bugs (#5818)
* fix entropy bugs

* fix ut

* fix
2025-12-29 20:44:29 -08:00
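The entropy fix above touches per-token entropy computation. As context, a minimal sketch of numerically stable softmax entropy (the function name and pure-Python form are illustrative, not the repo's actual kernel, which runs on-device):

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits.

    Subtracting the max logit before exponentiating avoids overflow,
    a common source of entropy bugs with large logits.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    # -sum p * log p over the softmax probabilities
    return -sum((e / z) * math.log(e / z) for e in exps)
```

A uniform distribution over n tokens yields entropy log(n), which is a handy unit-test invariant.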
tianhaodongbd edb9647422 [RL] add lm_head_fp32 in RolloutModelConfig (#5825) 2025-12-29 20:22:30 -08:00
周周周 7ae13b2326 [PD Disaggregation] remove unused param in RDMACommManager (#5814) 2025-12-30 11:38:30 +08:00
Yonghua Li a8d3e3ba12 [BugFix] fix shm opened but not closed in set_data_ipc (#5826) 2025-12-29 23:35:07 +08:00
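The shm leak fixed above is the classic open-without-close pattern. A minimal sketch of the correct shape using Python's stdlib shared memory (the function name is hypothetical; the actual `set_data_ipc` path uses CUDA IPC, not `multiprocessing.shared_memory`):

```python
from multiprocessing import shared_memory

def read_ipc_block(name):
    """Attach to an existing shared-memory block, copy out its bytes,
    and always close the local handle.

    Omitting close() leaks the file descriptor/mapping in this process,
    which is the bug class the commit above fixes.
    """
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf)  # copy before releasing the mapping
    finally:
        shm.close()  # release this process's handle; the creator calls unlink()
```

Only the creating process should `unlink()`; every attaching process must still `close()` its own handle.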
CSWYF3634076 deb9698ac5 remove invalid elif branch (#5821) 2025-12-29 19:21:28 +08:00
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
周周周 a3f0696e35 [BugFix] fix compile error in sm89 (#5809) 2025-12-29 16:55:52 +08:00
Ryan eb782a0225 [BugFix] Fix return value inconsistency for ep_moe_expert_combine op (#5812) 2025-12-29 16:44:00 +08:00
essos ffb3ccff74 [CI] [Hackathon 9th Sprint No.52] NO.52 Add unit tests for module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py (#5047)
* add test

* update test

* Simplify code

* Remove mock

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:44:56 +08:00
xunyoyo 7e39560a42 [CI] [Hackathon 9th Sprint No.33] NO.33 Add unit tests for functional module - new (#5726)
* Add cache messager coverage tests

* Add default_dtype parameter to test cache manager

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:42:27 +08:00
Longzhi Wang 11329ee35e [Model] support mode config for expert_dispatch (#5748) 2025-12-29 13:37:20 +08:00
essos 8ee055aafc [CI] [Hackathon 9th Sprint No.55] NO.55 Add unit tests for module fastdeploy/scheduler/local_scheduler.py (#5050)
* Add comprehensive unit tests for data type conversion functionality

* fix

* Fix unit test failures in test_local_scheduler.py

* update

* fix code

* update mock

* add ut

* rm file

* update test

* Remove test cases already covered

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 12:41:50 +08:00
ddchenhao66 56a9ecccb2 [XPU] xpu support ep4tp4 (#5773)
* [XPU] xpu support ep4tp4

* Add commands to check multiprocessing and fastdeploy processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
chenjian 91a2b13676 [BugFix] Fix preemption out of real_bsz (#5805) 2025-12-29 09:52:36 +08:00
YuBaoku c3ccfa974c [CI] Fix path error and port conflict (#5803) 2025-12-27 12:50:58 +08:00
Nyakku Shigure da9ea88a3b [BugFix] Correct condition for reversed_window_indices in SiglipEncoder (#5795) 2025-12-26 19:16:07 +08:00
Ryan 09229d8953 change count_tokens_per_expert_func declaration: Tensor -> vector<Tensor> (#5794) 2025-12-26 19:02:28 +08:00
Daci 77add7d1cc set tracelogger stacklevel=2 (#5766) 2025-12-26 17:43:32 +08:00
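The `stacklevel=2` change above uses a standard-library `logging` feature: it makes the emitted record report the wrapper's *caller* instead of the wrapper itself. A minimal sketch (the `trace` wrapper name is illustrative, not the repo's actual TraceLogger API):

```python
import logging

logger = logging.getLogger("tracelogger_demo")

def trace(msg):
    """Thin logging wrapper.

    Without stacklevel=2, every record's funcName/lineno would point at
    this wrapper; stacklevel=2 skips one frame so records attribute the
    real call site instead.
    """
    logger.info(msg, stacklevel=2)
```

`stacklevel` is available since Python 3.8 and is the idiomatic fix whenever all log lines appear to originate from a logging helper.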
kxz2002 cad2932990 [BugFix] Fix process_response_dict to support async in serving_completion (#5758)
* support process_response_dict async initial commit

* fixbug

* add unit test

* optimize
2025-12-26 17:40:58 +08:00
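Supporting both sync and async `process_response_dict` implementations, as the fix above does, usually means awaiting the result only when it is awaitable. A minimal sketch of that dispatch pattern (helper and callback names are hypothetical, not the serving_completion API):

```python
import asyncio
import inspect

async def maybe_await(func, *args, **kwargs):
    """Call func and await the result only if it is awaitable.

    Lets sync and async post-processors share a single code path
    inside an async serving handler.
    """
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result

def sync_proc(d):
    d["done"] = True
    return d

async def async_proc(d):
    await asyncio.sleep(0)  # simulate async work
    d["done"] = True
    return d
```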
Ryan 724045c426 add some op infershape&dtype (#5762) 2025-12-26 16:17:39 +08:00
kevin 894f4e312b [FDConfig] disable chunked_mm_input in ernie5 (#5774)
* disable chunked_mm_input in ernie5

* update code

* update code

* update test case

* update testcase

* update case
2025-12-26 15:31:27 +08:00
周周周 03363cab4c make flash_mask attention pybind (#5783) 2025-12-26 14:31:35 +08:00
YuBaoku 8808dd1fed [CI] Enable custom_device_check in CI rerun (#5786)
* [CI] Enable custom_device_check in CI rerun
2025-12-26 14:09:16 +08:00
yzwu 7b6cc11952 [Iluvatar] Fix FD launch error when specifying CUDA_VISIBLE_DEVICES (#5735) 2025-12-26 14:01:27 +08:00
RichardWooSJTU 01c18f328f rename need_block_num_signal (#5623) 2025-12-26 11:02:29 +08:00
YuBaoku 4c22a5afb8 [CI] Disable GPU cleanup due to CI machine limitations (#5781) 2025-12-26 00:11:06 +08:00
Yonghua Li 0c01cccc32 [BugFix] fix double shutdown of comm group when rank0 clears weights slower than other ranks (#5715) 2025-12-25 21:48:53 +08:00
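The double-shutdown race above, where rank 0 tears down the comm group after other ranks already have, is typically fixed with an idempotent, lock-guarded shutdown. A toy sketch of the guard (the `CommGroup` class is a stand-in, not the repo's actual distributed group):

```python
import threading

class CommGroup:
    """Stand-in for a distributed communication group.

    The lock + closed flag make shutdown() safe to call more than once
    (e.g. from a slow rank 0 after faster ranks already triggered it).
    """
    def __init__(self):
        self._closed = False
        self._lock = threading.Lock()
        self.shutdown_calls = 0  # observable side effect for testing

    def shutdown(self):
        with self._lock:
            if self._closed:
                return False  # already torn down; skip double shutdown
            self._closed = True
        self.shutdown_calls += 1  # real teardown would happen here
        return True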
kevin 5538dda3c8 [Feature] pd support dy-c8 ipc (#5750)
* pd support dy-c8 ipc

* update code

* support v0

* update code
2025-12-25 21:22:34 +08:00
kevin 4fa76296d9 [BugFix] fix mm splitwise scheduler bug (#5604)
* fix mm splitwise scheduler bug

* fix test case bug

* update code

* update code
2025-12-25 04:08:11 -08:00
ophilia-lee d5f5dc4f6e [Benchmark] Increase the aiohttp default read buffer size to 10M to fix "Chunk too big" errors on oversized streaming chunks (#5771)
* Benchmark tool supports specifying response_format for constrained decoding scenarios

* Update backend_request_func.py

Make the output.success check tolerate the case where the reply content is empty because overly long reasoning content was truncated

* Update benchmark_serving.py

Update benchmark_metrics

* Support the Completions API

* Support the Completions API

* Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] Support the Completions API

* [Benchmark] async_request_eb_openai_completions: increase the aiohttp default read buffer size to 4M to fix "Chunk too big" errors on oversized streaming chunks

* [Benchmark] Increase the aiohttp default read buffer size to 10M to fix "Chunk too big" errors on oversized streaming chunks

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 19:36:11 +08:00
Copilot 1cbf448178 [Feature] Add startup version check mechanism for Paddle (#5769)
* Initial plan

* Implement version check mechanism: add get_version_info function and check Paddle version at startup

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Address code review feedback: improve error handling and logging

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Change comments and warning messages from Chinese to English

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Update fastdeploy/__init__.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-25 19:29:04 +08:00
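A startup version check like the one added above generally reduces to: read the installed package version, compare against a minimum, and warn rather than crash on failure. A minimal sketch, assuming a hypothetical minimum version and function names (the real `get_version_info` and threshold live in fastdeploy):

```python
from importlib.metadata import version, PackageNotFoundError

MIN_PADDLE = (3, 0, 0)  # hypothetical minimum, not fastdeploy's real requirement

def parse_version(v):
    """Best-effort parse of 'X.Y.Z...' into a comparable int tuple."""
    parts = []
    for token in v.split(".")[:3]:
        digits = "".join(ch for ch in token if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def check_paddle_version(installed=None):
    """Return a warning string if paddlepaddle is missing or too old, else None.

    Never raises: a version check at import time should degrade to a
    warning, not prevent startup.
    """
    if installed is None:
        try:
            installed = version("paddlepaddle")
        except PackageNotFoundError:
            return "paddlepaddle is not installed"
    if parse_version(installed) < MIN_PADDLE:
        req = ".".join(map(str, MIN_PADDLE))
        return f"paddlepaddle {installed} is older than required {req}"
    return None
```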
freeliuzc 9018ccf74e [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)
* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in d-split mode

* add note

* fix xpu register
2025-12-25 01:54:59 -08:00
YuBaoku 7247dc5f3a [CI] Add retry and robust cleanup for removal (#5725)
* [CI] Add retry and robust cleanup for removal

* [CI] Ensure clean GPU memory by killing leftover processes
2025-12-25 17:08:27 +08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
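The KV cache storage feature above externalizes computed KV blocks so identical prefixes can be reused across requests. A toy in-memory sketch of the put/get interface such a store exposes (the class and key scheme are illustrative; the real backend is Mooncake Store over RDMA):

```python
import hashlib

class KVCacheStore:
    """Minimal in-memory stand-in for an external KV-cache store.

    Keys are hashes of the token-id prefix, so two requests with the
    same prompt prefix resolve to the same cached block.
    """
    def __init__(self):
        self._blocks = {}

    @staticmethod
    def block_key(token_ids):
        return hashlib.sha256(repr(tuple(token_ids)).encode()).hexdigest()

    def put(self, token_ids, kv_block):
        self._blocks[self.block_key(token_ids)] = kv_block

    def get(self, token_ids):
        """Return the cached block for this prefix, or None on a miss."""
        return self._blocks.get(self.block_key(token_ids))
```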
memoryCoderC be3be4913a [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM (#5195)
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM

* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
Jiaxin Sui 8fc789bb3f [iluvatar][CI] refactor iluvatar_ci (#5588)
* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* refactor iluvatar_ci

* Update Docker image tag in iluvatar_test workflow

* Update default Docker image version in workflow

* Update iluvatar_test.yml

* Update default Docker image in workflow config

* Update model path in run_ernie300B_4layer.py

* Update model path in offline inference check

* Add model_data directory and copy model files

Create model_data directory and copy necessary files.

* Update run_ernie_vl_28B.py

* Update run_ernie300B_4layer.py

* Update paddlepaddle installation method in script

* Change wget command to include proxy option

* Modify paddle package installation in CI script

Updated installation commands for paddle packages.

* Update paddlepaddle and paddle-iluvatar-gpu versions

* Delete .github/workflows/ci_iluvatar.yml

* Rename workflow from ILUVATAR Test to ILUVATAR-CI

* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
qw86972190 135e47d551 [XPU]ZMQ logprob (#5628)
* [XPU]ZMQ logprob
2025-12-25 14:50:01 +08:00
Yuanle Liu 75b3180280 [BugFix] Fix _disable_sequence_parallel_moe_if_needed (#5740) 2025-12-24 20:02:22 -08:00
MingkunZhang e48e306134 [Metax] update ci bash (#5760)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-25 11:47:38 +08:00
bukejiyu f0bbdce849 [Loader]Fix bug in MTP weight loading (#5744)
* fix torch mtp

* fix

* update
2025-12-25 11:32:17 +08:00
RuohengMa e154c03416 [XPU] refine moe_expert_ffn ut (#5743) 2025-12-25 10:35:24 +08:00
YuBaoku 9624bf3c6e [CI] Fix image build to use the correct upstream artifacts 2025-12-24 22:44:34 +08:00
chenjian b90a922f98 [Bug fix] Set enable_cache_output as false by default (#5751) 2025-12-24 21:37:24 +08:00
YuBaoku 6e39f88ca0 [CI] Fix ci_image_update error of no depends 2025-12-24 21:28:38 +08:00
YuBaoku 0410c42a9a [CI] Refactor RL tests to reuse stable_test (#5516)
* [CI] Refactor RL tests to reuse stable_test
2025-12-24 19:18:00 +08:00
freeliuzc 2dc2ba49b5 [Speculative Decoding] Fix multistep MTP in splitwise-prefill mode (#5723) 2025-12-24 02:45:54 -08:00
YuBaoku e75f93d302 [CI] Refactor RL tests to reuse test_metrics (#5741) 2025-12-24 17:08:40 +08:00
chen c7ab32d154 check (#5736) 2025-12-24 16:49:20 +08:00