YuBaoku
98519ee2e9
[CI] Fix archive URL injection in tag image build ( #5828 )
2025-12-30 14:28:17 +08:00
lizexu123
44a13e4557
[Feature] support w4afp8 v1_loader and v0_loader(tp>1) ( #5757 )
...
* support
* fix
* support w4afp8 v1_loader and v0_loader
* fix
* fix test
* fix test
* fix test
* fix moe.py
* add test_ernie_4_5_w4afp8
* add test
* delete tensor
* fix test
* fix
* add
* fix test
2025-12-30 14:11:52 +08:00
GoldPancake
e78e22ebd5
[BugFix] Fix entropy bugs ( #5818 )
...
* fix entropy bugs
* fix ut
* fix
2025-12-29 20:44:29 -08:00
tianhaodongbd
edb9647422
[RL] add lm_head_fp32 in RolloutModelConfig ( #5825 )
2025-12-29 20:22:30 -08:00
周周周
7ae13b2326
[PD Disaggregation] remove unused param in RDMACommManager ( #5814 )
2025-12-30 11:38:30 +08:00
Yonghua Li
a8d3e3ba12
[BugFix] fix shm opened but not closed in set_data_ipc ( #5826 )
2025-12-29 23:35:07 +08:00
CSWYF3634076
deb9698ac5
remove invalid elif branch ( #5821 )
2025-12-29 19:21:28 +08:00
CSWYF3634076
9286403570
[Models] Add Qwen3-VL Model Support ( #5763 )
...
* support v1 loader
* remove useless code
* remove useless
* [Model] support Qwen3VL images success
* [Model] support Qwen3VL rope_3d
* [Model] support Qwen3VL remove log
* [Model] support Qwen3VL RL
* [Model] support Qwen3VL tp
* [Model] support Qwen3VL video
* [Model] support Qwen3VL fix ernievl
* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds
* [Model] support Qwen3VL fix multi card
* [Model] support Qwen3VL file close
* [Model] support Qwen3VL fix ce
* [Model] support Qwen3VL fix unittest
* [Model] support Qwen3VL add unittest
---------
Co-authored-by: Ayakouji <yuhongh@qq.com >
2025-12-29 17:39:33 +08:00
周周周
a3f0696e35
[BugFix] fix compile error in sm89 ( #5809 )
2025-12-29 16:55:52 +08:00
Ryan
eb782a0225
[BugFix] Fix return value inconsistency for ep_moe_expert_combine op ( #5812 )
2025-12-29 16:44:00 +08:00
essos
ffb3ccff74
[CI]【Hackathon 9th Sprint No.52】NO.52 Add unit tests for functional module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py ( #5047 )
...
* add test
* update test
* simplify code
* remove mock
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2025-12-29 13:44:56 +08:00
xunyoyo
7e39560a42
[CI] 【Hackathon 9th Sprint No.33】NO.33 Add unit tests for functional module -new ( #5726 )
...
* Add cache messager coverage tests
* Add default_dtype parameter to test cache manager
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2025-12-29 13:42:27 +08:00
Longzhi Wang
11329ee35e
[Model] support mode config for expert_dispatch ( #5748 )
2025-12-29 13:37:20 +08:00
essos
8ee055aafc
[CI]【Hackathon 9th Sprint No.55】NO.55 Add unit tests for functional module fastdeploy/scheduler/local_scheduler.py ( #5050 )
...
* Add comprehensive unit tests for data type conversion functionality
* fix
* Fix unit test failures in test_local_scheduler.py
* update
* fix code
* update mock
* add ut
* rm file
* update test
* remove test cases that are already covered
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2025-12-29 12:41:50 +08:00
ddchenhao66
56a9ecccb2
[XPU] xpu support ep4tp4 ( #5773 )
...
* [XPU] xpu support ep4tp4
* Add commands to check multiprocessing and fastdeploy processes
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-12-29 11:27:01 +08:00
chenjian
91a2b13676
[BugFix] Fix preemption out of real_bsz ( #5805 )
2025-12-29 09:52:36 +08:00
YuBaoku
c3ccfa974c
[CI] Fix path error and port conflict ( #5803 )
2025-12-27 12:50:58 +08:00
Nyakku Shigure
da9ea88a3b
[BugFix] Correct condition for reversed_window_indices in SiglipEncoder ( #5795 )
2025-12-26 19:16:07 +08:00
Ryan
09229d8953
change count_tokens_per_expert_func declaration: Tensor -> vector<Tensor> ( #5794 )
2025-12-26 19:02:28 +08:00
Daci
77add7d1cc
set tracelogger stacklevel=2 ( #5766 )
2025-12-26 17:43:32 +08:00
kxz2002
cad2932990
[BugFix] Fix process_response_dict to support async in serving_completion ( #5758 )
...
* support process_response_dict async initial commit
* fixbug
* add unit test
* optimize
2025-12-26 17:40:58 +08:00
Ryan
724045c426
add some op infershape&dtype ( #5762 )
2025-12-26 16:17:39 +08:00
kevin
894f4e312b
[FDConfig] disable chunked_mm_input in ernie5 ( #5774 )
...
* disable chunked_mm_input in ernie5
* update code
* update code
* update test case
* update testcase
* update case
2025-12-26 15:31:27 +08:00
周周周
03363cab4c
make flash_mask attention pybind ( #5783 )
2025-12-26 14:31:35 +08:00
YuBaoku
8808dd1fed
[CI] Enable custom_device_check in CI rerun ( #5786 )
...
* [CI] Enable custom_device_check in CI rerun
2025-12-26 14:09:16 +08:00
yzwu
7b6cc11952
[Iluvatar] Fix FD launch error when specifying CUDA_VISIBLE_DEVICES ( #5735 )
2025-12-26 14:01:27 +08:00
RichardWooSJTU
01c18f328f
rename need_block_num_signal ( #5623 )
2025-12-26 11:02:29 +08:00
YuBaoku
4c22a5afb8
[CI] Disable GPU cleanup due to CI machine limitations ( #5781 )
2025-12-26 00:11:06 +08:00
Yonghua Li
0c01cccc32
[BugFix] fix double shutdown of comm group when rank0 clears weights slower than other ranks ( #5715 )
2025-12-25 21:48:53 +08:00
kevin
5538dda3c8
[Feature] pd support dy-c8 ipc ( #5750 )
...
* pd support dy-c8 ipc
* update code
* support v0
* update code
2025-12-25 21:22:34 +08:00
kevin
4fa76296d9
[BugFix] fix mm splitwise scheduler bug ( #5604 )
...
* fix mm splitwise scheduler bug
* fix test case bug
* update code
* update code
2025-12-25 04:08:11 -08:00
ophilia-lee
d5f5dc4f6e
[Benchmark] Increase aiohttp default read buffer size to 10M to fix the "Chunk too big" error when streaming returns oversized chunks ( #5771 )
...
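* benchmark tool supports specifying response_format for constrained decoding scenarios
* Update backend_request_func.py
The output.success check now handles the case where the reply content is empty because overlong thinking content was truncated
* Update benchmark_serving.py
Update benchmark_metrics
* Support the Completions API
* Support the Completions API
* Support the Completions API
* [Benchmark] Support the Completions API
* [Benchmark] Support the Completions API
* [Benchmark] async_request_eb_openai_completions: increase aiohttp default read buffer size to 4M to fix the "Chunk too big" error when streaming returns oversized chunks
* [Benchmark] Increase aiohttp default read buffer size to 10M to fix the "Chunk too big" error when streaming returns oversized chunks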
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-25 19:36:11 +08:00
Copilot
1cbf448178
[Feature] Add startup version check mechanism for Paddle ( #5769 )
...
* Initial plan
* Implement version check mechanism: add get_version_info function and check the Paddle version at startup
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Address code review feedback: improve error handling and logging
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Change comments and warning messages from Chinese to English
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Update fastdeploy/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-12-25 19:29:04 +08:00
freeliuzc
9018ccf74e
[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes ( #5738 )
...
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split-mode
* add note
* fix xpu register
2025-12-25 01:54:59 -08:00
YuBaoku
7247dc5f3a
[CI] Add retry and robust cleanup for removal ( #5725 )
...
* [CI] Add retry and robust cleanup for removal
* [CI] Ensure clean GPU memory by killing leftover processes
2025-12-25 17:08:27 +08:00
Juncai
412867fd99
[Feature] Support KV Cache Storage ( #5571 )
...
* Support Mooncake Store
* up
* up
* add op
* fix conflict
* fix error
* up for comments
* avoid thread lock
* up
* fix unittest
* fix unittest
* remove debug info
* consider tp_size > 1
* add default rdma_nics
* add utils
* up
* fix error
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-25 16:30:35 +08:00
memoryCoderC
be3be4913a
[Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM ( #5195 )
...
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM
* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
Jiaxin Sui
8fc789bb3f
[iluvatar][CI] refactor iluvatar_ci ( #5588 )
...
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* Update Docker image tag in iluvatar_test workflow
* Update default Docker image version in workflow
* Update iluvatar_test.yml
* Update default Docker image in workflow config
* Update model path in run_ernie300B_4layer.py
* Update model path in offline inference check
* Add model_data directory and copy model files
Create model_data directory and copy necessary files.
* Update run_ernie_vl_28B.py
* Update run_ernie300B_4layer.py
* Update paddlepaddle installation method in script
* Change wget command to include proxy option
* Modify paddle package installation in CI script
Updated installation commands for paddle packages.
* Update paddlepaddle and paddle-iluvatar-gpu versions
* Delete .github/workflows/ci_iluvatar.yml
* Rename workflow from ILUVATAR Test to ILUVATAR-CI
* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
qw86972190
135e47d551
[XPU]ZMQ logprob ( #5628 )
...
* [XPU]ZMQ logprob
2025-12-25 14:50:01 +08:00
Yuanle Liu
75b3180280
[BugFix] Fix _disable_sequence_parallel_moe_if_needed ( #5740 )
2025-12-24 20:02:22 -08:00
MingkunZhang
e48e306134
[Metax] update ci bash ( #5760 )
...
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com >
2025-12-25 11:47:38 +08:00
bukejiyu
f0bbdce849
[Loader]Fix bug in MTP weight loading ( #5744 )
...
* fix torch mtp
* fix
* update
2025-12-25 11:32:17 +08:00
RuohengMa
e154c03416
[XPU] refine moe_expert_ffn ut ( #5743 )
2025-12-25 10:35:24 +08:00
YuBaoku
9624bf3c6e
[CI] Fix image build to use the correct upstream artifacts
2025-12-24 22:44:34 +08:00
chenjian
b90a922f98
[Bug fix] Set enable_cache_output as false by default ( #5751 )
2025-12-24 21:37:24 +08:00
YuBaoku
6e39f88ca0
[CI] Fix ci_image_update error of no depends
2025-12-24 21:28:38 +08:00
YuBaoku
0410c42a9a
[CI] Refactor RL tests to reuse stable_test ( #5516 )
...
* [CI] Refactor RL tests to reuse stable_test
2025-12-24 19:18:00 +08:00
freeliuzc
2dc2ba49b5
[Speculative Decoding] Fix multistep MTP in splitwise-prefill mode ( #5723 )
2025-12-24 02:45:54 -08:00
YuBaoku
e75f93d302
[CI] Refactor RL tests to reuse test_metrics ( #5741 )
2025-12-24 17:08:40 +08:00
chen
c7ab32d154
check ( #5736 )
2025-12-24 16:49:20 +08:00