xiaoxiaohehe001
00a01ae024
[Feature] Support redundant expert for eplb ( #5918 )
...
* [BugFix] support redundant expert for eplb
* support redundant expert for eplb
* support redundant expert for eplb
* update
* fix ci eplb
2026-01-09 17:13:24 +08:00
CSWYF3634076
e6cdea4492
[Models] Qwen3VL and Qwen3VL-Moe CUDA graph Support ( #5962 )
...
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v2
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v3
2026-01-09 17:09:02 +08:00
zccjjj
20de04e249
[XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu ( #5878 )
2026-01-09 16:34:57 +08:00
essos
1d20957340
[CI]【Hackathon 9th Sprint No.50】NO.50 Add unit tests for module fastdeploy/entrypoints/engine_client.py -part #5045 ( #5807 )
...
* update test code
* reduce mock usage
* fix style
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-09 15:13:19 +08:00
GoldPancake
3ca99ab170
[Speculative Decoding] Return accepted tokens per head in response ( #5947 )
...
* adjust log level
* add accepted tokens per head
* fix ut
* fix
2026-01-09 14:32:08 +08:00
kevin
2d2b156252
[BugFix] fix dyc8 cache bug ( #5958 )
...
* fix dyc8 cache bug
* update code
2026-01-08 19:25:47 -08:00
GoldPancake
e41d434548
[Bugfix] Fix entropy calculation bugs ( #5941 )
...
* fix entropy bugs
2026-01-08 20:57:35 +08:00
CSWYF3634076
d8fcb7c07d
[Models] Add Qwen3-VL Moe Model Support ( #5913 )
...
* [Model] add Qwen3vl moe model support
* [Model] add Qwen3vl moe model support remove log
* [Model] add Qwen3vl moe model support unittest
2026-01-08 11:36:42 +08:00
xunyoyo
78adf83549
[CI] 【Hackathon 9th Sprint No.18】NO.18 Add unit tests for module -new ( #5717 )
...
* Remove paddle import guards from DeepEP tests
* Sort imports in DeepEP tests
* Refactor assertions for combine handle in test_ep.py
Updated assertions to verify combine handle in DeepEPEngine.
* Add moe_select coverage in DeepEP tests
* Refactor assertions for combine handle in test_ep
* Strengthen moe_select assertions in DeepEP tests
* Update test_ep.py
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-07 17:20:59 +08:00
kevin
eabd01cd21
[BugFix] fix eb5 prefix bug ( #5879 )
...
* fix eb5 prefix bug
* update ci test
* update code
* update code
* update code
* update code
* update code
* update code
* update code
2026-01-06 23:50:39 -08:00
ddchenhao66
733014bf32
[XPU] Support EP4TP1 in pd disaggregation ( #5860 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-06 15:25:36 +08:00
Yonghua Li
9445fbe054
[KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled ( #5871 )
...
* [fix] temporarily forbid cpu cache in update/clear api
* [fix] stop launching cache transfer manager unless hierarchical cache is enabled
* [fix] fix no attr hierarchical cache
* [fix] fix ci
* [fix] fix test_prefix_cache_manager.py
2026-01-06 14:27:47 +08:00
Yonghua Li
9fc2400e71
[BugFix] fix mtp cache attaching for pd disaggregation ( #5884 )
...
* [fix] fix mtp cache attaching for pd disaggregation
* [fix] fix test_mtp_proposer.py
2026-01-06 14:17:53 +08:00
lizexu123
acdf0cd1d9
fix hadamard_block_size ( #5888 )
2026-01-06 14:12:14 +08:00
周周周
7a0744f05a
[UT]support attention test tp ( #5887 )
2026-01-06 11:15:01 +08:00
Jiaxin Sui
2785b820c8
[XPU][CI] Add XPU logprobs case ( #5874 )
...
* Enhance run_ci_xpu.sh with caching and prefill options
* Update model path and configuration in run_ci_xpu.sh
* Add '北朝' keyword to assertion in run_45vl.py
* Enhance process termination logic in run_ci_xpu.sh
* Set timeout for CI_XPU job to 60 minutes
* Remove extra newline in stop_processes function
* Update paddlepaddle-xpu installation command
Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.
* Update PaddlePaddle installation command
* Remove max_tokens from model response configuration
Removed max_tokens parameter from the model response call.
* add xpu logprobs case
* Fix formatting and improve setup_logprobs_env
Add newline at end of file and update setup_logprobs_env function.
* Refactor test_logprobs_21b_tp4.py for clarity
* Change top_p value from 1.0 to 0
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
jc
8d384f9fd8
[PD Disaggregation] Update usage of pd disaggregation and data parallel ( #5742 )
...
* Update usage of pd disaggregation
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up dp docs
* up
* up
* up
* fix unittest
2026-01-05 17:51:29 +08:00
jc
e911ac2ce7
[BugFix] Refine the preparation of cpu and storage cache ( #5777 )
...
* Refine the preparation of cpu and storage cache
* fix error
* fix error
* up
* fix
* up docs
* fix unittest
* remove debug info
2026-01-05 10:13:30 +08:00
kevin
52dc9a7b85
[BugFix] skip mm revert ( #5848 )
...
* skip mm revert
* update code
* update test
2026-01-04 14:25:45 +08:00
周周周
e3957a5ebc
[Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel ( #5620 )
2026-01-04 11:21:15 +08:00
GoldPancake
4e10ae5d99
[Speculative Decoding] Optimize draft logprob ( #5842 )
...
* optimize draft logprob
* fix ut
2025-12-31 13:35:56 +08:00
xjkmfa
ed60b4da32
[CI case]Prompt logprob ( #5835 )
...
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
essos
b03a4f3e3d
[CI]【Hackathon 9th Sprint No.46】NO.46 Add unit tests for module fastdeploy/model_executor/guided_decoding/xgrammar_backend.py ( #5042 )
...
* test
* rename ut
* remove test max_rollback_tokens
* update
* simplify code
* fix: torch use mock
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-30 17:05:26 +08:00
chen
0bcf924e10
[Optimization] Reduce gather_logprob device memory usage by 10GB ( #5817 )
...
* optimize gather_logprob; reduce device memory usage by 10GB when token_num=8k
2025-12-30 15:33:34 +08:00
lizexu123
44a13e4557
[Feature] support w4afp8 v1_loader and v0_loader(tp>1) ( #5757 )
...
* support
* fix
* support w4afp8 v1_loader and v0_loader
* fix
* fix test
* fix test
* fix test
* fix moe.py
* add test_ernie_4_5_w4afp8
* add test
* delete tensor
* fix test
* fix
* add
* fix test
2025-12-30 14:11:52 +08:00
GoldPancake
e78e22ebd5
[BugFix] Fix entropy bugs ( #5818 )
...
* fix entropy bugs
* fix ut
* fix
2025-12-29 20:44:29 -08:00
周周周
7ae13b2326
[PD Disaggregation] remove unused parameter in RDMACommManager ( #5814 )
2025-12-30 11:38:30 +08:00
Yonghua Li
a8d3e3ba12
[BugFix] fix shm opened but not closed in set_data_ipc ( #5826 )
2025-12-29 23:35:07 +08:00
CSWYF3634076
9286403570
[Models] Add Qwen3-VL Model Support ( #5763 )
...
* support v1 loader
* remove useless code
* remove useless
* [Model] support Qwen3VL images success
* [Model] support Qwen3VL rope_3d
* [Model] support Qwen3VL remove log
* [Model] support Qwen3VL RL
* [Model] support Qwen3VL tp
* [Model] support Qwen3VL video
* [Model] support Qwen3VL fix ernievl
* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds
* [Model] support Qwen3VL fix multi card
* [Model] support Qwen3VL file close
* [Model] support Qwen3VL fix ce
* [Model] support Qwen3VL fix unittest
* [Model] support Qwen3VL add unittest
---------
Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
essos
ffb3ccff74
[CI]【Hackathon 9th Sprint No.52】NO.52 Add unit tests for module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py ( #5047 )
...
* add test
* update test
* simplify code
* remove mock usage
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:44:56 +08:00
xunyoyo
7e39560a42
[CI] 【Hackathon 9th Sprint No.33】NO.33 Add unit tests for module -new ( #5726 )
...
* Add cache messager coverage tests
* Add default_dtype parameter to test cache manager
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:42:27 +08:00
essos
8ee055aafc
[CI]【Hackathon 9th Sprint No.55】NO.55 Add unit tests for module fastdeploy/scheduler/local_scheduler.py ( #5050 )
...
* Add comprehensive unit tests for data type conversion functionality
* fix
* Fix unit test failures in test_local_scheduler.py
* update
* fix code
* update mock
* add ut
* rm file
* update test
* remove already-covered test cases
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 12:41:50 +08:00
ddchenhao66
56a9ecccb2
[XPU] xpu support ep4tp4 ( #5773 )
...
* [XPU] xpu support ep4tp4
* Add commands to check multiprocessing and fastdeploy processes
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
YuBaoku
c3ccfa974c
[CI] Fix path error and port conflict ( #5803 )
2025-12-27 12:50:58 +08:00
kxz2002
cad2932990
[BugFix] Fix process_response_dict to support async in serving_completion ( #5758 )
...
* support process_response_dict async initial commit
* fixbug
* add unit test
* optimize
2025-12-26 17:40:58 +08:00
kevin
894f4e312b
[FDConfig] disable chunked_mm_input in ernie5 ( #5774 )
...
* disable chunked_mm_input in ernie5
* update code
* update code
* update test case
* update testcase
* update case
2025-12-26 15:31:27 +08:00
yzwu
7b6cc11952
[Iluvatar] Fix FD launch error when specifying CUDA_VISIBLE_DEVICES ( #5735 )
2025-12-26 14:01:27 +08:00
YuBaoku
4c22a5afb8
[CI] Disable GPU cleanup due to CI machine limitations ( #5781 )
2025-12-26 00:11:06 +08:00
kevin
4fa76296d9
[BugFix] fix mm splitwise scheduler bug ( #5604 )
...
* fix mm splitwise scheduler bug
* fix test case bug
* update code
* update code
2025-12-25 04:08:11 -08:00
Copilot
1cbf448178
[Feature] Add startup version check mechanism for Paddle ( #5769 )
...
* Initial plan
* Implement version check mechanism: add get_version_info function and check Paddle version at startup
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Address code review feedback: improve error handling and logging
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Change comments and warning messages from Chinese to English
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Update fastdeploy/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-25 19:29:04 +08:00
freeliuzc
9018ccf74e
[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes ( #5738 )
...
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split mode
* add note
* fix xpu register
2025-12-25 01:54:59 -08:00
YuBaoku
7247dc5f3a
[CI] Add retry and robust cleanup for removal ( #5725 )
...
* [CI] Add retry and robust cleanup for removal
* [CI] Ensure clean GPU memory by killing leftover processes
2025-12-25 17:08:27 +08:00
Juncai
412867fd99
[Feature] Support KV Cache Storage ( #5571 )
...
* Support Mooncake Store
* up
* up
* add op
* fix conflict
* fix error
* up for comments
* avoid thread lock
* up
* fix unittest
* fix unittest
* remove debug info
* consider tp_size > 1
* add default rdma_nics
* add utils
* up
* fix error
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
memoryCoderC
be3be4913a
[Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM ( #5195 )
...
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM
* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
Jiaxin Sui
8fc789bb3f
[iluvatar][CI] refactor iluvatar_ci ( #5588 )
...
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* refactor iluvatar_ci
* Update Docker image tag in iluvatar_test workflow
* Update default Docker image version in workflow
* Update iluvatar_test.yml
* Update default Docker image in workflow config
* Update model path in run_ernie300B_4layer.py
* Update model path in offline inference check
* Add model_data directory and copy model files
Create model_data directory and copy necessary files.
* Update run_ernie_vl_28B.py
* Update run_ernie300B_4layer.py
* Update paddlepaddle installation method in script
* Change wget command to include proxy option
* Modify paddle package installation in CI script
Updated installation commands for paddle packages.
* Update paddlepaddle and paddle-iluvatar-gpu versions
* Delete .github/workflows/ci_iluvatar.yml
* Rename workflow from ILUVATAR Test to ILUVATAR-CI
* Update installation commands for paddlepaddle and iluvatar
2025-12-25 15:10:34 +08:00
YuBaoku
0410c42a9a
[CI] Refactor RL tests to reuse stable_test ( #5516 )
...
* [CI] Refactor RL tests to reuse stable_test
2025-12-24 19:18:00 +08:00
YuBaoku
e75f93d302
[CI] Refactor RL tests to reuse test_metrics ( #5741 )
2025-12-24 17:08:40 +08:00
Divano
6b0fba8294
Update run.sh
2025-12-24 15:35:17 +08:00
Nyakku Shigure
11227e00bb
[GraphOptimization] Wrap deep gemm and triton as python op ( #5673 )
...
* [GraphOptimization] Wrap deep gemm and triton as python op
* add unitest to _base_test && compatibility
* paddle.static.MetaTensor -> "paddle.static.MetaTensor"
* mv register_custom_python_op
* rename yaml
---------
Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>
2025-12-24 15:23:46 +08:00
bukejiyu
ba4b7afb3a
[Others] Rename tensor_parallel_degree to tensor_model_parallel_size for paddleformers 0.4.1 ( #5727 )
2025-12-23 23:19:11 -08:00