周周周
7a0744f05a
[UT]support attention test tp ( #5887 )
2026-01-06 11:15:01 +08:00
Copilot
5c53193c4e
[Docs] Update GPU version from 2.3.0 to 2.3.2 in installation documentation ( #5894 )
...
* Initial plan
* Update GPU version from 2.3.0 to 2.3.2 in NVIDIA GPU installation documentation
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-06 11:06:32 +08:00
Yuanle Liu
5e729bc2ba
[OPs] ep_moe_expert_dispatch.cu dispatch num_experts_per_rank 5 ( #5890 )
2026-01-06 10:39:35 +08:00
Neil Zhu
272a371635
[Metax] optimize flash attention backend ( #5876 )
2026-01-06 09:52:09 +08:00
周周周
ab553b3b8b
revert cuda_check ( #5883 )
2026-01-05 20:51:31 +08:00
Jiaxin Sui
2785b820c8
[XPU][CI] Add XPU logprobs case ( #5874 )
...
* Enhance run_ci_xpu.sh with caching and prefill options
* Update model path and configuration in run_ci_xpu.sh
* Add '北朝' keyword to assertion in run_45vl.py
* Enhance process termination logic in run_ci_xpu.sh
* Set timeout for CI_XPU job to 60 minutes
* Remove extra newline in stop_processes function
* Update paddlepaddle-xpu installation command
Comment out the previous paddlepaddle-xpu installation command and replace it with a pinned-version install, due to an EP parallel error.
* Update PaddlePaddle installation command
* Remove max_tokens from model response configuration
Removed max_tokens parameter from the model response call.
* add xpu logprobs case
* Fix formatting and improve setup_logprobs_env
Add newline at end of file and update setup_logprobs_env function.
* Refactor test_logprobs_21b_tp4.py for clarity
* Change top_p value from 1.0 to 0
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
lizexu123
1d3ae7c024
[BugFix] fix w4afp8 tp=8 ( #5868 )
...
* fix w4afp8 tp=8
* fix
2026-01-05 18:59:02 +08:00
tianhaodongbd
6f14b180e3
[RL] Change 'model' to the instance variable 'tmp_model' ( #5872 )
2026-01-05 02:09:02 -08:00
ming1753
f50e1bcc16
[Others] enable use PFCC deep_ep ( #5822 )
...
* upstream deep_ep
* fix bug
* fix bug
* modify env name
2026-01-05 02:07:01 -08:00
jc
8d384f9fd8
[PD Disaggregation] Update usage of pd disaggregation and data parallel ( #5742 )
...
* Update usage of pd disaggregation
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up dp docs
* up
* up
* up
* fix unittest
2026-01-05 17:51:29 +08:00
cmcamdy
690d4bcdb0
[XPU] Speculative Decoding with PD ( #5856 )
...
* [XPU] Speculative Decoding with PD
* fix post process
* share kv cache sender
* support speculate decoding step system cache
* support speculate decoding step system cache
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>
2026-01-05 17:31:03 +08:00
chen
ac39c0f887
support fa3 qwen-vl rope ( #5869 )
2026-01-05 15:29:34 +08:00
sunxin
adb91dcacc
[BugFix] Fix wint4 ep issue caused by empty run ( #5870 )
2026-01-05 14:24:37 +08:00
周周周
dc13344ab8
[Optimization] add del to decrease peak memory in MoE prefill ( #5863 )
2026-01-05 14:01:48 +08:00
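The commit above lowers peak memory during MoE prefill by `del`-ing intermediates. As a general illustration of the pattern (assumed shapes and a ReLU stand-in, not the repo's MoE kernel): dropping the last reference to a large activation before the next big allocation lets the allocator reclaim it early.

```python
import numpy as np

# Illustrative sketch only; names and shapes are assumptions.
def moe_ffn_prefill(x: np.ndarray, w_up: np.ndarray, w_down: np.ndarray) -> np.ndarray:
    hidden = x @ w_up              # large [tokens, intermediate] activation
    act = np.maximum(hidden, 0.0)  # ReLU stand-in for the expert activation
    del hidden                     # free before the next large allocation
    out = act @ w_down
    del act
    return out
```

Peak memory now holds at most two of the three large buffers at once instead of all three.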
jc
e911ac2ce7
[BugFix] Refine the preparation of cpu and storage cache ( #5777 )
...
* Refine the preparation of cpu and storage cache
* fix error
* fix error
* up
* fix
* up docs
* fix unittest
* remove debug info
2026-01-05 10:13:30 +08:00
jc
95257c1dbd
[Feature] RDMACommunicator send key and value scale ( #5737 )
...
* RDMACommunicator send key and value scale
---------
Co-authored-by: kevin <chengyf112@gmail.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-05 10:04:24 +08:00
Copilot
7d5282e158
[APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT ( #5865 )
...
* Initial plan
* Add configurable FD_WORKER_ALIVE_TIMEOUT environment variable
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Add test for FD_WORKER_ALIVE_TIMEOUT environment variable
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Update docs/zh/usage/environment_variables.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update docs/usage/environment_variables.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Improve test coverage to validate integration with check_health calls
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Remove test_worker_alive_timeout.py per reviewer feedback
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-05 09:47:12 +08:00
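The commit above makes the worker health-check timeout configurable via the FD_WORKER_ALIVE_TIMEOUT environment variable. A minimal sketch of that pattern, assuming a hypothetical default value and helper name (neither is taken from the repo):

```python
import os

# Hypothetical default; FastDeploy's actual default may differ.
DEFAULT_WORKER_ALIVE_TIMEOUT = 30  # seconds

def get_worker_alive_timeout() -> int:
    """Read the health-check timeout, overridable via FD_WORKER_ALIVE_TIMEOUT."""
    return int(os.getenv("FD_WORKER_ALIVE_TIMEOUT", DEFAULT_WORKER_ALIVE_TIMEOUT))
```

Reading the variable lazily at call time (rather than once at import) is what lets tests and operators override it per process.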
YuBaoku
37a128e240
[CI] Fix reusable workflow output mapping in _build_linux_rl.yml
...
Fix incorrect job reference in reusable workflow outputs, which caused the RL wheel path to be dropped.
2026-01-04 21:22:07 +08:00
Yonghua Li
5e4e6692a4
[BugFix] fix cache manager not launched in case of mtp or blockwise fp8 ( #5840 )
...
* [BugFix] fix cache manager not launched in case of mtp or blockwise fp8
* [fix] fix mtp cache in mtp.py
* [fix] fix gpu ops import
* [fix] fix mtp layer idx
* [fix] fix xpu model runner mtp cache
* [fix] fix mtp import
2026-01-04 04:35:37 -08:00
YuBaoku
55f77e9ab1
[CI] Add commit-level build_linux task for RL ( #5857 )
2026-01-04 20:31:27 +08:00
Zhang Yulong
2da32f2a35
Update benchmark_serving.py ( #5861 )
2026-01-04 20:07:56 +08:00
kevin
52dc9a7b85
[BugFix] skip mm revert ( #5848 )
...
* skip mm revert
* update code
* update test
2026-01-04 14:25:45 +08:00
周周周
e3957a5ebc
[Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel ( #5620 )
2026-01-04 11:21:15 +08:00
MingkunZhang
f732d7d2ad
[Metax] adapt prefix caching & cpu swap ( #5844 )
...
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-31 17:02:48 +08:00
chen
193886e745
run triton op only on cuda ( #5846 )
2025-12-31 14:17:31 +08:00
GoldPancake
4e10ae5d99
[Speculative Decoding] Optimize draft logprob ( #5842 )
...
* optimize draft logprob
* fix ut
2025-12-31 13:35:56 +08:00
ddchenhao66
9e45ef7ca9
[XPU]MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL ( #5831 )
2025-12-31 09:49:12 +08:00
kevin
74e162697f
eb5 mm skip prefix cache ( #5838 )
2025-12-30 05:30:48 -08:00
xjkmfa
ed60b4da32
[CI case]Prompt logprob ( #5835 )
...
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
Sunny-bot1
598d292a69
w4afp8 fix quant ( #5830 )
2025-12-30 21:16:13 +08:00
essos
b03a4f3e3d
[CI][Hackathon 9th Sprint No.46] Add unit tests for the fastdeploy/model_executor/guided_decoding/xgrammar_backend.py module ( #5042 )
...
* test
* rename ut
* remove test max_rollback_tokens
* update
* Simplify code
* fix: torch use mock
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-30 17:05:26 +08:00
chen
0bcf924e10
[Optimization] Optimization for gather_logprob by 10GB ( #5817 )
...
* optimize gather_logprob for logprobs, reducing device memory usage by 10GB when token_num=8k
2025-12-30 15:33:34 +08:00
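The gather_logprob optimization above points at a general memory-saving technique: instead of materializing a full [token_num, vocab_size] log-softmax matrix and then indexing it, compute one logsumexp per row and gather only the selected tokens' logits. A hedged NumPy sketch of that idea (names and shapes are assumptions, not the repo's kernel):

```python
import numpy as np

def gather_logprob(logits: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
    """log p(token) per row without building a full log-softmax matrix."""
    # Numerically stable logsumexp per row; extra memory is O(token_num),
    # not O(token_num * vocab_size).
    m = logits.max(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1))
    picked = logits[np.arange(len(token_ids)), token_ids]
    return picked - lse
```

The result matches `log_softmax(logits)[i, token_ids[i]]`, but the only vocab-sized temporaries are transient reduction buffers.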
YuBaoku
98519ee2e9
[CI] Fix archive URL injection in tag image build ( #5828 )
2025-12-30 14:28:17 +08:00
lizexu123
44a13e4557
[Feature] support w4afp8 v1_loader and v0_loader(tp>1) ( #5757 )
...
* support
* fix
* support w4afp8 v1_loader and v0_loader
* fix
* fix test
* fix test
* fix test
* fix moe.py
* add test_ernie_4_5_w4afp8
* add test
* delete tensor
* fix test
* fix
* add
* fix test
2025-12-30 14:11:52 +08:00
GoldPancake
e78e22ebd5
[BugFix] Fix entropy bugs ( #5818 )
...
* fix entropy bugs
* fix ut
* fix
2025-12-29 20:44:29 -08:00
tianhaodongbd
edb9647422
[RL] add lm_head_fp32 in RolloutModelConfig ( #5825 )
2025-12-29 20:22:30 -08:00
周周周
7ae13b2326
[PD Disaggregation] remove unused param in RDMACommManager ( #5814 )
2025-12-30 11:38:30 +08:00
Yonghua Li
a8d3e3ba12
[BugFix] fix shm opened but not closed in set_data_ipc ( #5826 )
2025-12-29 23:35:07 +08:00
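The set_data_ipc fix above reflects a general discipline: every shared-memory handle that is opened must also be closed, even on error paths. A minimal sketch of the pattern using Python's stdlib multiprocessing.shared_memory (an illustration of the open/close discipline, not the set_data_ipc code itself, which deals with device IPC handles):

```python
from multiprocessing import shared_memory

def read_shared(name: str, size: int) -> bytes:
    """Attach to an existing segment, copy out, and always close the handle."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:size])  # copy, so no view outlives the handle
    finally:
        shm.close()  # without this, the mapping/fd leaks on every call
```

`close()` releases this process's mapping; only the creator should additionally `unlink()` the segment when it is no longer needed.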
CSWYF3634076
deb9698ac5
remove invalid elif branch ( #5821 )
2025-12-29 19:21:28 +08:00
CSWYF3634076
9286403570
[Models] Add Qwen3-VL Model Support ( #5763 )
...
* support v1 loader
* remove useless code
* remove useless
* [Model] support Qwen3VL images success
* [Model] support Qwen3VL rope_3d
* [Model] support Qwen3VL remove log
* [Model] support Qwen3VL RL
* [Model] support Qwen3VL tp
* [Model] support Qwen3VL video
* [Model] support Qwen3VL fix ernievl
* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds
* [Model] support Qwen3VL fix multi card
* [Model] support Qwen3VL file close
* [Model] support Qwen3VL fix ce
* [Model] support Qwen3VL fix unittest
* [Model] support Qwen3VL add unittest
---------
Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
周周周
a3f0696e35
[BugFix] fix compile error in sm89 ( #5809 )
2025-12-29 16:55:52 +08:00
Ryan
eb782a0225
[BugFix] Fix return value inconsistency for ep_moe_expert_combine op ( #5812 )
2025-12-29 16:44:00 +08:00
essos
ffb3ccff74
[CI][Hackathon 9th Sprint No.52] Add unit tests for the fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py module ( #5047 )
...
* add test
* update test
* Simplify code
* Remove mock
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:44:56 +08:00
xunyoyo
7e39560a42
[CI][Hackathon 9th Sprint No.33] Add unit tests for the module -new ( #5726 )
...
* Add cache messager coverage tests
* Add default_dtype parameter to test cache manager
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:42:27 +08:00
Longzhi Wang
11329ee35e
[Model] support mode config for expert_dispatch ( #5748 )
2025-12-29 13:37:20 +08:00
essos
8ee055aafc
[CI][Hackathon 9th Sprint No.55] Add unit tests for the fastdeploy/scheduler/local_scheduler.py module ( #5050 )
...
* Add comprehensive unit tests for data type conversion functionality
* fix
* Fix unit test failures in test_local_scheduler.py
* update
* fix code
* update mock
* add ut
* rm file
* update test
* Remove test cases already covered
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 12:41:50 +08:00
ddchenhao66
56a9ecccb2
[XPU] xpu support ep4tp4 ( #5773 )
...
* [XPU] xpu support ep4tp4
* Add commands to check multiprocessing and fastdeploy processes
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
chenjian
91a2b13676
[BugFix] Fix preemption out of real_bsz ( #5805 )
2025-12-29 09:52:36 +08:00
YuBaoku
c3ccfa974c
[CI] Fix path error and port conflict ( #5803 )
2025-12-27 12:50:58 +08:00
Nyakku Shigure
da9ea88a3b
[BugFix] Correct condition for reversed_window_indices in SiglipEncoder ( #5795 )
2025-12-26 19:16:07 +08:00