Commit Graph

4336 Commits

Author SHA1 Message Date
周周周 7a0744f05a [UT] support attention test tp (#5887) 2026-01-06 11:15:01 +08:00
Copilot 5c53193c4e [Docs] Update GPU version from 2.3.0 to 2.3.2 in installation documentation (#5894)
* Initial plan

* Update GPU version from 2.3.0 to 2.3.2 in NVIDIA GPU installation documentation

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-06 11:06:32 +08:00
Yuanle Liu 5e729bc2ba [OPs] ep_moe_expert_dispatch.cu dispatch num_experts_per_rank 5 (#5890) 2026-01-06 10:39:35 +08:00
Neil Zhu 272a371635 [Metax] optimize flash attention backend (#5876) 2026-01-06 09:52:09 +08:00
周周周 ab553b3b8b revert cuda_check (#5883) 2026-01-05 20:51:31 +08:00
Jiaxin Sui 2785b820c8 [XPU][CI] Add XPU logprobs case (#5874)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command

* Remove max_tokens from model response configuration

Removed max_tokens parameter from the model response call.

* add xpu logprobs case

* Fix formatting and improve setup_logprobs_env

Add newline at end of file and update setup_logprobs_env function.

* Refactor test_logprobs_21b_tp4.py for clarity

* Change top_p value from 1.0 to 0

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
lizexu123 1d3ae7c024 [BugFix] fix w4afp8 tp=8 (#5868)
* fix w4afp8 tp=8

* fix
2026-01-05 18:59:02 +08:00
tianhaodongbd 6f14b180e3 [RL] Change 'model' to the instance variable 'tmp_model' (#5872) 2026-01-05 02:09:02 -08:00
ming1753 f50e1bcc16 [Others] enable use PFCC deep_ep (#5822)
* upstream deep_ep

* fix bug

* fix bug

* modify env name
2026-01-05 02:07:01 -08:00
jc 8d384f9fd8 [PD Disaggregation] Update usage of pd disaggregation and data parallel (#5742)
* Update usage of pd disaggregation

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up dp docs

* up

* up

* up

* fix unittest
2026-01-05 17:51:29 +08:00
cmcamdy 690d4bcdb0 [XPU] Speculative Decoding with PD (#5856)
* [XPU] Speculative Decoding with PD

* fix post process

* share kv cache sender

* support speculate decoding step system cache

* support speculate decoding step system cache

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>
2026-01-05 17:31:03 +08:00
chen ac39c0f887 support fa3 qwen-vl rope (#5869) 2026-01-05 15:29:34 +08:00
sunxin adb91dcacc [BugFix] Fix wint4 ep issue caused by empty run (#5870) 2026-01-05 14:24:37 +08:00
周周周 dc13344ab8 [Optimization] add del to decrease peak memory in MoE prefill (#5863) 2026-01-05 14:01:48 +08:00
jc e911ac2ce7 [BugFix] Refine the preparation of cpu and storage cache (#5777)
* Refine the preparation of cpu and storage cache

* fix error

* fix error

* up

* fix

* up docs

* fix unittest

* remove debug info
2026-01-05 10:13:30 +08:00
jc 95257c1dbd [Feature] RDMACommunicator send key and value scale (#5737)
* RDMACommunicator send key and value scale

---------

Co-authored-by: kevin <chengyf112@gmail.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-05 10:04:24 +08:00
Copilot 7d5282e158 [APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT (#5865)
* Initial plan

* Add configurable FD_WORKER_ALIVE_TIMEOUT environment variable

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Add test for FD_WORKER_ALIVE_TIMEOUT environment variable

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Update docs/zh/usage/environment_variables.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update docs/usage/environment_variables.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Improve test coverage to validate integration with check_health calls

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Remove test_worker_alive_timeout.py per reviewer feedback

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-05 09:47:12 +08:00
YuBaoku 37a128e240 [CI] Fix reusable workflow output mapping in _build_linux_rl.yml
Fix incorrect job reference in reusable workflow outputs, which caused the RL wheel path to be dropped.
2026-01-04 21:22:07 +08:00
Yonghua Li 5e4e6692a4 [BugFix] fix cache manager not launched in case of mtp or blockwise fp8 (#5840)
* [BugFix] fix cache manager not launched in case of mtp or blockwise fp8

* [fix] fix mtp cache in mtp.py

* [fix] fix gpu ops import

* [fix] fix mtp layer idx

* [fix] fix xpu model runner mtp cache

* [fix] fix mtp import
2026-01-04 04:35:37 -08:00
YuBaoku 55f77e9ab1 [CI] Add commit-level build_linux task for RL (#5857) 2026-01-04 20:31:27 +08:00
Zhang Yulong 2da32f2a35 Update benchmark_serving.py (#5861) 2026-01-04 20:07:56 +08:00
kevin 52dc9a7b85 [BugFix] skip mm revert (#5848)
* skip mm revert

* update code

* update test
2026-01-04 14:25:45 +08:00
周周周 e3957a5ebc [Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620) 2026-01-04 11:21:15 +08:00
MingkunZhang f732d7d2ad [Metax] adapt prefix caching & cpu swap (#5844)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-31 17:02:48 +08:00
chen 193886e745 only cuda run triton op (#5846) 2025-12-31 14:17:31 +08:00
GoldPancake 4e10ae5d99 [Speculative Decoding] Optimize draft logprob (#5842)
* optimize draft logprob

* fix ut
2025-12-31 13:35:56 +08:00
ddchenhao66 9e45ef7ca9 [XPU] MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL (#5831) 2025-12-31 09:49:12 +08:00
kevin 74e162697f eb5 mm skip prefix cache (#5838) 2025-12-30 05:30:48 -08:00
xjkmfa ed60b4da32 [CI case] Prompt logprob (#5835)
* [ci case]prompt_logprobs
2025-12-30 21:26:06 +08:00
Sunny-bot1 598d292a69 w4afp8 fix quant (#5830) 2025-12-30 21:16:13 +08:00
essos b03a4f3e3d [CI] [Hackathon 9th Sprint No.46] Supplement unit tests for module fastdeploy/model_executor/guided_decoding/xgrammar_backend.py (#5042)
* test

* rename ut

* remove test max_rollback_tokens

* update

* Simplify code

* fix: torch use mock

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-30 17:05:26 +08:00
chen 0bcf924e10 [Optimization] Optimization for gather_logprob by 10GB (#5817)
* opt logprobs gather_logprob, reduce device memory usage by 10GB when token_num=8k
2025-12-30 15:33:34 +08:00
YuBaoku 98519ee2e9 [CI] Fix archive URL injection in tag image build (#5828) 2025-12-30 14:28:17 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
GoldPancake e78e22ebd5 [BugFix] Fix entropy bugs (#5818)
* fix entropy bugs

* fix ut

* fix
2025-12-29 20:44:29 -08:00
tianhaodongbd edb9647422 [RL] add lm_head_fp32 in RolloutModelConfig (#5825) 2025-12-29 20:22:30 -08:00
周周周 7ae13b2326 [PD Disaggregation] remove unused param in RDMACommManager (#5814) 2025-12-30 11:38:30 +08:00
Yonghua Li a8d3e3ba12 [BugFix] fix shm opened but not closed in set_data_ipc (#5826) 2025-12-29 23:35:07 +08:00
CSWYF3634076 deb9698ac5 remove invalid elif branch (#5821) 2025-12-29 19:21:28 +08:00
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
周周周 a3f0696e35 [BugFix] fix compile error in sm89 (#5809) 2025-12-29 16:55:52 +08:00
Ryan eb782a0225 [BugFix] Fix return value inconsistency for ep_moe_expert_combine op (#5812) 2025-12-29 16:44:00 +08:00
essos ffb3ccff74 [CI] [Hackathon 9th Sprint No.52] Supplement unit tests for module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py (#5047)
* add test

* update test

* Simplify code

* Remove mocks

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:44:56 +08:00
xunyoyo 7e39560a42 [CI] [Hackathon 9th Sprint No.33] Supplement functional module unit tests - new (#5726)
* Add cache messager coverage tests

* Add default_dtype parameter to test cache manager

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 13:42:27 +08:00
Longzhi Wang 11329ee35e [Model] support mode config for expert_dispatch (#5748) 2025-12-29 13:37:20 +08:00
essos 8ee055aafc [CI] [Hackathon 9th Sprint No.55] Supplement unit tests for module fastdeploy/scheduler/local_scheduler.py (#5050)
* Add comprehensive unit tests for data type conversion functionality

* fix

* Fix unit test failures in test_local_scheduler.py

* update

* fix code

* update mock

* add ut

* rm file

* update test

* Remove test cases already covered

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2025-12-29 12:41:50 +08:00
ddchenhao66 56a9ecccb2 [XPU] xpu support ep4tp4 (#5773)
* [XPU] xpu support ep4tp4

* Add commands to check multiprocessing and fastdeploy processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-29 11:27:01 +08:00
chenjian 91a2b13676 [BugFix] Fix preemption out of real_bsz (#5805) 2025-12-29 09:52:36 +08:00
YuBaoku c3ccfa974c [CI] Fix path error and port conflict (#5803) 2025-12-27 12:50:58 +08:00
Nyakku Shigure da9ea88a3b [BugFix] Correct condition for reversed_window_indices in SiglipEncoder (#5795) 2025-12-26 19:16:07 +08:00