YuBaoku
5218d40af6
[CI] Add clang-format 13.0.0 recommendation to pre_commit.sh
2026-01-08 21:47:19 +08:00
GoldPancake
e41d434548
[Bugfix] Fix entropy calculation bugs ( #5941 )
* fix entropy bugs
2026-01-08 20:57:35 +08:00
Jiang-Jia-Jun
b9663e5c89
Revise Pull Request guidelines and language section
Updated instructions for Pull Request titles and descriptions, changed language section to 'Others', and added notes on code style and pre-commit usage.
2026-01-08 19:26:05 +08:00
Copilot
6825903559
[BugFix] Fix misleading logging in worker_process for request counting ( #5939 )
* Initial plan
* Optimize logging in worker_process to accurately reflect request types
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Address feedback: rename to max_occupied_batch_index and simplify logging
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Improve comment clarity for batch request counting
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Fix code style: reorder imports with isort
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-08 16:36:22 +08:00
xiaoluomi
2bb838fed9
[TSP] last_norm allgather move to model.py ( #5924 )
* support_lastnorm_gather_split_dev
* support_lastnorm_gather_split_dev1
* support_lastnorm_gather_split_dev3
* support_lastnorm_gather_split_dev4
* support_lastnorm_gather_split_dev5
2026-01-07 23:36:33 -08:00
Bingoo
8e11d719f3
add flashinfer-python-paddle dependency ( #5912 )
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-01-08 15:08:35 +08:00
GoldPancake
a1fc4e249e
[Bugfix] Fix mtp logprob hang problem when include stop_seq ( #5927 )
* fix mtp logprob hang when include stop_seq
2026-01-08 14:21:24 +08:00
Jiaxin Sui
dc170e3005
[XPU][CI]Update CI workflow to include all file types ( #5943 )
Removed paths-ignore for markdown and text files.
2026-01-08 12:03:26 +08:00
FocusLuo
decbbb3933
[INTEL HPU] support only one release package of PaddleCustomDevice ( #5910 )
Signed-off-by: Luo, Focus <focus.luo@intel.com>
2026-01-08 11:57:13 +08:00
CSWYF3634076
d8fcb7c07d
[Models] Add Qwen3-VL Moe Model Support ( #5913 )
* [Model] add Qwen3vl moe model support
* [Model] add Qwen3vl moe model support remove log
* [Model] add Qwen3vl moe model support unittest
2026-01-08 11:36:42 +08:00
Daci
d8c6ba61f3
[BugFix] resource_manager_v1 lock PD ( #5616 )
* bugfix resource_manager_v1 lock PD
* with lock add_prefilled_request
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-08 10:02:54 +08:00
YuBaoku
5088d4acdb
[CI] Add daily build_linux jobs for CUDA 12.9 ( #5936 )
To extend the daily CI coverage by adding Linux build jobs for CUDA 12.9.
2026-01-07 23:20:11 +08:00
FocusLuo
64f910553e
[INTEL_HPU] supported ERNIE-4.5-21B-A3B-Thinking ( #5891 )
ERNIE-4.5-21B-A3B-Thinking needs to use the DefaultModelLoaderV1 mode.
Reference command line:
ENABLE_V1_KVCACHE_SCHEDULER=1 FD_ENC_DEC_BLOCK_NUM=8 HPU_PERF_BREAKDOWN_SYNC_MODE=1 \
HPU_WARMUP_BUCKET=0 MAX_PREFILL_NUM=1 FD_ATTENTION_BACKEND=HPU_ATTN \
python -m fastdeploy.entrypoints.openai.api_server --model \
./models--baidu--ERNIE-4.5-21B-A3B-Thinking/snapshots/4341bb42644d5422859509fa25d41544c57181f8/ \
--port 8388 --engine-worker-queue-port 8302 --metrics-port 8301 \
--cache-queue-port 8303 --max-model-len 16384 --tensor-parallel-size 1 \
--load-choices "default_v1" --num-gpu-blocks-override 5000 --kv-cache-ratio 0.5 \
--max-num-seqs 128 --block-size 64 --no-enable-prefix-caching \
--graph-optimization-config '{"use_cudagraph":false}'
Signed-off-by: Luo, Focus <focus.luo@intel.com>
2026-01-07 21:31:53 +08:00
mouxin
0a92e96f20
[Feature] Add Golang-based Router for Request Scheduling and Load Balancing ( #5882 )
* [Feature] add golang router
* [Feature] add golang router
* [Feature] add golang router
* [Feature] add golang router
* [Feature] add golang router
* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing
* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing
* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing
* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-01-07 21:28:08 +08:00
chenjian
925e7edd3c
[Bug fix] Limit multi-modal request to 1 ( #5901 )
2026-01-07 20:25:07 +08:00
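The guard this fix describes, capping concurrent multi-modal requests at one, amounts to a simple admission check. A minimal sketch of the idea; the function and parameter names below are hypothetical illustrations, not FastDeploy's actual API:

```python
def admit_request(is_multimodal: bool, running_multimodal: int, limit: int = 1) -> bool:
    """Hypothetical admission check: reject a new multi-modal request
    once `limit` multi-modal requests are already running."""
    if is_multimodal and running_multimodal >= limit:
        return False
    return True
```

Text-only requests pass through unconditionally; only multi-modal traffic is throttled.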
lizhenyun01
2be8656c29
[BugFix] fix mtp split kv attention ( #5920 )
* [BugFix] fix mtp split kv attention
* clean code
* clean code
2026-01-07 04:07:31 -08:00
chenjian
c883a2d3ec
[Optimization] Reduce preemption occurrence when blocks not enough ( #5696 )
* [Optimize] Reduce preemption occurrence when blocks not enough for decoding
* fix
* fix
* fix spell
* optimize performance
* fix
2026-01-07 20:01:16 +08:00
xunyoyo
78adf83549
[CI] [Hackathon 9th Sprint No.18] Supplement unit tests for functional modules ( #5717 )
* Remove paddle import guards from DeepEP tests
* Sort imports in DeepEP tests
* Refactor assertions for combine handle in test_ep.py
Updated assertions to verify combine handle in DeepEPEngine.
* Add moe_select coverage in DeepEP tests
* Refactor assertions for combine handle in test_ep
* Strengthen moe_select assertions in DeepEP tests
* Update test_ep.py
---------
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-07 17:20:59 +08:00
Ryan
3e74bacc5e
add m_grouped_gemm_fp8_fp8_bf16_nt_contiguous_custom_python_op ( #5847 )
2026-01-07 16:17:55 +08:00
kevin
eabd01cd21
[BugFix] fix eb5 prefix bug ( #5879 )
* fix eb5 prefix bug
* update ci test
* update code
* update code
* update code
* update code
* update code
* update code
* update code
2026-01-06 23:50:39 -08:00
kevin
a76e8ae40c
[Feature] support rdma pd dy-c8 ( #5788 )
* add rdma pd dy-c8
* update code
2026-01-07 14:55:25 +08:00
周周周
f15df1ec89
Revert cuda check ( #5915 )
* commit
* commit
2026-01-07 14:40:18 +08:00
yzwu
29898372e9
[Iluvatar] remove CUDA_VISIBLE_DEVICE in run_ci_iluvatar.sh ( #5916 )
2026-01-07 14:10:47 +08:00
Jiang-Jia-Jun
15179ab730
Revise language guidelines for PR reviews
Updated language instructions for PR comments.
2026-01-07 13:34:02 +08:00
yangjianfengo1
59523b27de
opt w4afp8 ( #5853 )
2026-01-07 12:22:35 +08:00
sunxin
6ee8241521
[V1 Loader] Support loading static C8 scale JSON ( #5909 )
* v1 loader: support loading static C8 scale JSON
* update
2026-01-06 19:49:30 -08:00
MingkunZhang
7ad5737560
[Metax] adapt to gemm interface on different versions of maca ( #5905 )
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-07 10:02:24 +08:00
fmiao2372
1ee285c2d6
[Intel HPU] enable chunked prefill ( #5903 )
* [Intel HPU] enable chunked prefill
* fix bugs per Copilot review comments
2026-01-06 21:01:50 +08:00
周周周
83ae59431e
[BugFix] fix BatchMLAWithPagedKVCacheKernel O_tmp ( #5895 )
2026-01-06 15:39:06 +08:00
ddchenhao66
733014bf32
[XPU] Support EP4TP1 in pd disaggregation ( #5860 )
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-06 15:25:36 +08:00
gaoziyuan
e99ec4c9d5
[Bugfix]fix model weight signal tensor num ( #5900 )
2026-01-06 14:36:59 +08:00
Yonghua Li
9445fbe054
[KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled ( #5871 )
* [fix] temporarily forbid cpu cache in update/clear api
* [fix] stop launching cache transfer manager unless hierarchical cache is enabled
* [fix] fix no attr hierarchical cache
* [fix] fix ci
* [fix] fix test_prefix_cache_manager.py
2026-01-06 14:27:47 +08:00
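The condition in this commit's title, launching cache transfer processes only when hierarchical cache or KV cache storage is enabled, reduces to a boolean gate. A minimal sketch with hypothetical names (not the project's actual functions or flags):

```python
def should_launch_cache_transfer(hierarchical_cache_enabled: bool,
                                 kv_cache_storage_enabled: bool) -> bool:
    # Hypothetical gate: the cache transfer manager is only needed when at
    # least one of the two features that use it is turned on.
    return hierarchical_cache_enabled or kv_cache_storage_enabled
```

With both features off, no transfer processes are spawned, which is the resource saving the commit targets.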
Yonghua Li
9fc2400e71
[BugFix] fix mtp cache attaching for pd disaggregation ( #5884 )
* [fix] fix mtp cache attaching for pd disaggregation
* [fix] fix test_mtp_proposer.py
2026-01-06 14:17:53 +08:00
jc
e9b25aa72f
[BugFix] Storage backend gets env params ( #5892 )
* Storage backend gets env params
* up
* up
* up
2026-01-06 14:14:17 +08:00
lizexu123
acdf0cd1d9
fix hadamard_block_size ( #5888 )
2026-01-06 14:12:14 +08:00
qwes5s5
b3ca7f041a
[BugFix] Fix redundant prompt_logprobs in the second chunk of streaming response when return_token_ids is enabled for v1/completions and fix trace file name ( #5829 )
* fix prompt logprobs bug
* fix trace file name
---------
Co-authored-by: qwes5s5 <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-06 14:11:43 +08:00
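The prompt_logprobs fix suggests attaching prompt-level logprobs only to the first streamed chunk, so later chunks do not repeat them. A minimal sketch of that idea; the function name and chunk structure are hypothetical, not the actual v1/completions response format:

```python
def build_stream_chunks(prompt_logprobs, tokens):
    """Hypothetical chunk builder: prompt_logprobs ride along only with
    the first chunk of a streaming response."""
    chunks = []
    for i, token in enumerate(tokens):
        chunk = {"token": token}
        # Attach prompt_logprobs to chunk 0 only, so the second and later
        # chunks never duplicate it.
        if i == 0 and prompt_logprobs is not None:
            chunk["prompt_logprobs"] = prompt_logprobs
        chunks.append(chunk)
    return chunks
```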
freeliuzc
ca574119e5
support multi-step draft-model with cudagraph ( #5886 )
2026-01-06 11:16:21 +08:00
周周周
7a0744f05a
[UT] support attention test tp ( #5887 )
2026-01-06 11:15:01 +08:00
Copilot
5c53193c4e
[Docs] Update GPU version from 2.3.0 to 2.3.2 in installation documentation ( #5894 )
* Initial plan
* Update GPU version from 2.3.0 to 2.3.2 in NVIDIA GPU installation documentation
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-06 11:06:32 +08:00
Yuanle Liu
5e729bc2ba
[OPs] ep_moe_expert_dispatch.cu dispatch num_experts_per_rank 5 ( #5890 )
2026-01-06 10:39:35 +08:00
Neil Zhu
272a371635
[Metax] optimize flash attention backend ( #5876 )
2026-01-06 09:52:09 +08:00
周周周
ab553b3b8b
revert cuda_check ( #5883 )
2026-01-05 20:51:31 +08:00
Jiaxin Sui
2785b820c8
[XPU][CI] Add XPU logprobs case ( #5874 )
* Enhance run_ci_xpu.sh with caching and prefill options
* Update model path and configuration in run_ci_xpu.sh
* Add '北朝' keyword to assertion in run_45vl.py
* Enhance process termination logic in run_ci_xpu.sh
* Set timeout for CI_XPU job to 60 minutes
* Remove extra newline in stop_processes function
* Update paddlepaddle-xpu installation command
Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.
* Update PaddlePaddle installation command
* Remove max_tokens from model response configuration
Removed max_tokens parameter from the model response call.
* add xpu logprobs case
* Fix formatting and improve setup_logprobs_env
Add newline at end of file and update setup_logprobs_env function.
* Refactor test_logprobs_21b_tp4.py for clarity
* Change top_p value from 1.0 to 0
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
lizexu123
1d3ae7c024
[BugFix] fix w4afp8 tp=8 ( #5868 )
* fix w4afp8 tp=8
* fix
2026-01-05 18:59:02 +08:00
tianhaodongbd
6f14b180e3
[RL] Change 'model' to the instance variable 'tmp_model' ( #5872 )
2026-01-05 02:09:02 -08:00
ming1753
f50e1bcc16
[Others] enable use PFCC deep_ep ( #5822 )
* upstream deep_ep
* fix bug
* fix bug
* modify env name
2026-01-05 02:07:01 -08:00
jc
8d384f9fd8
[PD Disaggregation] Update usage of pd disaggregation and data parallel ( #5742 )
* Update usage of pd disaggregation
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up dp docs
* up
* up
* up
* fix unittest
2026-01-05 17:51:29 +08:00
cmcamdy
690d4bcdb0
[XPU] Speculative Decoding with PD ( #5856 )
* [XPU] Speculative Decoding with PD
* fix post process
* share kv cache sender
* support speculate decoding step system cache
* support speculate decoding step system cache
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>
2026-01-05 17:31:03 +08:00
chen
ac39c0f887
support fa3 qwen-vl rope ( #5869 )
2026-01-05 15:29:34 +08:00
sunxin
adb91dcacc
[BugFix] Fix wint4 ep issue caused by empty run ( #5870 )
2026-01-05 14:24:37 +08:00