YuBaoku
65c6e726f5
[Cherry-Pick][Docs] Update Release Note( #7302 ) ( #7341 )
2026-04-11 16:48:06 +08:00
YuBaoku
2ac9b89409
[XPU][CI]Update xtdk version in download_dependencies.sh ( #7320 ) ( #7322 )
...
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2026-04-11 00:27:54 +08:00
GoldPancake
c7560383ab
[Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantization Params + CUDAGraph Validation (#7215,#7281) ( #7301 )
...
* refactor cudagraph args
* refactor quant cli param
* fix
* fix
* tmp skip xpu
* fix
2026-04-10 16:10:31 +08:00
zhangbo9674
4f36346e14
[Cherry-Pick] change rms norm for glm #7269 ( #7276 )
...
* fix
* refine code
* refine code
* refine code
* refine code
* refine code
2026-04-10 01:03:00 -07:00
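The commit above swaps which RMSNorm path the GLM model uses; the math itself is the standard RMSNorm. A minimal pure-Python reference sketch (illustrative only — the real FastDeploy kernel runs on device and is not this code):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # Textbook RMSNorm: scale each element by the reciprocal root-mean-square
    # of the vector, then apply the learned per-channel weight.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]
```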
YuBaoku
dd0863b076
[BugFix] Fix Async D2H copy bug & flash mask attn cache V out of bound bug ( #7221 ) ( #7296 )
...
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
2026-04-10 13:54:02 +08:00
fxyfxy777
dea9d35171
[OP]Unify MoE op with moe_permute path for bf16 GLM ( #7164 ) ( #7279 )
2026-04-09 21:37:42 +08:00
YuBaoku
921a0ae60b
[Docs] Update docs for release/2.5 ( #7267 ) ( #7277 )
...
* Update docs for release/2.5
* Update English docs for release/2.5
- Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link
- Update docs/get_started/installation/nvidia_gpu.md:
- Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support
- paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives
- fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option
- Update docs/zh/get_started/installation/nvidia_gpu.md:
- Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4&5 -> 3.3.1
Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f
* Clarify --extra-index-url usage in installation docs
Add note explaining that --extra-index-url is only for downloading
fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed
from the Paddle source specified by -i. Applied to both Chinese and
English nvidia_gpu.md installation guides.
Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c
* Update nvidia_gpu.md
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
2026-04-09 21:03:19 +08:00
Jiaxin Sui
6fcc25f3f6
Update ci_metax.yml ( #7286 )
2026-04-09 17:31:20 +08:00
Bingoo
849eb3df65
[Cherry-Pick][Optimization] merge matmul and add (#6986) ( #7191 )
...
* merge matmul and add
* modify format
* using paddle.nn.functional.linear
* using _C_ops.linear
* using paddle.nn.functional.linear
* add FLAGS_use_legacy_linear env var in test case
* fix format
* add assert and remove env
* modify format
* using matmul for no bias
* modify accurate baseline
2026-04-09 14:15:43 +08:00
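The optimization above replaces a separate matmul followed by an elementwise add with a single fused linear call (falling back to plain matmul when there is no bias, per the "using matmul for no bias" follow-up). A pure-Python sketch of the equivalence the commit relies on (an illustrative stand-in, not the actual paddle code path):

```python
def matmul(x, w):
    # Naive row-by-column matrix multiply on nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def linear(x, w, bias=None):
    # Fused path: one call computes x @ w + bias; bias is optional,
    # matching the no-bias fallback in the commit.
    y = matmul(x, w)
    if bias is not None:
        y = [[v + b for v, b in zip(row, bias)] for row in y]
    return y

x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, -0.5]

# The unfused (matmul then add) and fused (linear) paths must agree.
unfused = [[v + bb for v, bb in zip(row, b)] for row in matmul(x, w)]
fused = linear(x, w, b)
assert fused == unfused
```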
YuBaoku
098dd2c251
[XPU][CI] lock xvllm version for fix bug ( #7264 ) ( #7266 )
...
* Remove duplicate NICs from environment variables
* Update version for xvllm in download_dependencies.sh
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2026-04-09 12:46:13 +08:00
xiaoxiaohehe001
5fd8020363
[Cherry-Pick][BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn ( #7216 )
2026-04-09 11:05:43 +08:00
JYChen
9c65655cb3
[Cherry-Pick][RL] support moe-topk use topk_reduce_func #7218 ( #7256 )
...
* support moe-topk use topk_reduce_func
* fix ep error
* fix ut
* fix ut
2026-04-09 11:01:10 +08:00
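The commit above routes MoE top-k gating through a fused `topk_reduce_func` on device. What top-k routing computes can be sketched in plain Python (a hypothetical illustration of the idea, not FastDeploy's operator): pick the k largest gate scores per token and renormalize them into routing weights.

```python
import heapq

def moe_topk(gate_scores, k):
    # Indices of the k largest gate scores (stable for ties, like
    # sorted(..., reverse=True)[:k]), then normalize the kept scores
    # so the routing weights sum to 1.
    idx = heapq.nlargest(k, range(len(gate_scores)), key=gate_scores.__getitem__)
    kept = [gate_scores[i] for i in idx]
    total = sum(kept)
    return idx, [w / total for w in kept]
```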
Bingoo
01818844b4
support moe for sm103 ( #7240 )
2026-04-08 20:56:23 +08:00
YuBaoku
84d62712c9
[Feature]distinguish whl version ( #7204 ) ( #7224 )
...
* [Feature]whl version
* [Feature]whl version,set root_is_pure = false
* [Feature]code style
Co-authored-by: ChowMingSing <610208940@qq.com >
2026-04-08 17:32:38 +08:00
YuBaoku
6b78981dde
Split enable_mm ( #7183 ) ( #7233 )
...
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com >
Co-authored-by: liuruian <liuruian@MacBook-Pro.local >
2026-04-08 16:32:04 +08:00
GoldPancake
403ce139c7
remove arctic_inference deps ( #7236 )
2026-04-08 15:25:21 +08:00
huicongyao
36909bf27d
[Cherry-Pick][BugFix] fix MTP bugs in TP and overlap( #7172 ) ( #7192 )
...
* fix MTP bugs in TP and overlap
* fix
2026-04-08 10:24:38 +08:00
YuBaoku
7ab48c4760
[Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml ( #7186 ) ( #7195 )
2026-04-03 20:55:53 +08:00
Yonghua Li
55dbc83310
[Cherry-Pick][BugFix] prevent requests from entering running state without a slot( #7141 ) ( #7181 )
...
* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (#7163 )
* Set MC_MAX_MR_SIZE to avoid register hang
* up
* [fix] prevent requests from entering running state without a slot
* [fix] count abort set
* [fix] count preempted task in waiting list
---------
Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com >
2026-04-03 17:46:13 +08:00
Jiang-Jia-Jun
b24765a746
Update setup.py
2026-04-03 11:29:22 +08:00
jackyYang6
e3aed6de2f
fix oom bug, optimize async weight loading and update read step by yaml ( #7171 )
2026-04-03 11:05:24 +08:00
jc
1cc0cf23c2
[BugFix] Set MC_MAX_MR_SIZE to avoid register hang in default ( #7161 )
...
* Set MC_MAX_MR_SIZE to avoid register hang
* Set MC_MAX_MR_SIZE to avoid register hang
2026-04-03 10:51:15 +08:00
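"Set MC_MAX_MR_SIZE ... in default" suggests the launcher now pre-populates the variable so users don't hit the memory-region registration hang. A sketch of that pattern (the variable name comes from the commit; the default value and the `apply_default_env` helper are hypothetical, not the actual FastDeploy code):

```python
import os

def apply_default_env(env=os.environ):
    # setdefault only fills the value in when the user has not set it,
    # so an explicit MC_MAX_MR_SIZE still wins.
    env.setdefault("MC_MAX_MR_SIZE", str(2**30))  # hypothetical 1 GiB cap
    return env["MC_MAX_MR_SIZE"]
```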
chenjian
2632e6cf32
[Feature] Support chunk prefill disabled in scheduler v1 ( #7152 )
2026-04-03 10:18:14 +08:00
luukunn
562fa31791
[BugFix]fix extract_tool_calls ( #7154 )
...
* fix extract_tool_calls
2026-04-02 21:18:37 +08:00
Yonghua Li
98f3fc9267
[RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests ( #7083 )
...
* [test] add a few unit tests
* [feat] update key prefix when model weights are updated
* [test] try to fix test_worker_process
2026-04-02 19:58:41 +08:00
fxyfxy777
9f3b3ce7f5
[Optimization] merge_allreduce ( #7039 )
2026-04-02 19:52:13 +08:00
bukejiyu
f142b486c9
update ( #7101 )
2026-04-02 16:07:26 +08:00
Longzhi Wang
938e7dd881
[Other] support video_fps args for video bench ( #7077 )
2026-04-02 10:40:15 +08:00
YuBaoku
7aa213bba9
[CI] Replace ipc=host with shm-size and sysctl configuration ( #7138 )
2026-04-02 10:33:55 +08:00
YuBaoku
db808f2080
[CI] Optimize log cleanup and isolation in unittest ( #7132 )
2026-04-01 22:07:55 +08:00
Yuanle Liu
1af7f80811
Revert "[BugFix][Speculative Decoding] Correct index calculation in speculate…" ( #7133 )
...
This reverts commit ba1aa1edff.
2026-04-01 06:54:23 -07:00
luukunn
fa7a84926d
[Optimization]Fix tool parser ( #7079 )
...
* fix tool parser
2026-04-01 21:20:34 +08:00
Bingoo
410988d9ec
[OP] support deepgemm for sm103 ( #7073 )
...
* support deepgemm for sm103
* add assert
* modify code style
* add assert
* modify sm version condition
* remove assert
2026-04-01 21:01:09 +08:00
lonelygsh
ba1aa1edff
[BugFix][Speculative Decoding] Correct index calculation in speculate decoding operators ( #7121 )
...
- Fix accept_idx calculation in spec_set_value_by_stop_seqs
- Fix condition check from < to <= for token matching
- Fix accept_tokens indexing logic
- Remove unnecessary -1 in current_step comparison for max_think_len
Co-authored-by: guanshihui <guanshihui@baidu.com >
2026-04-01 05:36:53 -07:00
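The "`<` to `<=`" fix above is a classic off-by-one in stop-sequence matching: a stop sequence exactly as long as the accepted tokens must still be checked. A simplified, hypothetical Python model of that comparison (the real operator is a CUDA kernel inside FastDeploy; names here are illustrative):

```python
def matches_stop_seq(accept_tokens, accept_num, stop_seq):
    # accept_num counts the usable accepted draft tokens. A stop sequence
    # of exactly accept_num tokens is still a valid match, hence `<=`;
    # a strict `<` here would silently skip that equal-length case.
    if len(stop_seq) <= accept_num:
        tail = accept_tokens[accept_num - len(stop_seq):accept_num]
        return tail == stop_seq
    return False
```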
cmcamdy
7a2e33098f
[XPU] Refactor pre process ( #6993 )
...
* [XPU] support speculate_pre_process
* merge develop
* fix codestyle
* fix mtp, support cu_seqlens_q_output
* fix mtp, support cu_seqlens_q_output
* fix test
---------
Co-authored-by: lizan1999 <lizan03@baidu.com >
2026-04-01 20:29:55 +08:00
mouxin
fba8a51ad1
[Feature] Fix mixed cache-aware ( #7129 )
...
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Fix mixed cache-aware
---------
Co-authored-by: mouxin <mouxin@baidu.com >
2026-04-01 19:29:29 +08:00
Jingfeng Wu
3b564116d5
[Docs] Add docs for disaggregated deployment ( #6700 )
...
* add docs for disaggregated deployment
* pre-commit run for style check
* update docs
2026-04-01 19:27:09 +08:00
yzwu
ceaf5df350
[Iluvatar] Fix cuda graph error for tp > 1 in ernie models ( #7126 )
2026-04-01 19:13:34 +08:00
luukunn
fdfc908e2f
[Others] reuse unit test ( #7127 )
2026-04-01 18:36:00 +08:00
mouxin
6cae9b1f50
[Feature] Config eviction_duration ( #7125 )
...
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
---------
Co-authored-by: mouxin <mouxin@baidu.com >
2026-04-01 16:46:21 +08:00
sunxin
c29e86fc9d
[Feature] Support mtp overlap schedule ( #7001 )
2026-04-01 14:24:26 +08:00
YuBaoku
c6f0c5c3a6
[CI] Optimize test execution with single-GPU parallelism ( #7085 )
...
* [CI] Optimize test execution with single-GPU parallelism and log collection
* remove export CUDA_VISIBLE_DEVICES
* fix path error
* fix log_* path and debug
* [CI] Optimize test execution with single-GPU parallelism and log collection
2026-04-01 14:18:40 +08:00
zhouchong
91c832f607
[Feature] Add logging parameters and error output to terminal ( #7098 )
2026-04-01 13:18:42 +08:00
jc
af51fc46d6
[PD Disaggregation] Write the cache of preempted req to storage and refine PD Disaggregation ( #7107 )
...
* Write the cache of preempted req to storage
* up
* fix
2026-04-01 13:15:52 +08:00
luukunn
3651113ee5
[DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR ( #7052 )
...
* remove ENABLE_V1_DATA_PROCESSOR
* fix unit test
* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5
ee2b965f5f
adjust config info ( #7054 )
2026-03-31 21:26:05 +08:00
Yonghua Li
a3cc3aa777
[BugFix] reset exist tasks signal in clear_data ( #7111 )
...
* [BugFix] reset exist tasks signal in clear_data
* [Fix] fix stale exist tasks signal after weight update
* [Chore] downgrade detected new requests log to DEBUG level
* [fix] adjust continue place
2026-03-31 21:24:08 +08:00
周周周
fd44bb7cbf
cpmmot ( #7105 )
...
Co-authored-by: liuruian <liuruian@baidu.com >
2026-03-31 16:13:44 +08:00
cloudforge1
5c5dc66aa7
[CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader ( #6731 )
...
* [CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader
* [CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2026-03-31 15:29:35 +08:00
YilongGuo
dd61e7e421
[Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration ( #7086 )
...
Add clear_grpah_opt_backend method that delegates to the underlying model
to clear cuda graph optimization backend.
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
2026-03-31 13:48:25 +08:00