Commit Graph

4979 Commits

Author SHA1 Message Date
YuBaoku 65c6e726f5 [Cherry-Pick][Docs] Update Release Note(#7302) (#7341) 2026-04-11 16:48:06 +08:00
YuBaoku 2ac9b89409 [XPU][CI]Update xtdk version in download_dependencies.sh (#7320) (#7322)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-11 00:27:54 +08:00
GoldPancake c7560383ab [Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantization Params + CUDAGraph Validation (#7215,#7281) (#7301)
* refactor cudagraph args

* refactor quant cli param

* fix

* fix

* tmp skip xpu

* fix
2026-04-10 16:10:31 +08:00
zhangbo9674 4f36346e14 [Cherry-Pick] change rms norm for glm #7269 (#7276)
* fix

* refine code

* refine code

* refine code

* refine code

* refine code
2026-04-10 01:03:00 -07:00
YuBaoku dd0863b076 [BugFix] Fix Async D2H copy bug & flash mask attn cache V out-of-bound bug (#7221) (#7296)
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
2026-04-10 13:54:02 +08:00
fxyfxy777 dea9d35171 [OP]Unify MoE op with moe_permute path for bf16 GLM (#7164) (#7279) 2026-04-09 21:37:42 +08:00
YuBaoku 921a0ae60b [Docs] Update docs for release/2.5 (#7267) (#7277)
* Update docs for release/2.5

* Update English docs for release/2.5

- Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link
- Update docs/get_started/installation/nvidia_gpu.md:
  - Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support
  - paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives
  - fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option
- Update docs/zh/get_started/installation/nvidia_gpu.md:
  - Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4&5 -> 3.3.1

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f

* Clarify --extra-index-url usage in installation docs

Add note explaining that --extra-index-url is only for downloading
fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed
from the Paddle source specified by -i. Applied to both Chinese and
English nvidia_gpu.md installation guides.

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c

* Update nvidia_gpu.md

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-09 21:03:19 +08:00
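The docs commit above distinguishes the roles of pip's `-i` and `--extra-index-url` flags for installing fastdeploy-gpu. A sketch of the documented pattern — the index URL below is a placeholder, not the real Paddle index (which the commit says lives in `docs/get_started/installation/nvidia_gpu.md`):

```shell
# Per the commit's note: -i / --index-url names the primary index, and the
# fastdeploy-gpu wheel itself must come from there; --extra-index-url is added
# only so that dependencies can be pulled from a second index such as PyPI.
# The first URL here is illustrative, not a real index.
python -m pip install fastdeploy-gpu==2.5.0 \
    -i https://example.com/paddle-wheel-index/ \
    --extra-index-url https://pypi.org/simple
```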
Jiaxin Sui 6fcc25f3f6 Update ci_metax.yml (#7286) 2026-04-09 17:31:20 +08:00
Bingoo 849eb3df65 [Cherry-Pick][Optimization] merge matmul and add (#6986) (#7191)
* merge matmul and add

* modify format

* using paddle.nn.functional.linear

* using _C_ops.linear

* using paddle.nn.functional.linear

* add FLAGS_use_legacy_linear env var in test case

* fix format

* add assert and remove env

* modify format

* using matmul for no bias

* modify accurate baseline
2026-04-09 14:15:43 +08:00
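The optimization above folds a separate matmul followed by an elementwise add into a single linear call. A minimal pure-Python sketch of the equivalence being relied on (plain lists instead of Paddle tensors; names are illustrative, and unlike `paddle.nn.functional.linear` this uses x·W rather than x·Wᵀ for simplicity):

```python
def matmul_then_add(x, w, b):
    """Unfused path: y = x @ w as one step, then y + b as a second step."""
    y = [[sum(xi * wij for xi, wij in zip(row, col)) for col in zip(*w)]
         for row in x]
    return [[yij + bj for yij, bj in zip(row, b)] for row in y]

def fused_linear(x, w, b):
    """Fused path: the bias is accumulated in the same pass over the output."""
    return [[sum(xi * wij for xi, wij in zip(row, col)) + bj
             for col, bj in zip(zip(*w), b)]
            for row in x]

x = [[1.0, 2.0], [3.0, 4.0]]   # 2x2 input
w = [[1.0, 0.0], [0.0, 1.0]]   # identity weights
b = [0.5, -0.5]
assert matmul_then_add(x, w, b) == fused_linear(x, w, b)
```

Both paths produce the same values; the win is one kernel launch instead of two.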
YuBaoku 098dd2c251 [XPU][CI] lock xvllm version for fix bug (#7264) (#7266)
* Remove duplicate NICs from environment variables

* Update version for xvllm in download_dependencies.sh

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-09 12:46:13 +08:00
xiaoxiaohehe001 5fd8020363 [Cherry-Pick][BugFix] Fix batch_size derivation and relax shape checks in SM90 flash_mask_attn (#7216) 2026-04-09 11:05:43 +08:00
JYChen 9c65655cb3 [Cherry-Pick][RL] support moe-topk use topk_reduce_func #7218 (#7256)
* support moe-topk use topk_reduce_func

* fix ep error

* fix ut

* fix ut
2026-04-09 11:01:10 +08:00
Bingoo 01818844b4 support moe for sm103 (#7240) 2026-04-08 20:56:23 +08:00
YuBaoku 84d62712c9 [Feature]distinguish whl version (#7204) (#7224)
* [Feature]whl version

* [Feature]whl version,set root_is_pure = false

* [Feature]code style

Co-authored-by: ChowMingSing <610208940@qq.com>
2026-04-08 17:32:38 +08:00
YuBaoku 6b78981dde Split enable_mm (#7183) (#7233)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: liuruian <liuruian@MacBook-Pro.local>
2026-04-08 16:32:04 +08:00
GoldPancake 403ce139c7 remove arctic_inference deps (#7236) 2026-04-08 15:25:21 +08:00
huicongyao 36909bf27d [Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172) (#7192)
* fix MTP bugs in TP and overlap

* fix
2026-04-08 10:24:38 +08:00
YuBaoku 7ab48c4760 [Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (#7186) (#7195) 2026-04-03 20:55:53 +08:00
Yonghua Li 55dbc83310 [Cherry-Pick][BugFix] prevent requests from entering running state without a slot(#7141) (#7181)
* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (#7163)

* Set MC_MAX_MR_SIZE to avoid register hang

* up

* [fix] prevent requests from entering running state without a slot

* [fix] count abort set

* [fix] count preempted task in waiting list

---------

Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>
2026-04-03 17:46:13 +08:00
Jiang-Jia-Jun b24765a746 Update setup.py 2026-04-03 11:29:22 +08:00
jackyYang6 e3aed6de2f fix oom bug, optimize async weight loading and update read step by yaml (#7171) 2026-04-03 11:05:24 +08:00
jc 1cc0cf23c2 [BugFix] Set MC_MAX_MR_SIZE to avoid register hang in default (#7161)
* Set MC_MAX_MR_SIZE to avoid register hang

* Set MC_MAX_MR_SIZE to avoid register hang
2026-04-03 10:51:15 +08:00
chenjian 2632e6cf32 [Feature] Support chunk prefill disabled in scheduler v1 (#7152) 2026-04-03 10:18:14 +08:00
luukunn 562fa31791 [BugFix]fix extract_tool_calls (#7154)
* fix extract_tool_calls
2026-04-02 21:18:37 +08:00
Yonghua Li 98f3fc9267 [RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests (#7083)
* [test] add a few unit tests

* [feat] update key prefix when model weights are updated

* [test] try to fix test_worker_process
2026-04-02 19:58:41 +08:00
fxyfxy777 9f3b3ce7f5 [Optimization] merge_allreduce (#7039) 2026-04-02 19:52:13 +08:00
bukejiyu f142b486c9 update (#7101) 2026-04-02 16:07:26 +08:00
Longzhi Wang 938e7dd881 [Other] support video_fps args for video bench (#7077) 2026-04-02 10:40:15 +08:00
YuBaoku 7aa213bba9 [CI] Replace ipc=host with shm-size and sysctl configuration (#7138) 2026-04-02 10:33:55 +08:00
YuBaoku db808f2080 [CI] Optimize log cleanup and isolation in unittest (#7132) 2026-04-01 22:07:55 +08:00
Yuanle Liu 1af7f80811 Revert "[BugFix][Speculative Decoding] Correct index calculation in speculate…" (#7133)
This reverts commit ba1aa1edff.
2026-04-01 06:54:23 -07:00
luukunn fa7a84926d [Optimization]Fix tool parser (#7079)
* fix tool parser
2026-04-01 21:20:34 +08:00
Bingoo 410988d9ec [OP] support deepgemm for sm103 (#7073)
* support deepgemm for sm103

* add assert

* modify code style

* add assert

* modify sm version condition

* remove assert
2026-04-01 21:01:09 +08:00
lonelygsh ba1aa1edff [BugFix][Speculative Decoding] Correct index calculation in speculate decoding operators (#7121)
- Fix accept_idx calculation in spec_set_value_by_stop_seqs
- Fix condition check from < to <= for token matching
- Fix accept_tokens indexing logic
- Remove unnecessary -1 in current_step comparison for max_think_len

Co-authored-by: guanshihui <guanshihui@baidu.com>
2026-04-01 05:36:53 -07:00
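The "condition check from < to <=" item above is the classic off-by-one family of bug. A toy token-acceptance loop showing that class of error — illustrative names and logic only, not the operator's actual code:

```python
def accept_until_mismatch(draft, target):
    """Toy speculative-decoding acceptance: count leading draft tokens
    that agree with the target model's tokens."""
    n = 0
    while n < len(draft) and n < len(target) and draft[n] == target[n]:
        n += 1
    return n

def accept_until_mismatch_buggy(draft, target):
    """Same loop with an exclusive bound: the final comparable token is
    silently dropped -- the kind of boundary error the commit fixes."""
    n = 0
    while n < len(draft) - 1 and n < len(target) and draft[n] == target[n]:
        n += 1
    return n

assert accept_until_mismatch([1, 2, 3], [1, 2, 3]) == 3
assert accept_until_mismatch_buggy([1, 2, 3], [1, 2, 3]) == 2  # last token lost
```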
cmcamdy 7a2e33098f [XPU] Refactor pre process (#6993)
* [XPU] support speculate_pre_process

* merge develop

* fix codestyle

* fix mtp, support cu_seqlens_q_output

* fix mtp, support cu_seqlens_q_output

* fix test

---------

Co-authored-by: lizan1999 <lizan03@baidu.com>
2026-04-01 20:29:55 +08:00
mouxin fba8a51ad1 [Feature] Fix mixed cache-aware (#7129)
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Fix mixed cache-aware

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 19:29:29 +08:00
Jingfeng Wu 3b564116d5 [Docs] Add docs for disaggregated deployment (#6700)
* add docs for disaggregated deployment

* pre-commit run for style check

* update docs
2026-04-01 19:27:09 +08:00
yzwu ceaf5df350 [Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126) 2026-04-01 19:13:34 +08:00
luukunn fdfc908e2f [Others] reuse unit test (#7127) 2026-04-01 18:36:00 +08:00
mouxin 6cae9b1f50 [Feature] Config eviction_duration (#7125)
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 16:46:21 +08:00
sunxin c29e86fc9d [Feature] Support mtp overlap schedule (#7001) 2026-04-01 14:24:26 +08:00
YuBaoku c6f0c5c3a6 [CI] Optimize test execution with single-GPU parallelism (#7085)
* [CI] Optimize test execution with single-GPU parallelism and log collection

* remove export CUDA_VISIBLE_DEVICES

* fix path error

* fix log_* path and debug

* [CI] Optimize test execution with single-GPU parallelism and log collection
2026-04-01 14:18:40 +08:00
zhouchong 91c832f607 [Feature] Add logging parameters and error output to terminal (#7098) 2026-04-01 13:18:42 +08:00
jc af51fc46d6 [PD Disaggregation] Write the cache of preempted req to storage and refine PD Disaggregation (#7107)
* Write the cache of preempted req to storage

* up

* fix
2026-04-01 13:15:52 +08:00
luukunn 3651113ee5 [DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR (#7052)
* remove ENABLE_V1_DATA_PROCESSOR

* fix unit test

* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5 ee2b965f5f adjust config info (#7054) 2026-03-31 21:26:05 +08:00
Yonghua Li a3cc3aa777 [BugFix] reset exist tasks signal in clear_data (#7111)
* [BugFix] reset exist tasks signal in clear_data

* [Fix] fix stale exist tasks signal after weight update

* [Chore] downgrade detected new requests log to DEBUG level

* [fix] adjust continue place
2026-03-31 21:24:08 +08:00
周周周 fd44bb7cbf cpmmot (#7105)
Co-authored-by: liuruian <liuruian@baidu.com>
2026-03-31 16:13:44 +08:00
cloudforge1 5c5dc66aa7 [CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader (#6731)
* [CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader

* [CI][Hackathon 10th Spring No.34] Add unit tests for async_expert_loader

---------

Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-03-31 15:29:35 +08:00
YilongGuo dd61e7e421 [Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration (#7086)
Add clear_grpah_opt_backend method that delegates to the underlying model
to clear cuda graph optimization backend.

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-03-31 13:48:25 +08:00
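The commit above describes a plain delegation method: the VL wrapper forwards the call to the underlying model. A minimal sketch of that shape (class and attribute names are hypothetical; the method name keeps the commit's own spelling of the identifier):

```python
class _InnerModel:
    """Stand-in for the wrapped language model (hypothetical)."""
    def __init__(self):
        self.graph_backend_cleared = False

    def clear_grpah_opt_backend(self):
        self.graph_backend_cleared = True

class Qwen3VLLike:
    """Sketch of the wrapper: the VL model owns an inner model and
    forwards the backend-clearing call to it."""
    def __init__(self):
        self.model = _InnerModel()

    def clear_grpah_opt_backend(self):
        # Delegate, so callers can treat VL and text-only models uniformly.
        self.model.clear_grpah_opt_backend()

m = Qwen3VLLike()
m.clear_grpah_opt_backend()
assert m.model.graph_backend_cleared
```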