- speculate_limit_thinking_content_length: update current_base_step to
step_idx+1 (step_idx now records history count before current round);
remove incorrect step_idx decrement on accept_num truncation; mark
step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix can_stop gate to use
step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx
formula (remove stale -accept_num offset); use <= condition so accept_idx
maps directly to the accepted token that ends the stop sequence; fix
accept_tokens index (remove -1).
- Update unit tests for speculate_set_stop_value_multi_seqs kernel.
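
The index conventions described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the CUDA kernel code: the function names, argument layout, and the exact comparison against `max_think_len` are assumptions made for the example. It assumes `step_idx` counts the tokens generated before the current round, so history lives in `pre_ids[0:step_idx]` and the tokens accepted this round are indexed from 0.

```python
from typing import List, Sequence


def limit_thinking_tokens(accept_tokens: Sequence[int], step_idx: int,
                          max_think_len: int, think_end_id: int) -> List[int]:
    """Hypothetical helper: force the end-of-thinking token at the length limit.

    step_idx is read-only here (history count before this round), mirroring
    the "const" parameter in the commit message.
    """
    current_base_step = step_idx + 1  # 1-based position of the first accepted token
    kept: List[int] = []
    for offset, token in enumerate(accept_tokens):
        current_step = current_base_step + offset
        if current_step >= max_think_len:  # threshold form is an assumption
            kept.append(think_end_id)      # replace the token and drop the rest
            break
        kept.append(token)
    return kept


def stop_accept_num(pre_ids: Sequence[int], step_idx: int,
                    accept_tokens: Sequence[int], stop_seq: Sequence[int],
                    min_token_limit: int) -> int:
    """Hypothetical helper: number of accepted tokens to keep after a stop check."""
    accept_num = len(accept_tokens)
    # Gate: history plus this round's tokens must reach the minimum length.
    if not stop_seq or step_idx + accept_num < min_token_limit:
        return accept_num
    for accept_idx in range(accept_num):
        matched = True
        # Walk the stop sequence backwards, starting at accept_tokens[accept_idx].
        for offset in range(len(stop_seq)):
            pos = accept_idx - offset
            if pos >= 0:
                token = accept_tokens[pos]   # still inside this round
            else:
                hist = step_idx + pos        # reach back into history
                if hist < 0:                 # stop sequence longer than context
                    matched = False
                    break
                token = pre_ids[hist]
            if token != stop_seq[-1 - offset]:
                matched = False
                break
        if matched:
            return accept_idx + 1            # keep up to the token ending the stop sequence
    return accept_num
```

With history `[10, 11, 12]` (`step_idx = 3`) and accepted tokens `[13, 14, 15]`, the stop sequence `[12, 13]` straddles history and the current round, and `accept_idx = 0` maps directly to the token that completes it.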
* Update docs for release/2.5
* Update English docs for release/2.5
- Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link
- Update docs/get_started/installation/nvidia_gpu.md:
  - Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support
  - paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives
  - fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option
- Update docs/zh/get_started/installation/nvidia_gpu.md:
  - Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4 & 5 -> 3.3.1
Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f
* Clarify --extra-index-url usage in installation docs
Add note explaining that --extra-index-url is only for downloading
fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed
from the Paddle source specified by -i. Applied to both Chinese and
English nvidia_gpu.md installation guides.
Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c
* Update nvidia_gpu.md
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
* Merge matmul and add
* Fix format
* Use paddle.nn.functional.linear
* Use _C_ops.linear
* Use paddle.nn.functional.linear
* Add FLAGS_use_legacy_linear env var in test case
* Fix format
* Add assert and remove env var
* Fix format
* Use matmul when there is no bias
* Update accuracy baseline
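
The matmul/add merge in the commits above relies on a linear layer being a matmul followed by a bias add. The dependency-free sketch below illustrates that equivalence and the no-bias fallback to a plain matmul. It is an assumption-labelled illustration, not the FastDeploy code: `matmul` and `linear` here are hypothetical pure-Python helpers operating on nested lists, with the weight laid out as `[in_features, out_features]`.

```python
from typing import List, Optional

Matrix = List[List[int]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product: x is [rows, in], w is [in, out]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def linear(x: Matrix, w: Matrix, bias: Optional[List[int]] = None) -> Matrix:
    """Fused form: x @ w + bias, falling back to matmul when bias is None."""
    out = matmul(x, w)
    if bias is None:
        return out  # no bias: the result is exactly the matmul
    return [[v + b for v, b in zip(row, bias)] for row in out]


x = [[1, 2], [3, 4]]
w = [[1, 0], [0, 1]]
bias = [10, 20]

# The unfused path: matmul first, then a separate add.
unfused = [[v + b for v, b in zip(row, bias)] for row in matmul(x, w)]
fused = linear(x, w, bias)
```

Here `fused` and `unfused` hold the same values, which is the property an accuracy baseline would be expected to check after such a merge.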
* Remove duplicate NICs from environment variables
* Update version for xvllm in download_dependencies.sh
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
* [BugFix] Set MC_MAX_MR_SIZE to avoid register hang (#7163)
* Set MC_MAX_MR_SIZE to avoid register hang
* up
* [fix] prevent requests from entering running state without a slot
* [fix] count abort set
* [fix] count preempted task in waiting list
---------
Co-authored-by: jc <52520497+juncaipeng@users.noreply.github.com>