YuBaoku
5c9fa43150
[Docs] Update Release Note (#7302)
2026-04-10 15:26:53 +08:00
yinwei
4aecaa70ba
[XPU][Docs] Update Release Note (#7262)
...
* update
* update docs
* update docs
* update commit
* update commit
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-04-10 15:22:16 +08:00
bukejiyu
14d46181b8
[Loader] add multi-thread model loading (#6877)
...
* multi-thread-loader
* fix ut
2026-04-09 23:40:15 -07:00
Jiang-Jia-Jun
e327673737
Update nvidia_gpu.md
2026-04-10 13:53:04 +08:00
YuBaoku
b7b4fe6a69
[Docs][CI] Fix prebuilt wheel installation and update Docs (#7289)
...
* [CI] Fix prebuilt wheel installation and update Docs
* [CI] Update Dockerfile.gpu to restrict SM80/86/89/90, CUDA 12.6 and Python 3.10
* Update nvidia_gpu.md
* Update nvidia_gpu.md
* Revise NVIDIA GPU installation instructions
Updated installation instructions for PaddlePaddle and FastDeploy to remove specific CUDA version mentions and clarify support for multiple GPU architectures.
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-04-10 10:31:12 +08:00
AIbin
fcaf614133
[Docs] add dsk-3.2 doc (#7278)
...
* add dsk-3.2 doc
2026-04-09 17:28:25 +08:00
Jiang-Jia-Jun
33682c6749
[Docs] Update docs for release/2.5 (#7267)
...
* Update docs for release/2.5
* Update English docs for release/2.5
- Update README_EN.md: add v2.5 news entry, reformat v2.4 entry with release link
- Update docs/get_started/installation/nvidia_gpu.md:
  - Docker image: 2.4.0 -> 2.5.0, notice now shows SM80/86/89/90 support
  - paddlepaddle-gpu: 3.3.0 -> 3.3.1, add CUDA 12.9 alternatives
  - fastdeploy-gpu: 2.4.0 -> 2.5.0, unified arch install with CUDA 12.9 option
- Update docs/zh/get_started/installation/nvidia_gpu.md:
  - Fix remaining paddlepaddle-gpu==3.3.0 refs in sections 4&5 -> 3.3.1
Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/fa0be381-324e-4b0d-b7a6-e2c1fa12174f
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Clarify --extra-index-url usage in installation docs
Add note explaining that --extra-index-url is only for downloading
fastdeploy-gpu dependencies; fastdeploy-gpu itself must be installed
from the Paddle source specified by -i. Applied to both Chinese and
English nvidia_gpu.md installation guides.
Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/9fa8b3c9-7555-4eae-b9b9-026cddd7e74c
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* Update nvidia_gpu.md
---------
Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-09 16:07:18 +08:00
yinwei
334b02c12b
[XPU][Docs] Update Release2.5 Note (#7187)
...
* update docs
* update
* update
2026-04-07 18:45:52 +08:00
lizexu123
5f612a348d
[BugFix] fix flashinfer-cutedsl moe nvfp4 (#7120)
...
* fix nvfp4
* fix
* add document
* fix nvfp4
* support eb5
* support bka
* support eb5
* support xpu
* fix
* fix
* add import cutedsl
* fix
* fix
* fix test
* fix H-series GPU cards
* update document
* fix
* update document
* update document
* fix
2026-04-03 15:43:19 +08:00
Jingfeng Wu
3b564116d5
[Docs] Add docs for disaggregated deployment (#6700)
...
* add docs for disaggregated deployment
* pre-commit run for style check
* update docs
2026-04-01 19:27:09 +08:00
mouxin
6cae9b1f50
[Feature] Config eviction_duration (#7125)
...
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 16:46:21 +08:00
zhouchong
91c832f607
[Feature] Add logging parameters and error output to terminal (#7098)
2026-04-01 13:18:42 +08:00
qwes5s5
daa95244f7
abort requests (#6992)
2026-03-31 11:02:26 +08:00
chenjian
6727df8286
[Optimization] Optimize ttft for prefill pd (#6680)
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
* fix ci
* fix format
* update according to review
* add comment
* fix
* fix format
2026-03-30 20:36:23 +08:00
jackyYang6
05f2d95729
[RL] Adapt async rollout checkpoint update flow (#7042)
...
* update checkpoint-transfer flow and control update_weights params
* test: add update_weights route validation
2026-03-30 19:19:34 +08:00
yzwu
8789329457
[Iluvatar] Support wi4a16 group_gemm (#7078)
2026-03-30 19:03:51 +08:00
Yonghua Li
a7f52c300d
[Feature] support v1 update/clear api for RL (#6761)
...
* [Feature] support v1 update/clear api for RL
* [fix] fix execute_model and add sleep/wakeup api
* [fix] fix mtp and key_prefix
* [chore] move _update_key_prefix to resume method
* [fix] make the interface safe to call multiple times
* [fix] fix some tiny bugs
* [chore] make small changes against pr review
* [docs] add docs for weight update
* [test] add some tests and update docs
* [style] fix code style check
* [test] fix ci
* [fix] fix stale control responses when control method timed out
* [chore] remove unused code
* [chore] fix code style
* [chore] optimize tags and key_prefix
* [test] fix ci
* [chore] fix code style
* [test] fix ci
* [fix] fix ep control
* [fix] fix ep control for engine cache queue
2026-03-25 19:18:46 +08:00
YuBaoku
aee293be0f
[CI] Optimize: add vl swap_test and remove useless code (#7000)
2026-03-25 11:33:56 +08:00
jackyYang6
634d23a38a
[Bugfix] Align thinking_budget behavior with ERNIE reasoning flow (#6934)
...
* [Bugfix] Align thinking_budget behavior with ERNIE reasoning flow
* [Docs] Fix thinking_budget markdown formatting
* [Test] Align ernie thinking budget test with process_request_dict
2026-03-23 14:15:55 +08:00
mouxin
96b0ecea6b
[Feature] Update Counter Release (#6943)
2026-03-20 10:51:37 +08:00
sunxin
33e01f22a8
[Feature][Sampling] Extend top-k_top-p sampling to all backends and unify greedy decoding with top_k=1 (#6894)
...
* update sampling
* fix
* fix
* fix mtp
* fix test
2026-03-19 01:43:10 -07:00
mouxin
b61731bb96
[Feature][Docs] Adjust prefill release & expose load metrics (#6884)
2026-03-17 15:23:13 +08:00
jc
04fde3b227
[PD Disaggregation] Prefill and decode support cache storage (#6768)
...
* Prefill and decode support cache storage
* up
* up
* update docs and refine mooncake store
* up
2026-03-16 14:44:49 +08:00
mouxin
49fe68a518
[Docs] Update Golang Router FAQ (#6829)
2026-03-13 15:48:36 +08:00
yzwu
901b38c936
[Iluvatar] Optimize decode group_gemm and Support cuda graph for ernie (#6803)
2026-03-12 19:21:17 +08:00
freeliuzc
cf7934a4b2
[Speculative Decoding] Unify Spec and non-spec branch (#6685)
...
* optimize spec-inference architecture
* delete debug log
* optimize spec_method usage && fix unit_test
* add claude unit-test skill
* fix some ugly bug
* enhance robustness and bounds check
* unify method & spec_method to method to avoid bug
* activate CI
* fix unit test
* Unify logprobs computation for naive and speculative decoding, fix CUDA kernel
* fix logprob bug && optimize verify kernel
* fix exist_decode() judge
2026-03-10 23:58:44 -07:00
gongweibao
be36133db6
Remove Python-only mode documentation from installation guides (#6784)
...
Remove BUILD_WHEEL=2 related sections from nvidia_gpu and
kunlunxin_xpu installation docs (both en and zh).
Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 13:08:18 +08:00
mouxin
22d308a274
[Docs] Specify the default strategy (#6728)
...
* [Docs] Update the document
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-03-10 13:16:31 +08:00
周周周
3cc09418f1
support dsv3 use flashmla (#6593)
2026-03-03 11:09:43 +08:00
yzwu
6674131b0b
[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553)
2026-03-02 14:07:17 +08:00
YuBaoku
bb51829bd5
[CI] Fix tests and docs to resolve failure (#6572)
2026-03-01 12:33:01 +08:00
kevin
fa21fd95c4
[Docs] Update code overview documentation (#6568)
...
* [Docs] Update code overview documentation
- Add comprehensive FastDeploy code structure overview
- Include detailed module descriptions and development guides
- Add quick development guide for common tasks
- Update both English and Chinese versions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* [Docs] Update code overview documentation format
- Convert file path links from [file](path) to `file` inline code format
- Add proper spacing for better readability in markdown tables
- Maintain consistent formatting across English and Chinese docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 16:37:01 +08:00
mouxin
049c807d86
[Docs] Update the document (#6539)
...
Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-27 19:21:10 +08:00
周周周
1503443871
add dsv3 mixed deploy as EP16 TP8 (#6525)
2026-02-27 14:08:25 +08:00
gongweibao
2541462f7e
[Feature][Docs] Add Python-only quick install mode (BUILD_WHEEL=2) to build.sh (#6503)
...
* add pythononly func
* add
* add more feature
* add safe check
* add rsync check
* add
* add
* refine docs
* add installation
* add installation
2026-02-26 16:17:41 +08:00
AIbin
47bfd45bb6
[Docs] add deepseek model doc (#6513)
...
* add deepseek model doc
2026-02-26 14:08:19 +08:00
MingkunZhang
b56a4099c0
[Metax][Docs] update metax guidance documents (#6515)
2026-02-26 14:04:23 +08:00
GoldPancake
2178f2829b
[Speculative Decoding] Support suffix decoding (#6403)
...
* support suffix decoding
2026-02-26 11:42:05 +08:00
jackyYang6
a29ee57e15
[Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367)
...
* feat: add thinking budget logits processor
* add unittest
* fix pre-commit
* add unittest
* docs: clarify operator-level vs logits processor usage and conflict guidance
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-25 14:17:09 +08:00
bukejiyu
5bfc0938e2
[BugFix] PD reorder fix and add ut (#6375)
2026-02-09 04:42:48 -08:00
chenjian
35c24f3f71
Revert "[Optimize] Optimize ttft for ep (#6098)" (#6402)
...
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
luukunn
fd56d85346
add environment_variables (#6385)
2026-02-09 15:29:49 +08:00
chen
29a270bb38
[Docs] Add Doc for Online quantification (#6399)
...
* add doc for dynamic quant
* check
2026-02-08 22:09:18 -08:00
Jiang-Jia-Jun
18e79dd660
[Metrics] Support cpu-cache-block-num (#6390)
...
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
chenjian
90db0bdd0d
[Optimize] Optimize ttft for ep (#6098)
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
2026-02-04 15:03:29 +08:00
mouxin
6e96bd0bd2
[Feature] Fix counter release logic & update go-router download URL (#6280)
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Fix counter release logic
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update token counter logic and docs
* [Feature] Update token counter logic and docs
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-04 15:02:38 +08:00
Jiang-Jia-Jun
793dac0f9d
Modify Nightly Build installation commands for fastdeploy
...
Update the installation instructions for the Nightly Build of fastdeploy to use the cu126 index for both SM86/89 and SM80/90 architectures.
2026-02-03 20:24:27 +08:00
Jiang-Jia-Jun
829139a5e5
Fix Nightly build installation URLs for fastdeploy-gpu
...
Updated installation instructions for the latest Nightly build of fastdeploy-gpu to use the correct URLs for CUDA 12.6.
2026-02-03 20:24:19 +08:00
mouxin
506f1545cd
[Feature] Enhance Router with /v1/completions, docs, scripts, and version info (#5966)
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-01-30 10:28:48 +08:00
yuxuan
44b52701f6
[Feature] Support NVFP4 MoE on SM100 (#6003)
...
* fp4 dense
* [WIP] support nvfp4, dense part
* [wip] developing loading qwen model
* loading
* update
* dense fp4 OK, cudagraph error
* [WIP] moe forward part
* with flashinfer-backend
* qwen3_moe_fp4
* update
* support flashinfer-cutlass moe, qwen3-moe-fp4 OK
* support ernie4.5-fp4
* fix load error
* add some ut
* add docs
* fix CLA, test
* fix the apply() in ModelOptNvFp4FusedMoE
* fix CodeStyle
* del the PADDLE_COMPATIBLE_API
* fix broken url: nvidia_gpu.md
* fix docs
* fix token_ids
* fix CI in Hopper
* move flashinfer imports inside the function
* fix model_runner
Removed the logic for generating random padding IDs.
* Remove skip condition for CUDA version in nvfp4 test
* add test for nvfp4
* fix according to review
* Add Chinese translation link to NVFP4 documentation
* del flashinfer.py
* fix unittest
---------
Co-authored-by: zoooo0820 <zoooo0820@qq.com>
Co-authored-by: bukejiyu <395822456@qq.com>
2026-01-29 14:16:07 +08:00