jc
04fde3b227
[PD Disaggregation] Prefill and decode support cache storage ( #6768 )
...
* Prefill and decode support cache storage
* up
* up
* update docs and refine mooncake store
* up
2026-03-16 14:44:49 +08:00
mouxin
6e96bd0bd2
[Feature] Fix counter release logic & update go-router download URL ( #6280 )
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Fix counter release logic
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update go-router download URL
* [Feature] Update token counter logic and docs
* [Feature] Update token counter logic and docs
---------
Co-authored-by: mouxin <mouxin@baidu.com >
2026-02-04 15:02:38 +08:00
mouxin
506f1545cd
[Feature] Enhance Router with /v1/completions, docs, scripts, and version info ( #5966 )
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
---------
Co-authored-by: mouxin <mouxin@baidu.com >
2026-01-30 10:28:48 +08:00
Cheng Yanfei
fbcccaa750
[Intel HPU] enable MoE EP for hpu ( #5855 )
...
* enable HPU MoE EP
* MoE intermediate_scale stack
* enable loader_v1 esp for tensor_wise_fp8 TP or EP
* modify activation_scale name
2026-01-15 13:08:00 +08:00
mouxin
0a92e96f20
[Feature] Add Golang-based Router for Request Scheduling and Load Balancing ( #5882 )
...
* [Feature] add golang router
* [Feature] add golang router
* [Feature] add golang router
* [Feature] add golang router
* [Feature] add golang router
* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing
* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing
* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing
* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing
---------
Co-authored-by: mouxin <mouxin@baidu.com >
2026-01-07 21:28:08 +08:00
jc
e9b25aa72f
[BugFix] Storage backend gets env params ( #5892 )
...
* Storage backend gets env params
* up
* up
* up
2026-01-06 14:14:17 +08:00
jc
8d384f9fd8
[PD Disaggregation] Update usage of pd disaggregation and data parallel ( #5742 )
...
* Update usage of pd disaggregation
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up dp docs
* up
* up
* up
* fix unittest
2026-01-05 17:51:29 +08:00
jc
e911ac2ce7
[BugFix] Refine the preparation of cpu and storage cache ( #5777 )
...
* Refine the preparation of cpu and storage cache
* fix error
* fix error
* up
* fix
* up docs
* fix unittest
* remove debug info
2026-01-05 10:13:30 +08:00
Juncai
412867fd99
[Feature] Support KV Cache Storage ( #5571 )
...
* Support Mooncake Store
* up
* up
* add op
* fix conflict
* fix error
* up for comments
* avoid thread lock
* up
* fix unittest
* fix unittest
* remove debug info
* consider tp_size > 1
* add default rdma_nics
* add utils
* up
* fix error
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-25 16:30:35 +08:00
fmiao2372
a8fce47195
[Intel HPU] enable kv cache scheduler v1 for hpu ( #5648 )
...
* [Intel HPU] enable kv cache scheduler v1 for hpu
* fix copilot comments
2025-12-19 12:03:39 +08:00
Yonghua Li
0c8c6369ed
[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports ( #5415 )
...
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports
* [fix] fix some bugs
* [fix] fix rdma port for cache manager/messager
* [fix] temporarily cancel port availability check to see if it can pass ci test
* [feat] simplify args for multi api server
* [fix] fix dp
* [fix] fix port for xpu
* [fix] add tests for ports post processing & fix ci
* [test] fix test_multi_api_server
* [fix] fix rdma_comm_ports args for multi_api_server
* [fix] fix test_common_engine
* [fix] fix test_cache_transfer_manager
* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER
* [fix] avoid api server from creating engine_args twice
* [fix] fix test_run_batch
* [fix] fix test_metrics
* [fix] fix splitwise connector init
* [test] add test_rdma_transfer and test_expert_service
* [fix] fix code syntax
* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
xiaolei373
a30b4da260
[Feature] Tracing: Fine-Grained Tracing for Request Latency Part1 ( #5458 )
2025-12-16 16:36:09 +08:00
Yonghua Li
f4119d51b4
[PD Disaggregation] support DP via v1 router and decouple DP and EP ( #5197 )
...
* [fix] support DP via v1 router and decouple DP and EP
* [fix] fix scripts
* [fix] reset model path
* [fix] dp use get_output_ep, fix router port type, update scripts
* [merge] merge with latest code
* [chore] remove some debug log
* [fix] fix code style check
* [fix] fix test_multi_api_server for log_dir name
* [chore] reduce logs
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-12-04 15:38:43 +08:00
fmiao2372
429dd2b1db
[Intel HPU] add example benchmark scripts for hpu ( #5304 )
...
* [Intel HPU] add example benchmark scripts for hpu
* Revise the code based on the copilot comments
* update code based on comments
* update ci ops version
2025-12-02 18:00:01 +08:00
K11OntheBoat
2e1680838f
[PD Disaggregation] Support PD deployment of DeepSeekv3. ( #5251 )
...
* Support deepseekv3 cache transfer for PD deploy
* clean some log info
---------
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-12-02 14:11:50 +08:00
Juncai
f9b0545a7f
[PD Disaggregation] [Refine] Refine splitwise deployment ( #5151 )
...
* Refine splitwise deployment
* up
2025-11-21 15:30:24 +08:00
Yonghua Li
43097a512a
[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol ( #5132 )
...
* [fix] fix v1 scheduler profile run for append attention in prefill node
* [fix] skip send_signal if kv signal not inited for gpu and xpu
* [fix] extend fix to flash_attn & mla_attn
* [fix] fix v1 pd run in ipc transfer protocol
* [ci] add test for v1 pd profile run using ipc transfer protocol
* [style] fix code style check
* [style] fix code style again
* [fix] fix profile run
* [update] remove --num-gpu-blocks-override in example script
* [chore] rename forward_meta is_profiling to is_dummy_or_profile_run
2025-11-20 21:39:22 +08:00
Juncai
36822fa49c
[PD Disaggregation] remove splitwise deployment on single node and refine the code ( #4891 )
...
* remove splitwise deployment on single node and refine the code
* up
* up
* up
* add test
* up
2025-11-14 09:56:53 +08:00
Juncai
08ca0f6aea
[Feature] [PD] add simple router and refine splitwise deployment ( #4709 )
...
* add simple router and refine splitwise deployment
* fix
2025-11-06 14:56:02 +08:00
jiangjiajun
684703fd72
[LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00
Jules
4f4f2e14bf
fix Windows text encoding issue causing infinite loop
2025-02-14 18:40:00 +08:00
Yutian Rao
a300abde8c
Update README.md
...
Fix wording: OenVINO => OpenVINO
2024-10-29 15:22:49 +08:00
DefTruth
12bb44e0de
[Bug Fix] fix build xpu encrypt & auth image scripts ( #2133 )
...
* [patchelf] fix patchelf error for inference xpu
* [serving] add xpu dockerfile and support fd server
* [serving] add xpu dockerfile and support fd server
* [Serving] support XPU + Triton
* [Serving] support XPU + Triton
* [Dockerfile] update xpu triton docker file -> paddle 0.0.0
* [Dockerfile] update xpu triton docker file -> paddle 0.0.0
* [Dockerfile] update xpu triton docker file -> paddle 0.0.0
* [Dockerfile] add comments for xpu triton dockerfile
* [Doruntime] fix xpu infer error
* [Doruntime] fix xpu infer error
* [XPU] update xpu dockerfile
* add xpu triton server docs
* add xpu triton server docs
* add xpu triton server docs
* add xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* [XPU] Update XPU L3 Cache setting docs
* [XPU] Add Encryption and AUTH support for XPU Server
* [XPU] Add Encryption and AUTH support for XPU Server
* [Bug Fix] fix paddle reader error
* [Serving] Support XPU encrypt & auth server
* [Serving] Support XPU encrypt & auth server
* [Serving] Support XPU encrypt & auth server
* [Serving] Support XPU encrypt & auth server
* [Triton] switch TAG 22.12 -> TAG 21.10wq
* update xpu auth server script
* [Bug Fix] fix build xpu encrypt & auth image scripts
2023-07-24 21:00:05 +08:00
jack xu
821adb387e
[Bug Fix] Fixed the issue with the incorrect path to FastDeploy.cmake in CMakeLists.txt file. ( #2082 )
...
[Bug Fix] fixed CMakeLists.txt FastDeploy.cmake path
2023-07-04 13:04:19 +08:00
zengshao0622
79a3587339
[Model] Add Paddle3D CenterPoint model ( #2078 )
...
* add centerpoint
* update for review comments
2023-07-03 13:39:16 +08:00
YuBinglei
5f9e8b6e08
[Bug Fix] Re-Fix OCR Serving bug. #1516 ( #2011 )
...
see https://github.com/PaddlePaddle/FastDeploy/pull/1516
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-06-09 13:44:55 +08:00
Zheng-Bicheng
8d357814e8
[Backend] Add pybind & PaddleDetection example for TVM ( #1998 )
...
* update
* update
* Update infer_ppyoloe_demo.cc
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-06-04 13:26:47 +08:00
DefTruth
434b48dda5
[Serving] Support FastDeploy XPU Triton Server ( #1994 )
...
* [patchelf] fix patchelf error for inference xpu
* [serving] add xpu dockerfile and support fd server
* [serving] add xpu dockerfile and support fd server
* [Serving] support XPU + Triton
* [Serving] support XPU + Triton
* [Dockerfile] update xpu triton docker file -> paddle 0.0.0
* [Dockerfile] update xpu triton docker file -> paddle 0.0.0
* [Dockerfile] update xpu triton docker file -> paddle 0.0.0
* [Dockerfile] add comments for xpu triton dockerfile
* [Doruntime] fix xpu infer error
* [Doruntime] fix xpu infer error
* [XPU] update xpu dockerfile
* add xpu triton server docs
* add xpu triton server docs
* add xpu triton server docs
* add xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
* update xpu triton server docs
2023-05-29 14:38:25 +08:00
Zheng-Bicheng
643730bf5f
[Hackathon 181] Add TVM support for FastDeploy on macOS ( #1969 )
...
* update for tvm backend
* update third_party
* update third_party
* update
* update
* update
* update
* update
* update
* update
* update
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-05-25 19:59:02 +08:00
CoolCola
e3b285c762
[Model] Support Paddle3D PETR v2 model ( #1863 )
...
* Support PETR v2
* make petrv2 precision equal with the origin repo
* delete extra func
* modify review problem
* delete visualize
* Update README_CN.md
* Update README.md
* Update README_CN.md
* fix build problem
* delete external variable and function
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-05-19 10:45:36 +08:00
Qianhe Chen
09ec386e8d
[Bug Fix] Fix speech and silence state transition in VAD ( #1937 )
...
* Fix speech and silence state transition
* Fix typo
* Fix speech and silence state transition
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-05-16 18:50:04 +08:00
DefTruth
77cb9db6da
[Model] Support PP-ShiTuV2 models for PaddleClas ( #1900 )
...
* [cmake] add faiss.cmake -> pp-shituv2
* [PP-ShiTuV2] Support PP-ShituV2-Det model
* [PP-ShiTuV2] Support PP-ShiTuV2-Det model
* [PP-ShiTuV2] Add PPShiTuV2Recognizer c++&python support
* [PP-ShiTuV2] Add PPShiTuV2Recognizer c++&python support
* [Bug Fix] fix ppshitu_pybind error
* [benchmark] Add ppshituv2-det c++ benchmark
* [examples] Add PP-ShiTuV2 det & rec examples
* [vision] Update vision classification result
* [Bug Fix] fix trt shapes setting errors
2023-05-08 14:04:09 +08:00
seyosum
df8dd3e3ac
【Hackthon_4th 180】Support HORIZON BPU Backend for FastDeploy ( #1822 )
...
* add horizon backend and PPYOLOE examples
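* Change the coding style of the horizon header files
* Change the coding style of the horizon header files
* Change the coding style of the horizon header files
* Add download and automatic installation of horizon packages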
* Add UseHorizonNPUBackend Method
* Remove redundant header files left after building the FD SDK, and adjust some conventions
* Update horizon.md
* Update horizon.md
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-05-06 16:10:37 +08:00
DefTruth
6d0261e9e4
[Model] Support PP-StructureV2-Layout model ( #1867 )
...
* [Model] init pp-structurev2-layout code
* [Model] init pp-structurev2-layout code
* [Model] init pp-structurev2-layout code
* [Model] add structurev2_layout_preprocessor
* [PP-StructureV2] add postprocessor and layout detector class
* [PP-StructureV2] add postprocessor and layout detector class
* [PP-StructureV2] add postprocessor and layout detector class
* [PP-StructureV2] add postprocessor and layout detector class
* [PP-StructureV2] add postprocessor and layout detector class
* [pybind] add pp-structurev2-layout model pybind
* [pybind] add pp-structurev2-layout model pybind
* [Bug Fix] fixed code style
* [examples] add pp-structurev2-layout c++ examples
* [PP-StructureV2] add python example and docs
* [benchmark] add pp-structurev2-layout benchmark support
2023-05-05 13:05:58 +08:00
thunder95
2c5fd91a7f
[Hackthon_4th 242] Support en_ppstructure_mobile_v2.0_SLANet ( #1816 )
...
* first draft
* update api name
* fix bug
* fix bug and
* fix bug in c api
* fix bug in c_api
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-04-27 10:45:14 +08:00
thunder95
51be3fea78
[Hackthon_4th 177] Support PP-YOLOE-R with BM1684 ( #1809 )
...
* first draft
* add robx iou
* add benchmark for ppyoloe_r
* remove trash code
* fix bugs
* add pybind nms rotated option
* add missing head file
* fix bug
* fix bug2
* fix shape bug
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-04-21 10:48:05 +08:00
yeliang2258
a509dd8ec1
[Model] Add Paddle3D smoke model ( #1766 )
...
* add smoke model
* add 3d vis
* update code
* update doc
* mv paddle3d from detection to perception
* update result for velocity
* update code for CI
* add set input data for TRT backend
* add serving support for smoke model
* update code
* update code
* update code
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-04-14 16:30:56 +08:00
yeliang2258
e2f5a9ce66
[Model] Add picodet for RV1126 and A311D ( #1549 )
...
* add infer for picodet
* update code
* update lite lib
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-04-10 22:04:45 +08:00
hjyp
cc4bbf2163
[PaddlePaddle Hackathon4 No.185] Add PaddleDetection Models Deployment Java Examples ( #1782 )
...
* add java examples
* fix detail
* fix pre-config
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-04-10 21:23:44 +08:00
Zheng-Bicheng
109d1046ae
[Model] add function for setting anchor rknpu2 ( #1728 )
...
* add function for setting anchor rknpu2
add more demo for rknpu2
fixed md error
* Update config.h
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-04-04 20:33:06 +08:00
wanziyu
95c977c638
[PaddlePaddle Hackathon4 No.184] Add PaddleDetection Models Deployment Rust Examples ( #1717 )
...
* [PaddlePaddle Hackathon4 No.186] Add PaddleDetection Models Deployment Go Examples
Signed-off-by: wanziyu <ziyuwan@zju.edu.cn >
* Fix YOLOv8 Deployment Go Example
Signed-off-by: wanziyu <ziyuwan@zju.edu.cn >
* [Hackathon4 No.184] Add PaddleDetection Models Deployment Rust Examples
Signed-off-by: wanziyu <ziyuwan@zju.edu.cn >
* Add main and cargo files in examples
Signed-off-by: wanziyu <ziyuwan@zju.edu.cn >
---------
Signed-off-by: wanziyu <ziyuwan@zju.edu.cn >
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-04-03 11:19:28 +08:00
Yi-sir
9e20dab0d6
[Example] Merge Download Paddle Model, Paddle->ONNX->MLIR->BModel ( #1643 )
...
* fix infer.py and README
* [Example] Merge Download Paddle Model, Paddle->Onnx->Mlir->Bmodel and
inference into infer.py. Modify README.md
* modify pp_liteseg sophgo infer.py and README.md
* fix PPOCR,PPYOLOE,PICODET,LITESEG sophgo infer.py and README.md
* fix memory overflow problem while inferring with sophgo backend
* fix memory overflow problem while inferring with sophgo backend
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
Co-authored-by: xuyizhou <yizhou.xu@sophgo.com >
2023-03-31 15:08:01 +08:00
wanziyu
b1d2903b93
[PaddlePaddle Hackathon4 No.186] Add PaddleDetection Models Deployment Go Examples ( #1648 )
...
* [PaddlePaddle Hackathon4 No.186] Add PaddleDetection Models Deployment Go Examples
Signed-off-by: wanziyu <ziyuwan@zju.edu.cn >
* Fix YOLOv8 Deployment Go Example
Signed-off-by: wanziyu <ziyuwan@zju.edu.cn >
---------
Signed-off-by: wanziyu <ziyuwan@zju.edu.cn >
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-03-28 20:30:03 +08:00
wangguoya
c61a07712e
fix bug for kunlunxin run sd demo for using fp16 ( #1680 )
...
* modify sd infer.py for using paddle_kunlunxin_fp16
* Update infer.py
* [fix bug] fix bug sd in demo infer.py for kunlunxin using fp16
2023-03-27 14:04:21 +08:00
yunyaoXYY
f36f9324de
[Docs] Pick PPOCR fastdeploy docs from PaddleOCR ( #1534 )
...
* Pick PPOCR fastdeploy docs from PaddleOCR
* improve ppocr
* improve readme
* remove old PP-OCRv2 and PP-OCRv3 folders
* rename kunlun to kunlunxin
* improve readme
* improve readme
* improve readme
---------
Co-authored-by: Jason <jiangjiajun@baidu.com >
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-03-23 13:11:19 +08:00
yunyaoXYY
c91e99b5f5
[Docs] Pick paddleclas fastdeploy docs from PaddleClas ( #1654 )
...
* Adjust folders structures in paddleclas
* remove useless files
* Update sophgo
* improve readme
2023-03-23 13:06:09 +08:00
DefTruth
af18e597d0
[Docs] rename ppseg kunlun docs -> kunlunxin ( #1662 )
...
* [Docs] rename ppseg kunlun -> kunlunxin
* [Docs] rename ppseg fastdeploy kunlun docs -> kunlunxin
2023-03-20 19:46:18 +08:00
DefTruth
5b143219ce
[Docs] Pick seg fastdeploy docs from PaddleSeg ( #1482 )
...
* [Docs] Pick seg fastdeploy docs from PaddleSeg
* [Docs] update seg docs
* [Docs] Add c&csharp examples for seg
* [Docs] Add c&csharp examples for seg
* [Doc] Update paddleseg README.md
* Update README.md
2023-03-17 11:22:46 +08:00
Jason
6343b0db47
[Build] Support build with source code of Paddle2ONNX ( #1559 )
...
* Add notes for tensors
* Optimize some apis
* move some warnings
* Support build with Paddle2ONNX
* Add protobuf support
* Fix compile on mac
* add clearn package script
* Add paddle2onnx code
* remove submodule
* Add onnx code
* remove softlink
* add onnx code
* fix error
* Add cmake file
* fix patchelf
* update paddle2onnx
* Delete .gitmodules
---------
Co-authored-by: PaddleCI <paddle_ci@example.com >
Co-authored-by: pangyoki <pangyoki@126.com >
Co-authored-by: jiangjiajun <jiangjiajun@baidu.lcom >
2023-03-17 10:03:22 +08:00
Zheng-Bicheng
d14db2629d
[Example] Move SOLOv2 jetson example -> cpp ( #1600 )
...
* move solov2
* move solov2
---------
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2023-03-16 22:04:50 +08:00