Commit Graph

28 Commits

Author SHA1 Message Date
Dangweichong 3c9fd818e3 [BugFix] Fix RDMA initializes failed (#7025) 2026-03-26 17:45:39 +08:00
jc 04fde3b227 [PD Disaggregation] Prefill and decode support cache storage (#6768)
* Prefill and decode support cache storage

* up

* up

* update docs and refine mooncake store

* up
2026-03-16 14:44:49 +08:00
jc 0466c7e8a8 Set MC_TCP_BIND_ADDRESS for mooncake store (#6782) 2026-03-11 16:56:39 +08:00
RichardWooSJTU fe0b3a90ee [PD Disaggregation] Fix cache messager performance problem & add kv transfer benchmark tool (#6434)
* fix cache messager performance problem

* dispatch param type
2026-03-02 14:28:14 +08:00
jc d6b3c722c1 [KVCache] Storage cache supports c8 model (#6298)
* Refine cache transfer manager
* Storage cache supports c8 model
2026-02-06 12:01:17 +08:00
Moonchild1227 39dc4b0c2e [Feature] [KVCache] support file_store kv cache backend (#6188)
* fix(examples): comment out stop.sh to avoid error when script is missing

* feat: add file_store support for cache manager

* [fix] fix multi gpu transfer

* [fix] fix global kvcache transfer

* [Feature] [KVCache] support file_store kv cache backend

* chore: update FileStore according to PR comments

* fix: remove comments

* fix: add swap_cache_layout for file store

* fix: remove rank key

* fix: Switch KV cache storage to pure file mode

* Temporarily disable support for Tensor types

* fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR

* fixx: Simplify cache_transfer_manager.py

* fix: fix syntax bug

* fix: Simplify file_store.py

* fix: Use the key directly as the filename

* fix: Simplify set()

* fix: Simplify cache_transfer_manager.py & file_store.py

* fix: Only support load to cpu buffer

* feat: add FileStore backend for cache transfer

* fix: guard zmq import
2026-02-03 14:37:58 +08:00
chenjian af1b1d2d56 [Feature] Support report token index by attention store (#6285)
* [Feature] Support report token index by attention store

* fix format
2026-02-02 10:41:11 +08:00
jc b1698a79cb [RL] add version to the key of cache storage && refine raising error (#6160)
* Waiting for cache transfer manager inited

* up

* up

* up

* up

* up

* fix according comments

* fix unittest

* fix

* fix unittest

* fix error

* pass storage_backend to worker
2026-01-27 10:47:46 +08:00
Yonghua Li 8d27a523e7 [Feature] [KVCache] support attention_store kv cache backend (#5823)
* [feat] support attention_store kv cache backend

* [fix] fix codestyle

* [chore] optimize log

* [fix] fix write storage task

* [fix] fix read storage

* [fix] fix code conflict after merge develop

* [fix] fix cache bytes and read task token ids

* [chore] add model for cache transfer manager

* [chore] add some log

* [chore] remove launched_cache_manager_signal

* [fix] fix write_back_storage_task match_block_num condition

* [fix] fix swap_cost_time

* [ci] fix ci

* Update fastdeploy/engine/sched/resource_manager_v1.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/cache_manager/cache_transfer_manager.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-22 21:01:23 +08:00
Yonghua Li 60ee72f682 [BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935)
* [fix] fix rdma script and add more error log for multi api server

* [fix] log

* [fix] fix test_multi_api_server

* [fix] fix multi api server port check

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-12 10:38:52 +08:00
kevin a76e8ae40c [Feature] support rdma pd dy-c8 (#5788)
* add rdma pd dy-c8

* update code
2026-01-07 14:55:25 +08:00
jc e9b25aa72f [BugFix] Storage backend gets env params (#5892)
* Storage backend gets env params

* up

* up

* up
2026-01-06 14:14:17 +08:00
jc 95257c1dbd [Feature] RDMACommunicator send key and value scale (#5737)
* RDMACommunicator send key and value scale
---------

Co-authored-by: kevin <chengyf112@gmail.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-05 10:04:24 +08:00
周周周 7ae13b2326 [PD Disaggregation]remove unsed para in RDMACommManager (#5814) 2025-12-30 11:38:30 +08:00
kevin 5538dda3c8 [Feature] pd support dy-c8 ipc (#5750)
* pd support dy-c8 ipc

* update code

* support v0

* update code
2025-12-25 21:22:34 +08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
Yonghua Li 0c8c6369ed [Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415)
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports

* [fix] fix some bugs

* [fix] fix rdma port for cache manager/messager

* [fix] temporarily cancel port availability check to see if it can pass ci test

* [feat] simplify args for multi api server

* [fix] fix dp

* [fix] fix port for xpu

* [fix] add tests for ports post processing & fix ci

* [test] fix test_multi_api_server

* [fix] fix rdma_comm_ports args for multi_api_server

* [fix] fix test_common_engine

* [fix] fix test_cache_transfer_manager

* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER

* [fix] avoid api server from creating engine_args twice

* [fix] fix test_run_batch

* [fix] fix test_metrics

* [fix] fix splitwise connector init

* [test] add test_rdma_transfer and test_expert_service

* [fix] fix code syntax

* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
Daci 2f208db4e9 [Feature] Multimodal Model P / D Separation (#5323)
* RouterArgs port str -> int

* fix race condition [is_fetching] causing multiple fetch requests

* bugfix: Delete duplicate input_ids tensor creation

* mm pd splitwise json -> pickle5; multimodal_inputs only pos id;
debuglog f to %s

* fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ...

* update cr

* Apply suggestions from code review

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* pre-commit fix

* rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 10:47:42 +08:00
K11OntheBoat 2e1680838f [PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)
* Support deepseekv3 cache transfer for PD deploy

* clean some log info

---------

Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-02 14:11:50 +08:00
Juncai 0925d44f18 [PD Disaggregation] support different tp_size for prefill and decode (#5296)
* up

* up

* up

* fix
2025-12-01 17:50:20 +08:00
ddchenhao66 e70e2279ce [PD Disaggregation][XPU] Add XPU support for PD disaggregation (#5113)
* [XPU] xpu support PD disaggregation

* [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards

* [XPU] xpu support PD disaggregation in v1 scheduler

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-21 14:09:01 +08:00
Juncai 36822fa49c [PD Disaggregation] remove splitwise deployment on single node and refine the code (#4891)
* remove splitwise deployment on single node and refine the code

* up

* up

* up

* add test

* up
2025-11-14 09:56:53 +08:00
zhupengyang 3a6883ac1a c++ code format (#4527) 2025-10-22 17:59:50 +08:00
chenjian 918ccdb123 [Feature] Support pd ep deployment with yiyan adapter (#4029)
* [Feature] Support mixed deployment with yiyan adapter in release2.2

* fix metrics

* add unit test

* add unit test

* add unit test

* Support pd ep deployment with yiyan adapter

* Support pd ep deployment with yiyan adapter

* refactor cache messager

* support scheduler v1 in PD

* suppport pd v1 + chunk prefill

* suppport pd v1 + chunk prefill

* add eplb

* support eplb

* support eplb

* support eplb

* support v1

* fix

* fix

* fix bug

* remove eplb support

* support prefix cache in P

* fix bug

* fix bug

* support one stop in V1

* fix bug

* fix ci

* fix ci

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-22 16:41:38 +08:00
chenjian fe17410f9c [BUG] Fix bug for pd in fd (#3034)
* Fix bug for pd in fd

* Fix bug for pd in fd

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:17:27 +08:00
Zhida Hu 3f8a41e68c [*] fix the memory leak when modify qp to rts failed (#3051)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-30 19:49:07 +08:00
Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun 92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00