96 Commits

Author SHA1 Message Date
jc dad3be366a Mooncake storage register local buffer by chunk (#7541) 2026-04-22 10:47:19 +08:00
jc cf939b4511 [BugFix] Remove ipc lock to avoid nan (#7312)
* Remove ipc lock to avoid nan

* up
2026-04-12 13:58:19 +08:00
jc 44ef7b6758 Set MC_MAX_MR_SIZE to avoid register hang (#7162) 2026-04-03 10:51:27 +08:00
jc bd48640b4b Write the cache of preempted req to storage (#7113) 2026-04-01 13:16:12 +08:00
jc 971fc7c15e Add lock to avoid generating nan (#7047) 2026-03-30 14:50:38 +08:00
Yonghua Li 35034f91fa [Cherry-Pick] [Feature] support v1 update/clear api for RL (#6761) (#6974)
* [Feature] support v1 update/clear api for RL

* [fix] fix stale control responses when control method timed out

* [chore] remove unused code

* [chore] optimize tags and key_prefix

* [test] fix ci

* [chore] fix code style

* [fix] fix ep control

* [fix] fix ep control for engine cache queue
2026-03-25 19:18:35 +08:00
jc 408404bfab Set MC_TCP_BIND_ADDRESS for mooncake store (#6783) 2026-03-11 16:56:46 +08:00
YuBaoku 6260c77ea6 [BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP (#6724) (#6769)
* [BugFix] Support  to fix NaN bug in EP

* Optimize notion for all the funs

* Fix potential lock contention failure issues

* Update fastdeploy/inter_communicator/ipc_signal.py



* Update envs.py

* Update default value for USE_KVCACHE_LOCK

Change default value of USE_KVCACHE_LOCK from 1 to 0.

* Update worker_process.py

* Fix suffix wrong

* Update test_prefix_cache_manager.py

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-11 09:54:01 +08:00
jc bcfd3168ce [BugFix] Fix error in dynamic c8 cache (#6544) (#6692)
* [BugFix] Fix error in dynamic c8 cache

* fix device id
2026-03-06 17:25:55 +08:00
Yonghua Li 5c9017bdde [Cherry-Pick] [BugFix] fix prefix tree updating timeout (#6615) (#6616) 2026-03-03 16:54:03 +08:00
kevin 603714bde9 [BugFix][Cherry-Pick] Add safety checks in recycle_gpu_blocks to prevent block allocation errors(#6531) (#6530)
* fix mtp acceptance rate decline cp

* [BugFix] Add safety checks in recycle_gpu_blocks to prevent block allocation errors

- Check prefix tree status before recycling GPU blocks
- Validate gpu_block_ids is a list
- Add overflow check to prevent free block count exceeding total blocks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [BugFix] Fix AttributeError in recycle_gpu_blocks when prefix_tree_status_signal not initialized

- Add hasattr check before accessing prefix_tree_status_signal
- The signal is only initialized in launch_cache_messager, not in __init__
- Fixes CI test failure in test_prefix_cache_manager.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [BugFix] Reset prefix cache when model weights are updating

- Call self.reset() before setting status to NORMAL in UPDATING state
- Ensure cache consistency when model weights change
- Consistent with CLEARING state handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 13:12:15 +08:00
Yonghua Li 36388104b5 [Cherry-Pick] [BugFix] fixup for cache transfer manager init failed when using block_wise_fp8 and no storage backend (#6516) (#6564)
* [BugFix] fixup for cache transfer manager init failed when using block_wise_fp8 and no storage backend

* [fix] fix ci
2026-03-01 13:43:19 +08:00
Copilot 319bf1fd06 [Cherry-Pick][BugFix][RL] Set GPU flags for paddle in cache transfer manager (#6534) (#6550)
* Initial plan

* [Cherry-Pick] Set GPU flags for paddle in cache transfer manager (#6534)

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-02-28 20:12:10 +08:00
Yonghua Li d13d71c60a [fix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend (#6517) 2026-02-28 11:19:20 +08:00
jc 1815b99d31 [BugFix] Fix storage_backend_type comparison bug in cache_transfer_manager.py (#6522)
Co-authored-by: root <root@tjzj-inf-sci-k8s-hzz1-h62ni7-2178.tjzj.baidu.com>
2026-02-26 19:42:38 +08:00
Yonghua Li 4092d39fca [Cherry-Pick] [BugFix] fix num_cpu_blocks computation (#6438) (#6473)
* [BugFix] fix num_cpu_blocks computation

* [fix] fix syntax and log

* [fix] pre-commit

* [fix] use getattr

* [fix] ci test
2026-02-13 15:30:13 +08:00
CSWYF3634076 ec128068b7 [Others] Exit to ensure no residual processes (cpu cache & dp) (#6377)
* [Others] good exit single dp

* [Others] good exit cpu cache dp>1

* [Others] good exit cpu cache dp>1 unittest
2026-02-09 20:38:38 +08:00
Jiang-Jia-Jun 18e79dd660 [Metrics] Support cpu-cache-block-num (#6390)
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
Yonghua Li 5ac5ecd0b0 [BugFix] fix cache transfer tasks failure after cache cleared (#6202)
* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task

* [fix] fix cache manager hang when clearing prefix cache

* [fix] fix list_proxy has no clear method

* [fix] fix barrier

* [fix] add barrier0

* [fix] add cache_task_is_paused_signal

* [fix] fix condition

* [fix] fix cache transfer  sync and delay prefix cache tree clearing

* [fix] fix typo

* [chore] polish code

* [fix] revert only rank0 write kv_cache_status_signal

* [fix] fix thread pool and prefix cache manager hang

* [fix] add timeout for task_swapping_event

* [fix] tolerate prefix cache manager error while prefix tree is cleared

* [chore] add more log

* [fix] fix test_prefix_cache_manager

* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00
jc d6b3c722c1 [KVCache] Storage cache supports c8 model (#6298)
* Refine cache transfer manager
* Storage cache supports c8 model
2026-02-06 12:01:17 +08:00
Moonchild1227 39dc4b0c2e [Feature] [KVCache] support file_store kv cache backend (#6188)
* fix(examples): comment out stop.sh to avoid error when script is missing

* feat: add file_store support for cache manager

* [fix] fix multi gpu transfer

* [fix] fix global kvcache transfer

* [Feature] [KVCache] support file_store kv cache backend

* chore: update FileStore according to PR comments

* fix: remove comments

* fix: add swap_cache_layout for file store

* fix: remove rank key

* fix: Switch KV cache storage to pure file mode

* Temporarily disable support for Tensor types

* fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR

* fix: Simplify cache_transfer_manager.py

* fix: fix syntax bug

* fix: Simplify file_store.py

* fix: Use the key directly as the filename

* fix: Simplify set()

* fix: Simplify cache_transfer_manager.py & file_store.py

* fix: Only support load to cpu buffer

* feat: add FileStore backend for cache transfer

* fix: guard zmq import
2026-02-03 14:37:58 +08:00
chenjian af1b1d2d56 [Feature] Support report token index by attention store (#6285)
* [Feature] Support report token index by attention store

* fix format
2026-02-02 10:41:11 +08:00
chenjian 292bab7e6d [BugFix] Fix bug for enable output caching (#6226)
* [BugFix] Fix bug for enable output caching

* fix

* Fix

* fix

* fix ci
2026-01-30 10:55:36 +08:00
jc b1698a79cb [RL] add version to the key of cache storage && refine raising error (#6160)
* Waiting for cache transfer manager inited

* up

* up

* up

* up

* up

* fix according comments

* fix unittest

* fix

* fix unittest

* fix error

* pass storage_backend to worker
2026-01-27 10:47:46 +08:00
Yonghua Li 833d00e2d7 [BugFix] move cache creation back to cache transfer process and adapt clear/update (#6144)
* [fix] move cache creation back to cache transfer process

* [fix] fix clear cache

* [chore] change some log level

* [fix] fix clear cache

* [fix] fix clear cache for blockwisefp8 and mtp

* [fix] fix c8

* [fix] fix clear_mtp_cache args

* [chore] update cache_transfer_manager

* [fix] fix update mtp cache
2026-01-24 21:59:13 +08:00
Yonghua Li 8d27a523e7 [Feature] [KVCache] support attention_store kv cache backend (#5823)
* [feat] support attention_store kv cache backend

* [fix] fix codestyle

* [chore] optimize log

* [fix] fix write storage task

* [fix] fix read storage

* [fix] fix code conflict after merge develop

* [fix] fix cache bytes and read task token ids

* [chore] add model for cache transfer manager

* [chore] add some log

* [chore] remove launched_cache_manager_signal

* [fix] fix write_back_storage_task match_block_num condition

* [fix] fix swap_cost_time

* [ci] fix ci

* Update fastdeploy/engine/sched/resource_manager_v1.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/cache_manager/cache_transfer_manager.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-22 21:01:23 +08:00
qwes5s5 b2a2e11551 [Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320)
* request disconnect

* request disconnect

* fix bug

* fix bug--amend

---------

Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-16 11:46:13 +08:00
Daci e10b51b8c6 [Feature] get_output_kv_signal blocking read mode & send_first_token (#5836)
* get_output_kv_signal blocking read mode

* send first token before recycle

* xpu get_output_kv_signal blocking read mode

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-15 14:11:03 +08:00
Yonghua Li 456637002d [BugFix] fix cache transfer manager updating/clearing (#5930)
* [fix] fix cache transfer manager updating/clearing

* [fix] fix code style

* [fix] fix config

* [fix] fix engine client

* [fix] let worker update kv cache status signal

* [fix] update worker process

* [fix] fix clear/update for case if comm group is shutdown

* [fix] update dynamic weight manager

* [fix] fix port

* [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting
2026-01-13 05:09:29 -08:00
Yonghua Li 60ee72f682 [BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935)
* [fix] fix rdma script and add more error log for multi api server

* [fix] log

* [fix] fix test_multi_api_server

* [fix] fix multi api server port check

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-12 10:38:52 +08:00
kevin 2d2b156252 [BugFix] fix dyc8 cache bug (#5958)
* fix dyc8 cache bug

* update code
2026-01-08 19:25:47 -08:00
kevin eabd01cd21 [BugFix] fix eb5 prefix bug (#5879)
* fix eb5 prefix bug

* update ci test

* update code

* update code

* update code

* update code

* update code

* update code

* update code
2026-01-06 23:50:39 -08:00
kevin a76e8ae40c [Feature] support rdma pd dy-c8 (#5788)
* add rdma pd dy-c8

* update code
2026-01-07 14:55:25 +08:00
Yonghua Li 9445fbe054 [KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled (#5871)
* [fix] temporarily forbid cpu cache in update/clear api

* [fix] stop launching cache transfer manager unless hierarchical cache is enabled

* [fix] fix no attr hierarchical cache

* [fix] fix ci

* [fix] fix test_prefix_cache_manager.py
2026-01-06 14:27:47 +08:00
jc e9b25aa72f [BugFix] Storage backend gets env params (#5892)
* Storage backend gets env params

* up

* up

* up
2026-01-06 14:14:17 +08:00
jc e911ac2ce7 [BugFix] Refine the preparation of cpu and storage cache (#5777)
* Refine the preparation of cpu and storage cache

* fix error

* fix error

* up

* fix

* up docs

* fix unittest

* remove debug info
2026-01-05 10:13:30 +08:00
jc 95257c1dbd [Feature] RDMACommunicator send key and value scale (#5737)
* RDMACommunicator send key and value scale
---------

Co-authored-by: kevin <chengyf112@gmail.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-05 10:04:24 +08:00
kevin 52dc9a7b85 [BugFix] skip mm revert (#5848)
* skip mm revert

* update code

* update test
2026-01-04 14:25:45 +08:00
MingkunZhang f732d7d2ad [Metax] adapt prefix caching & cpu swap (#5844)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-31 17:02:48 +08:00
周周周 7ae13b2326 [PD Disaggregation] remove unused para in RDMACommManager (#5814) 2025-12-30 11:38:30 +08:00
kevin 5538dda3c8 [Feature] pd support dy-c8 ipc (#5750)
* pd support dy-c8 ipc

* update code

* support v0

* update code
2025-12-25 21:22:34 +08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
Yonghua Li 0c8c6369ed [Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415)
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports

* [fix] fix some bugs

* [fix] fix rdma port for cache manager/messager

* [fix] temporarily cancel port availability check to see if it can pass ci test

* [feat] simplify args for multi api server

* [fix] fix dp

* [fix] fix port for xpu

* [fix] add tests for ports post processing & fix ci

* [test] fix test_multi_api_server

* [fix] fix rdma_comm_ports args for multi_api_server

* [fix] fix test_common_engine

* [fix] fix test_cache_transfer_manager

* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER

* [fix] avoid api server from creating engine_args twice

* [fix] fix test_run_batch

* [fix] fix test_metrics

* [fix] fix splitwise connector init

* [test] add test_rdma_transfer and test_expert_service

* [fix] fix code syntax

* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
kevin c9b47f90ce [BugFix] fix cpu prefix cache bug (#5544)
* fix_dy_c8_bug

* add block_num check

* fix test case

* update ci case
2025-12-16 14:21:42 +08:00
kevin 954a145d57 [Optimization] support mm prefill batch (#5313)
* support mm prefill batch

* update code

* update code

* update code

* update code

* fix encoder cache bug

* update code

* update code

* fix bug

* fix paddle ocr bug

* fix xpu bug

* update code
2025-12-11 22:21:14 +08:00
Juncai 83ea9646f9 [PD Disaggregation] Unify the disaggregation info and the pd communication (#5438)
* Unify the disaggregation info and the pd communication

* up

* up

* fix

* fix conflict

* fix unittest
2025-12-09 14:44:59 +08:00
Daci 2f208db4e9 [Feature] Multimodal Model P / D Separation (#5323)
* RouterArgs port str -> int

* fix race condition [is_fetching] causing multiple fetch requests

* bugfix: Delete duplicate input_ids tensor creation

* mm pd splitwise json -> pickle5; multimodal_inputs only pos id;
debuglog f to %s

* fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ...

* update cr

* Apply suggestions from code review

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* pre-commit fix

* rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 10:47:42 +08:00
Yonghua Li f4119d51b4 [PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197)
* [fix] support DP via v1 router and decouple DP and EP

* [fix] fix scripts

* [fix] reset model path

* [fix] dp use get_output_ep, fix router port type, update scripts

* [merge] merge with latest code

* [chore] remove some debug log

* [fix] fix code style check

* [fix] fix test_multi_api_server for log_dir name

* [chore] reduce logs

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 15:38:43 +08:00
K11OntheBoat 2e1680838f [PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)
* Support deepseekv3 cache transfer for PD deploy

* clean some log info

---------

Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-02 14:11:50 +08:00
Juncai 0925d44f18 [PD Disaggregation] support different tp_size for prefill and decode (#5296)
* up

* up

* up

* fix
2025-12-01 17:50:20 +08:00