FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
jc	dad3be366a	Mooncake storage register local buffer by chunk (#7541 )	2026-04-22 10:47:19 +08:00
jc	cf939b4511	[BugFix] Remove ipc lock to avoid nan (#7312 ) * Remove ipc lock to avoid nan * up	2026-04-12 13:58:19 +08:00
jc	44ef7b6758	Set MC_MAX_MR_SIZE to avoid register hang (#7162 )	2026-04-03 10:51:27 +08:00
jc	bd48640b4b	Write the cache of preempted req to storage (#7113 )	2026-04-01 13:16:12 +08:00
jc	971fc7c15e	Add lock to avoid generating nan (#7047 )	2026-03-30 14:50:38 +08:00
Yonghua Li	35034f91fa	[Cherry-Pick] [Feature] support v1 update/clear api for RL (#6761 ) (#6974 ) * [Feature] support v1 update/clear api for RL * [fix] fix stale control responses when control method timed out * [chore] remove unused code * [chore] optimize tags and key_prefix * [test] fix ci * [chore] fix code style * [fix] fix ep control * [fix] fix ep control for engine cache queue	2026-03-25 19:18:35 +08:00
jc	408404bfab	Set MC_TCP_BIND_ADDRESS for mooncake store (#6783 )	2026-03-11 16:56:46 +08:00
YuBaoku	6260c77ea6	[BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP (#6724 ) (#6769 ) * [BugFix] Support to fix NaN bug in EP * Optimze notion for all the funs * Fix potential lock contention failure issues * Update fastdeploy/inter_communicator/ipc_signal.py * Update envs.py * Update default value for USE_KVCACHE_LOCK Change default value of USE_KVCACHE_LOCK from 1 to 0. * Update worker_process.py * Fix suffix wrong * Update test_prefix_cache_manager.py --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-11 09:54:01 +08:00
jc	bcfd3168ce	[BugFix] Fix error in dynamic c8 cache (#6544 ) (#6692 ) * [BugFix] Fix error in dynamic c8 cache * fix device id	2026-03-06 17:25:55 +08:00
Yonghua Li	5c9017bdde	[Cherry-Pick] [BugFix] fix prefix tree updating timeout (#6615 ) (#6616 )	2026-03-03 16:54:03 +08:00
kevin	603714bde9	[BugFix][Cherry-Pick] Add safety checks in recycle_gpu_blocks to prevent block allocation errors(#6531 ) (#6530 ) * fix mtp acceptance rate decline cp * [BugFix] Add safety checks in recycle_gpu_blocks to prevent block allocation errors - Check prefix tree status before recycling GPU blocks - Validate gpu_block_ids is a list - Add overflow check to prevent free block count exceeding total blocks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [BugFix] Fix AttributeError in recycle_gpu_blocks when prefix_tree_status_signal not initialized - Add hasattr check before accessing prefix_tree_status_signal - The signal is only initialized in launch_cache_messager, not in __init__ - Fixes CI test failure in test_prefix_cache_manager.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [BugFix] Reset prefix cache when model weights are updating - Call self.reset() before setting status to NORMAL in UPDATING state - Ensure cache consistency when model weights change - Consistent with CLEARING state handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 13:12:15 +08:00
Yonghua Li	36388104b5	[Cherry-Pick] [BugFix] fixup for cache transfer manager init failed when using block_wise_fp8 and no storage backend (#6516 ) (#6564 ) * [BugFix] fixup for cache transfer manager init failed when using block_wise_fp8 and no storage backend * [fix] fix ci	2026-03-01 13:43:19 +08:00
Copilot	319bf1fd06	[Cherry-Pick][BugFix][RL] Set GPU flags for paddle in cache transfer manager (#6534 ) (#6550 ) * Initial plan * [Cherry-Pick] Set GPU flags for paddle in cache transfer manager (#6534) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-02-28 20:12:10 +08:00
Yonghua Li	d13d71c60a	[fix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend (#6517 )	2026-02-28 11:19:20 +08:00
jc	1815b99d31	[BugFix] Fix storage_backend_type comparison bug in cache_transfer_manager.py (#6522 ) Co-authored-by: root <root@tjzj-inf-sci-k8s-hzz1-h62ni7-2178.tjzj.baidu.com>	2026-02-26 19:42:38 +08:00
Yonghua Li	4092d39fca	[Cherry-Pick] [BugFix] fix num_cpu_blocks computation (#6438 ) (#6473 ) * [BugFix] fix num_cpu_blocks computation * [fix] fix syntax and log * [fix] pre-commit * [fix] use getattr * [fix] ci test	2026-02-13 15:30:13 +08:00
CSWYF3634076	ec128068b7	[Others] Exit to ensure no residual processes (cpu cache & dp) (#6377 ) * [Others] good exit single dp * [Others] good exit cpu cache dp>1 * [Others] good exit cpu cache dp>1 unittest	2026-02-09 20:38:38 +08:00
Jiang-Jia-Jun	18e79dd660	[Metrics] Support cpu-cache-block-num (#6390 ) Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>	2026-02-09 10:27:56 +08:00
Yonghua Li	5ac5ecd0b0	[BugFix] fix cache transfer tasks failure after cache cleared (#6202 ) * [fix] fix cache transfer tasks failure after cache cleared * [fix] fix submit_task * [fix] fix cache manager hang when clearing prefix cache * [fix] fix list_proxy has no clear method * [fix] fix barrier * [fix] add barrier0 * [fix] add cache_task_is_paused_signal * [fix] fix condition * [fix] fix cache transfer sync and delay prefix cache tree clearing * [fix] fix typo * [chore] polish code * [fix] revert only rank0 write kv_cache_status_signal * [fix] fix thread pool and prefix cache manager hang * [fix] add timeout for task_swapping_event * [fix] tolerate prefix cache manager error while prefix tree is cleared * [chore] add more log * [fix] fix test_prefix_cache_manager * [fix] fix prefix_cache_status_signal usage	2026-02-08 15:33:56 +08:00
jc	d6b3c722c1	[KVCache] Storage cache supports c8 model (#6298 ) * Refine cache transfer manager * Storage cache supports c8 model	2026-02-06 12:01:17 +08:00
Moonchild1227	39dc4b0c2e	[Feature] [KVCache] support file_store kv cache backend (#6188 ) * fix(examples): comment out stop.sh to avoid error when script is missing * feat: add file_store support for cache manager * [fix] fix multi gpu transfer * [fix] fix global kvcache transfer * [Feature] [KVCache] support file_store kv cache backend * chore: update FileStore according to PR comments * fix: remove comments * fix: add swap_cache_layout for file store * fix: remove rank key * fix: Switch KV cache storage to pure file mode * Temporarily disable support for Tensor types * fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR * fixx: Simplify cache_transfer_manager.py * fix: fix syntax bug * fix: Simplify file_store.py * fix: Use the key directly as the filename * fix: Simplify set() * fix: Simplify cache_transfer_manager.py & file_store.py * fix: Only support load to cpu buffer * feat: add FileStore backend for cache transfer * fix: guard zmq import	2026-02-03 14:37:58 +08:00
chenjian	af1b1d2d56	[Feature] Support report token index by attention store (#6285 ) * [Feature] Support report token index by attention store * fix format	2026-02-02 10:41:11 +08:00
chenjian	292bab7e6d	[BugFix] Fix bug for enable output caching (#6226 ) * [BugFix] Fix bug for enable output caching * fix * Fix * fix * fix ci	2026-01-30 10:55:36 +08:00
jc	b1698a79cb	[RL] add version to the key of cache storage && refine raising error (#6160 ) * Waiting for cache transfer manager inited * up * up * up * up * up * fix according comments * fix unittest * fix * fix unittest * fix error * pass storage_backend to worker	2026-01-27 10:47:46 +08:00
Yonghua Li	833d00e2d7	[BugFix] move cache creation back to cache transfer process and adapt clear/update (#6144 ) * [fix] move cache creation back to cache transfer process * [fix] fix clear cache * [chore] change some log level * [fix] fix clear cache * [fix] fix clear cache for blockwisefp8 and mtp * [fix] fix c8 * [fix] fix clear_mtp_cache args * [chore] update cache_transfer_manager * [fix] fix update mtp cache	2026-01-24 21:59:13 +08:00
Yonghua Li	8d27a523e7	[Feature] [KVCache] support attention_store kv cache backend (#5823 ) * [feat] support attention_store kv cache backend * [fix] fix codestyle * [chore] optimize log * [fix] fix write storage task * [fix] fix read storage * [fix] fix code conflict after merge develop * [fix] fix cache bytes and read task token ids * [chore] add model for cache transfer manager * [chore] add some log * [chore] remove launched_cache_manager_signal * [fix] fix write_back_storage_task match_block_num condition * [fix] fix swap_cost_time * [ci] fix ci * Update fastdeploy/engine/sched/resource_manager_v1.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/cache_transfer_manager.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-22 21:01:23 +08:00
qwes5s5	b2a2e11551	[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320 ) * request disconnect * request disconnect * fix bug * fix bug--amend --------- Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>	2026-01-16 11:46:13 +08:00
Daci	e10b51b8c6	[Feature] get_output_kv_signal blocking read mode & send_first_token (#5836 ) * get_output_kv_signal blocking read mode * send first token before recycle * xpu get_output_kv_signal blocking read mode --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-15 14:11:03 +08:00
Yonghua Li	456637002d	[BugFix] fix cache transfer manager updating/clearing (#5930 ) * [fix] fix cache transfer manager updating/clearing * [fix] fix code style * [fix] fix config * [fix] fix engine client * [fix] let worker update kv cache status signal * [fix] update worker process * [fix] fix clear/update for case if comm group is shutdown * [fix] update dynamic weight manager * [fix] fix port * [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting	2026-01-13 05:09:29 -08:00
Yonghua Li	60ee72f682	[BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935 ) * [fix] fix rdma script and add more error log for multi api server * [fix] log * [fix] fix test_multi_api_server * [fix] fix multi api server port check --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-12 10:38:52 +08:00
kevin	2d2b156252	[BugFix] fix dyc8 cache bug (#5958 ) * fix dyc8 cache bug * update code	2026-01-08 19:25:47 -08:00
kevin	eabd01cd21	[BugFix] fix eb5 prefix bug (#5879 ) * fix eb5 prefix bug * update ci test * update code * update code * update code * update code * update code * update code * update code	2026-01-06 23:50:39 -08:00
kevin	a76e8ae40c	[Feature] support rdma pd dy-c8 (#5788 ) * add rdma pd dy-c8 * update code	2026-01-07 14:55:25 +08:00
Yonghua Li	9445fbe054	[KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled (#5871 ) * [fix] temporarily forbid cpu cache in update/clear api * [fix] stop launching cache transfer manager unless hierarchical cache is enabled * [fix] fix no attr hierarchical cache * [fix] fix ci * [fix] fix test_prefix_cache_manager.py	2026-01-06 14:27:47 +08:00
jc	e9b25aa72f	[BugFix] Storage backend gets env params (#5892 ) * Storage backend gets env params * up * up * up	2026-01-06 14:14:17 +08:00
jc	e911ac2ce7	[BugFix] Refine the preparation of cpu and storage cache (#5777 ) * Refine the preparation of cpu and storage cache * fix error * fix error * up * fix * up docs * fix unittest * remove debug info	2026-01-05 10:13:30 +08:00
jc	95257c1dbd	[Feature] RDMACommunicator send key and value scale (#5737 ) * RDMACommunicator send key and value scale --------- Co-authored-by: kevin <chengyf112@gmail.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-05 10:04:24 +08:00
kevin	52dc9a7b85	[BugFix] skip mm revert (#5848 ) * skip mm revert * update code * update test	2026-01-04 14:25:45 +08:00
MingkunZhang	f732d7d2ad	[Metax] adapt prefix caching & cpu swap (#5844 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2025-12-31 17:02:48 +08:00
周周周	7ae13b2326	[PD Disaggregation]remove unsed para in RDMACommManager (#5814 )	2025-12-30 11:38:30 +08:00
kevin	5538dda3c8	[Feature] pd support dy-c8 ipc (#5750 ) * pd support dy-c8 ipc * update code * support v0 * update code	2025-12-25 21:22:34 +08:00
Juncai	412867fd99	[Feature] Support KV Cache Storage (#5571 ) * Support Mooncake Store * up * up * add op * fix conflict * fix error * up for comments * avoid thread lock * up * fix unittest * fix unittest * remove debug info * consider tp_size > 1 * add default rdma_nics * add utils * up * fix error --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-25 16:30:35 +08:00
Yonghua Li	0c8c6369ed	[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415 ) * [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports * [fix] fix some bugs * [fix] fix rdma port for cache manager/messager * [fix] temporarily cancel port availability check to see if it can pass ci test * [feat] simplify args for multi api server * [fix] fix dp * [fix] fix port for xpu * [fix] add tests for ports post processing & fix ci * [test] fix test_multi_api_server * [fix] fix rdma_comm_ports args for multi_api_server * [fix] fix test_common_engine * [fix] fix test_cache_transfer_manager * [chore] automatically setting FD_ENABLE_MULTI_API_SERVER * [fix] avoid api server from creating engine_args twice * [fix] fix test_run_batch * [fix] fix test_metrics * [fix] fix splitwise connector init * [test] add test_rdma_transfer and test_expert_service * [fix] fix code syntax * [fix] fix test_rdma_transfer and build wheel with rdma script	2025-12-17 15:50:42 +08:00
kevin	c9b47f90ce	[BugFix] fix cpu prefix cache bug (#5544 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * fix_dy_c8_bug * add block_num check * fix test case * update ci case	2025-12-16 14:21:42 +08:00
kevin	954a145d57	[Optimization] support mm prefill batch (#5313 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * support mm prefill batch * update code * update code * update code * update code * fix encoder cache bug * update code * update code * fix bug * fix paddle ocr bug * fix xpu bug * update code	2025-12-11 22:21:14 +08:00
Juncai	83ea9646f9	[PD Disaggregation] Unify the disaggregation info and the pd communication (#5438 ) * Unify the disaggregation info and the pd communication * up * up * fix * fix conflict * fix unittest	2025-12-09 14:44:59 +08:00
Daci	2f208db4e9	[Feature] Multimodal Model P / D Separation (#5323 ) * RouterArgs port str -> int * fix race condition [is_fetching] causing multiple fetch requests * bugfix: Delete duplicate input_ids tensor creation * mm pd splitwise json -> pickle5; multimodal_inputs only pos id; debuglog f to %s * fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ... * update cr * Apply suggestions from code review Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * pre-commit fix * rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0 --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-09 10:47:42 +08:00
Yonghua Li	f4119d51b4	[PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197 ) * [fix] support DP via v1 router and decouple DP and EP * [fix] fix scripts * [fix] reset model path * [fix] dp use get_output_ep, fix router port type, update scripts * [merge] merge with latest code * [chore] remove some debug log * [fix] fix code style check * [fix] fix test_multi_api_server for log_dir name * [chore] reduce logs * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-04 15:38:43 +08:00
K11OntheBoat	2e1680838f	[PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251 ) * Support deepseekv3 cache transfer for PD deploy * clean some log info --------- Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-12-02 14:11:50 +08:00
Juncai	0925d44f18	[PD Disaggregation] support different tp_size for prefill and decode (#5296 ) * up * up * up * fix	2025-12-01 17:50:20 +08:00

1 2

96 Commits