FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
kevin	7707be8384	[Feature][KVCache] Implement Cache Manager V1 with GPU + CPU Cache Support (1/n) (#7097 ) * [Feature][KVCache] Support cache manager v1 architecture Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update cache manager and related modules Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update cache_manager and related modules Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add node to evictable set in complete_swap_to_device When a node transitions from SWAP_TO_DEVICE to DEVICE via complete_swap_to_device, it was not being added to the _evictable_device set. This caused nodes with ref_count=0 to become "orphaned" - not appearing in any evictable set despite having cache_status=DEVICE. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: update cache manager v1 and related modules - Add new cache_manager.py with cache management functionality - Add radix_tree.py for prefix caching - Update block_pool.py and metadata.py - Update request.py and resource_manager_v1.py for scheduling - Update gpu_model_runner.py for GPU model execution Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(cache): add cache controller v1 implementation - Add CacheController class for cache management - Update config.py with cache related configurations - Refactor gpu_model_runner.py for improved cache handling * feat(cache_manager): update cache manager v1 * fix(cache_manager): 修复 swap_cache H2D/D2H 方向的 block_ids 逻辑并清理 ForwardMeta ## Motivation 修复 swap_cache_optimized.cu 中 H2D 方向时 src/dst block_ids 使用错误的问题，并清理 ForwardMeta 中已废弃的 cache_controller 字段。 ## Modifications - fix: swap_cache_optimized.cu 中根据 D2H 模板参数正确选取 src/dst block_ids，修复 H2D 方向 src/dst 倒置 bug（同时修复 SwapCachePerLayerImpl 和 SwapCacheAllLayersBatchImpl） - refactor: cache_manager/v1/__init__.py 将 LayerSwapTimeoutError 导入从 cache_controller 改为 cache_utils（正确来源） - refactor: ForwardMeta 移除废弃的 cache_controller 字段 - refactor: gpu_model_runner.py 移除对应的 cache_controller 赋值语句 - test: 新增 tests/cache_manager/v1/test_swap_cache_ops.py 单元测试 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(cache_manager): refactor cache manager v1 and optimize swap ops ## Motivation 对 cache manager v1 进行重构和优化，精简代码结构，提升可维护性。 ## Modifications - 重构 transfer_manager.py，大幅精简代码逻辑 - 优化 swap_cache_optimized.cu GPU 算子实现 - 调整 cache_manager.py、cache_controller.py 逻辑，修复 free_device_blocks 方法缺失问题 - 更新 block_pool.py、cache_utils.py、metadata.py、radix_tree.py - 精简 gpu_model_runner.py、forward_meta.py、attention.py 中相关调用 - 更新对应单元测试（test_cache_controller、test_swap_cache_ops、test_transfer_manager） - 调整 config.py 中相关配置项 * [KVCache][MTP] 支持 cache_manager_v1 下的 MTP KV Cache 初始化及多模态 hash ## Motivation 在 enable_cache_manager_v1 路径下，MTP（speculative decode）的 KV Cache 需要由 CacheController 统一管理，以复用 swap/transfer 能力，同时修复多模态场景下 block hash 未携带 multimodal extra_keys 的问题。 ## Modifications - `cache_controller.py` - 新增 `initialize_mtp_kv_cache`：通过 CacheController 初始化 MTP KV Cache，并将其注册到 cache_kvs_map，使 transfer_manager 自动覆盖 MTP 层 - `initialize_host_cache` 中的 num_layers 改为包含 MTP 额外 cache 层数，保证 Host Cache 也为 MTP 分配足够空间 - `_free_gpu_cache` 改名为 `free_gpu_cache`（对外可调用） - `cache_utils.py` - 新增 `get_block_hash_extra_keys`：提取单个 block 内的多模态 hash 信息，对齐 PrefixCacheManager 的 multimodal extra_keys 逻辑 - `get_request_block_hasher` 中在 hash_block_tokens 时携带 extra_keys，修复多模态场景 prefix cache 命中率不准的问题 - `spec_decode/mtp.py` - `update_mtp_block_num` 新增 `skip_cache_init` 参数，避免 v1 cache manager 路径下重复初始化 MTP KV Cache - `gpu_model_runner.py` - `initialize_kv_cache(v1)` 路径：在主模型 cache 初始化后，调用 `cache_controller.initialize_mtp_kv_cache` 完成 MTP cache 创建 - `clear_cache` / `wakeup` / `reset` 等路径：respect `enable_cache_manager_v1` 标志，跳过重复的 proposer.initialize_kv_cache 调用 ## Usage or Command ```bash # 启动支持 MTP + cache_manager_v1 的推理服务（示例） bash run.sh ``` * fix(cache_manager): multi-GPU fix, mm hash boundary fix, and remove batch ops 1. Fix CuPy stream/event creation for multi-GPU: wrap all stream operations with cp.cuda.Device(device_id) context to ensure streams/events are bound to the correct device, preventing cross-device errors in multi-GPU setups. 2. Remove cudaSetDevice from SwapCacheAllLayers (handled by cupy context now). 3. Remove swap_cache_all_layers_batch op: simplified the implementation by removing the batch upload variant; all-layer transfers now use the standard swap_cache_all_layers with cupy device context. 4. Fix mm hash boundary comparison in get_block_hash_extra_keys: change strict less-than (<) to less-than-or-equal (<=) so that multimodal items ending exactly at block start are correctly excluded. 5. Extract config fields to KVCacheBase: model_config, cache_config, quant_config, parallel_config are now set in the base class __init__ to avoid duplication in CacheController and CacheManager subclasses. 6. Translate metadata.py docstrings from Chinese to English for broader contributor accessibility. 7. Add test_cache_utils.py: comprehensive unit tests for get_block_hash_extra_keys covering all boundary and overlap scenarios. 8. Expand test suite: test_request.py cache fields tests, test_radix_tree.py backup candidate tests, test_transfer_manager.py and test_cache_manager.py multi-GPU and concurrent operation tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [BugFix][KVCache] fix List import and move write_policy normalization to CacheManager ## Motivation 修复两处问题： 1. `fastdeploy/engine/request.py` 中 `List` 未导入导致 pre-commit F821 报错 2. `write_policy` 归一化逻辑（`write_through` → `write_through_selective`）不应放在 `FDConfig`，移至 `CacheManager.__init__` 中，使其只影响 Cache Manager V1 的内部逻辑 ## Modifications - `fastdeploy/engine/request.py`: 在 `typing` 导入中补充 `List`，删除重复的 `CacheSwapMetadata` TYPE_CHECKING 导入，修复 F821/F811 - `fastdeploy/config.py`: 删除 `write_policy` 归一化逻辑 - `fastdeploy/cache_manager/v1/cache_manager.py`: 将归一化逻辑移入 `CacheManager.__init__` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [BugFix][KVCache] fix pre-commit code style issues ## Motivation 修复 CI pre-commit 代码风格检查失败问题。 ## Modifications - `fastdeploy/engine/common_engine.py`: black 格式化 - `fastdeploy/worker/worker_process.py`: black 格式化 + isort 修复 - `fastdeploy/cache_manager/v1/storage/__init__.py`: isort 修复 - `fastdeploy/worker/gpu_worker.py`: isort 修复 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [Feature][KVCache] update cache_manager_v1 modules ## Motivation 更新 Cache Manager V1 相关模块，完善版权信息、改进模块结构与可维护性。 ## Modifications - `fastdeploy/cache_manager/v1/` 系列模块：补充版权 header，优化代码结构 - `fastdeploy/config.py`：配置项更新 - `fastdeploy/engine/sched/resource_manager_v1.py`：调度相关更新 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [Feature][KVCache] add BatchRequest.from_tasks and refactor worker task parsing ## Motivation 将 worker_process 中重复的 task 解析逻辑收敛到 BatchRequest，减少代码冗余，提升可维护性。 ## Modifications - `fastdeploy/engine/request.py`：新增 `BatchRequest.from_tasks()` 类方法，统一将 task_queue 任务分类为推理请求和控制请求 - `fastdeploy/worker/worker_process.py`：使用 `BatchRequest.from_tasks()` 替代内联解析逻辑，并修复重复的 control_reqs 处理块 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [Feature][KVCache] add NUMA affinity for host cache and skip swap cache tests ## Motivation 优化 Host cache 内存分配的 NUMA 亲和性，减少跨 NUMA 访问延迟；同时跳过 swap cache ops 测试（当前环境不支持）。 ## Modifications - `fastdeploy/cache_manager/v1/cache_controller.py`： - 新增 `_get_numa_node_for_gpu()` 方法，通过 nvidia-smi 或 sysfs 获取 GPU 对应的 NUMA 节点 - 新增 `_bind_to_closest_numa_node()` 方法，绑定当前线程到 GPU 最近的 NUMA 节点 - 在 `initialize_host_cache()` 中调用 NUMA 绑定，优化 H2D 传输性能 - `tests/cache_manager/v1/test_swap_cache_ops.py`：跳过所有测试类（`TestSwapCacheAllLayersCorrectness`、`TestSwapCacheAllLayersPerformance`、`TestSwapCacheRandomBlockIndices`） Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [BugFix][KVCache] fix unittest failures for cache_manager_v1 三个单测因接口变更或 Mock 方式问题导致失败，需修复。 - tests/distributed/chunked_moe.py：`setup_model_runner` 使用 `__new__` 跳过 `__init__`，补加 `enable_cache_manager_v1 = False`，修复 `AttributeError` - tests/engine/test_resource_manager.py：`PrefixCacheManager` 为局部导入，`patch` 路径改为定义位置 `fastdeploy.cache_manager.prefix_cache_manager.PrefixCacheManager` - tests/v1/test_resource_manager_v1.py：`_trigger_preempt` 第四参数已由 `list` 改为 `BatchRequest`，更新测试传参和断言 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [BugFix][KVCache] remove debug logging code ## Modifications - fastdeploy/engine/request.py：删除调试用 logger 及 prompt_hashes 中的 debug 日志 - fastdeploy/worker/worker_process.py：删除 __main__ 中的调试 import 和 print 语句 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [BugFix][KVCache] fix cupy device id caching and pickle for _match_result ## Motivation 修复两个 bug： 1. `transfer_manager.py` 中每次调用 `cp.cuda.runtime.getDevice()` 存在隐患，应在初始化时缓存为实例变量，保证后续操作使用一致的设备 ID。 2. `request.py` 的 `__getstate__` 未跳过 `_match_result`，该字段包含 BlockNode 树的父子循环引用，pickle 时会触发 `RecursionError`；同时补充 `__setstate__` 确保 unpickle 后字段恢复为安全默认值。 ## Modifications - `transfer_manager.py`：初始化时调用 `cp.cuda.runtime.getDevice()` 并缓存到 `self._cupy_device_id`，后续 `with cp.cuda.Device(...)` 和日志均使用该缓存值。 - `request.py`： - `__getstate__` 中将 `_match_result` 加入跳过集合 `_SKIP_KEYS`，避免循环引用导致 pickle 失败。 - 新增 `__setstate__`，unpickle 后将 `_block_hasher` 和 `_match_result` 恢复为 `None`。 ## Usage or Command * fix(test): fix unit test errors for _trigger_preempt and wakeup with MTP - Use BatchRequest instead of list in test_trigger_preempt_records_tasks - Add missing enable_cache_manager_v1 attr in TestSleepWakeupBehavior._make_runner Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [BugFix][KVCache] fix gpu_free_block_list returning wrong block IDs ## Motivation `gpu_free_block_list` 的兼容 property 中误用了 `list(range(N))`，将 `available_blocks()` 的返回值当作整数传给 `range()`，导致返回 `[0, 1, ..., N-1]` 的假列表，而非真实的空闲 block ID。 ## Modifications - `cache_manager/v1/cache_manager.py`：将 `list(range(self._device_pool.available_blocks()))` 改为 `list(self._device_pool.available_blocks())` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [BugFix][KVCache] 修复 gpu_free_block_list 返回 int 导致 TypeError ## Motivation gpu_free_block_list 属性中调用 BlockPool.available_blocks()，该方法返回 int（空闲块数量），用 list() 包装 int 会触发 TypeError: 'int' object is not iterable。 ## Modifications 将 list(self._device_pool.available_blocks()) 改为 list(self._device_pool._free_blocks)，直接返回空闲块索引列表。 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [KVCache][CacheManager] 适配 V1 CacheManager 的 pause/sleep/free_cache 操作 ## Motivation V1 CacheManager 引入了新的 reset_cache() 接口，pause 和 sleep 操作需要适配，同时 free_cache 需要支持可选的 clear_storage 参数。 ## Modifications - cache_controller.py: free_cache 新增 clear_storage 参数（默认 False），仅当 clear_storage=True 时才调用 _clear_storage()，避免不必要的 storage 清空 - common_engine.py: pause 和 sleep 操作中，当 ENABLE_V1_KVCACHE_MANAGER 时使用 cache_manager.reset_cache() 替代旧的 reset() 和 pause_transfer 逻辑 - gpu_model_runner.py: sleep 时仅在非 V1 cache manager 下执行 MTP cache 清除 ## Usage or Command # 启动服务（V1 CacheManager） python -m fastdeploy.entrypoints.openai.api_server \ --enable-v1-kvcache-manager \ ... * [BugFix][KVCache] fix missing enable_cache_manager_v1 in test mocks and remove unused select_blocks_for_backup - Remove unused `select_blocks_for_backup` method from radix_tree.py - Fix `match_prefix` default param `skip_storage=True` and log order in cache_manager.py - Sync test_gpu_model_runner.py with upstream/develop (add TestInsertTasksV1SplitwiseSuffix) - Add `enable_cache_manager_v1=False` to all mock runners to fix AttributeError in CI Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [BugFix][KVCache] simplify _free_blocks in ResourceManagerV1 for non-v1 path Remove redundant prefix_caching branch in else path; always call recycle_gpu_blocks with full block_tables for non-cache-manager-v1 case. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [KVCache][Optimization][BugFix] fix and optimize block_pool, cache_manager, transfer_manager, request ## Motivation 修复 cache_manager v1 中若干代码质量问题，提升性能并消除潜在的类型不一致 Bug。 ## Modifications 1. block_pool.py：`BlockPool.allocate` 将逐个 pop 循环替换为切片 + 批量 set.update，消除 Python 循环开销，O(n) → O(k)（C 层批量操作） 2. cache_manager.py：`match_prefix` 在 prefix caching 关闭时提前 return 前写入空 `MatchResult()`，避免调用方解引用 `_match_result=None` 崩溃 3. transfer_manager.py：`_build_device_layer_indices` 在 `_cache_kvs_map` 为空时也重置四个层索引列表，防止残留旧 tensor 被 swap 算子使用 4. request.py：`BatchRequest.append_swap_metadata` / `append_evict_metadata` 构造 `CacheSwapMetadata` 时将 `src_type`/`dst_type` 从字符串改为 `CacheLevel` 枚举，与字段类型声明一致；补充 `CacheLevel` 导入；`match_result` 属性返回类型标注修正为 `Optional[MatchResult]` 5. resource_manager_v1.py：`_allocate_gpu_blocks` 日志从 `INFO` 降级为 `DEBUG`，消除高频调度路径的日志噪音 6. tests/engine/test_request.py：同步更新 `src_type`/`dst_type` 断言为 `CacheLevel` 枚举值，补充 `CacheLevel` 导入 ## Usage or Command 单元测试： ```bash source .venv/py310/bin/activate cd baidu/FastDeploy python -m pytest tests/cache_manager/v1/test_cache_manager.py -v python -m pytest tests/cache_manager/v1/test_transfer_manager.py -v python -m pytest tests/engine/test_request.py -v ``` * [BugFix][KVCache] Fix BlockPool.allocate returns all blocks when num_blocks=0 ## Motivation 当 `allocate(num_blocks=0)` 被调用时，Python 负索引陷阱导致严重错误： `-0 == 0`，所以 `self._free_blocks[-0:]` 等价于 `self._free_blocks[0:]`，会返回并清空整个空闲块列表，而非返回空列表。 ## Modifications 在 `BlockPool.allocate` 中增加对 `num_blocks == 0` 的提前判断，直接返回 `[]`，避免触发 Python 负索引陷阱。 ## Usage or Command ```bash # 运行相关单元测试验证修复 python -m pytest tests/cache_manager/v1/test_cache_manager.py -vv -s ``` * [KVCache][Test] add unit tests for cache_manager v1 modules ## Motivation 补全 cache_manager/v1 各模块的单测覆盖，确保核心方法有完整的测试保障。 ## Modifications 新增/补充以下测试文件，全部 326 个用例通过： - tests/cache_manager/v1/test_block_pool.py（新建）覆盖 BlockPool.get_metadata/set_metadata/resize、DeviceBlockPool/HostBlockPool - tests/cache_manager/v1/test_metadata.py（新建）覆盖 BlockNode、RadixTreeStats、MatchResult、CacheSwapMetadata、AsyncTaskHandler - tests/cache_manager/v1/test_cache_utils.py（补充）新增 hash_block_tokens、get_request_block_hasher、LayerDoneCounter 时间追踪及内部辅助方法 - tests/cache_manager/v1/test_radix_tree.py（补充）新增 TestCompleteSwapToDevice 专项测试类（6 个用例） - tests/cache_manager/v1/test_cache_manager.py（补充）新增 offload_to_host、load_from_host、pending backup 系列、prepare_prefetch_metadata - tests/cache_manager/v1/test_transfer_manager.py（补充）新增 _swap_single_layer 校验路径、sync_input/output_stream、record_input_stream_event ## Usage or Command ```bash # 运行所有新增单测 source .venv/py310/bin/activate python -m pytest tests/cache_manager/v1/test_block_pool.py \ tests/cache_manager/v1/test_metadata.py \ tests/cache_manager/v1/test_cache_utils.py \ tests/cache_manager/v1/test_radix_tree.py \ tests/cache_manager/v1/test_cache_manager.py \ tests/cache_manager/v1/test_transfer_manager.py -v # 期望结果：326 passed ``` --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-04-21 14:39:00 +08:00
Nyako Shigure	d659099415	[Cleanup] Replace torch proxy alias with public compat API (#7348 )	2026-04-13 11:43:26 +08:00
Yonghua Li	98f3fc9267	[RL] [KVCache] let cache transfer managers update key prefix after weight update and add unit tests (#7083 ) * [test] add a few unit tests * [feat] update key prefix when model weights are updated * [test] try to fix test_worker_process	2026-04-02 19:58:41 +08:00
kevin	18062c55bb	[BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys (#6929 ) * [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [BugFix][KVCache] Fix test_get_block_hash_extra_keys_boundary_cases assertions ## Motivation 测试用例 `test_get_block_hash_extra_keys_boundary_cases` 中，Block [4,8) 的调用错误地传入了 `mm_idx=1`，跳过了 img0[2,5)；但 img0 覆盖 token 4，token 4 属于 block [4,8)，应被包含在 hash_keys 中。此外，所有 assertEqual 只校验了 hash_keys，未校验返回的 mm_idx 游标。 ## Modifications - `test_get_block_hash_extra_keys_boundary_cases`： - 改为链式调用，用上一次返回的 mm_idx 作为下一次入参，模拟真实调用循环 - Block [4,8) 入参从 `mm_idx=1` 改为沿用上次返回的 `mm_idx=0`，期望值从 `[]` 改为 `["hash-0"]` - 所有断言改为 `assertEqual((mm_idx, hash_keys), (...))` 同时校验游标 - `test_get_block_hash_extra_keys_no_overlap_at_boundaries`： - Case B 入参从 `mm_idx=1` 改为 `mm_idx=0`（从头遍历，img-a 走 continue） - 所有断言增加 mm_idx 校验 - `test_get_block_hash_extra_keys_image_crosses_block_boundary`： - 所有断言增加 mm_idx 校验 - `test_get_block_hash_extra_keys_no_mm_inputs`： - 断言增加 mm_idx 校验 - `test_get_block_hash_extra_keys_handles_multimodal_segments`： - call2、call3 断言增加 mm_idx 校验 ## Usage or Command ```bash python -m pytest tests/cache_manager/test_prefix_cache_manager.py::TestPrefixCacheManagerCoverage -v -k "get_block_hash_extra_keys" ``` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: chengyanfu <chengyanfu@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 17:13:31 +08:00
Yonghua Li	a7f52c300d	[Feature] support v1 update/clear api for RL (#6761 ) * [Feature] support v1 update/clear api for RL * [fix] fix execute_model and add sleep/wakeup api * [fix] fix mtp and key_prefix * [chore] move _update_key_prefix to resume method * [fix] make the interface safe to call multiple times * [fix] fix some tiny bugs * [chore] make small changes against pr review * [docs] add docs for weight update * [test] add some tests and update docs * [style] fix code style check * [test] fix ci * [fix] fix stale control responses when control method timed out * [chore] remove unused code * [chore] fix code style * [chore] optimize tags and key_prefix * [test] fix ci * [chore] fix code style * [test] fix ci * [fix] fix ep control * [fix] fix ep control for engine cache queue	2026-03-25 19:18:46 +08:00
jc	bb881c2c0a	[PD Disaggregation] pd + cache_storage support vl model (#6906 ) * pd + cache_storage support vl model * support vl model * fix test	2026-03-23 15:35:20 +08:00
Jiang-Jia-Jun	b05a6c4206	[BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP (#6724 ) * [BugFix] Support to fix NaN bug in EP * Optimze notion for all the funs * Fix potential lock contention failure issues * Update fastdeploy/inter_communicator/ipc_signal.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update envs.py * Update default value for USE_KVCACHE_LOCK Change default value of USE_KVCACHE_LOCK from 1 to 0. * Update worker_process.py * Fix suffix wrong * Update test_prefix_cache_manager.py --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-10 21:55:32 +08:00
kesmeey	aae87e6ae2	[CI] 【Hackathon 10th Spring No.27】功能模块 fastdeploy/cache_manager/prefix_cache_manager.py单测补充 (#6297 ) * test: update prefix cache manager tests * test: refine prefix cache manager coverage helpers * style: apply black formatting to test_prefix_cache_manager.py Co-authored-by: Cursor <cursoragent@cursor.com> * tests: update test_prefix_cache_manager Co-authored-by: Cursor <cursoragent@cursor.com> * update --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-03-02 20:04:12 +08:00
Yonghua Li	7cf5e64c7a	[BugFix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend (#6516 ) * [fix] fix cache transfer manager init failed when using block_wise_fp8 and no storage backend * [fix] fix test_cache_transfer_manager * [fix] fix test_cache_transfer_manager again --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-01 13:43:31 +08:00
kesmeey	bf14ea18aa	tests: fix cache_transfer_manager threading and init mocks (#6502 ) tests: fix cache_transfer_manager threading and init mocks	2026-02-26 17:32:51 +08:00
Yonghua Li	e2332a1112	[BugFix] fix num_cpu_blocks computation (#6438 ) * [BugFix] fix num_cpu_blocks computation * [fix] fix syntax and log * [fix] pre-commit * [fix] use getattr * [fix] ci test	2026-02-13 11:05:14 +08:00
kesmeey	e4e3a71e7b	[CI] 【Hackathon 10th Spring No.22】功能模块 fastdeploy/cache_manager/cache_transfer_manager.py 单测补充 (#6157 ) * Add comprehensive test coverage for cache_transfer_manager.py * Fix code style: add newline at end of file * fix: update cache transfer manager tests for branch 22 interface changes * fix: resolve test errors for cache transfer manager * fix: update cache transfer manager tests for branch 22 interface changes * style: apply pre-commit formatting to tests/cache_manager/test_cache_transfer_manager.py * Run codestyle: format tests/cache_manager/test_cache_transfer_manager.py and related fixes * Update test_cache_transfer_manager.py * Format cache transfer manager tests * Update cache transfer manager tests * Update unit test coverage workflow --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-02-11 11:23:57 +08:00
Yonghua Li	5ac5ecd0b0	[BugFix] fix cache transfer tasks failure after cache cleared (#6202 ) * [fix] fix cache transfer tasks failure after cache cleared * [fix] fix submit_task * [fix] fix cache manager hang when clearing prefix cache * [fix] fix list_proxy has no clear method * [fix] fix barrier * [fix] add barrier0 * [fix] add cache_task_is_paused_signal * [fix] fix condition * [fix] fix cache transfer sync and delay prefix cache tree clearing * [fix] fix typo * [chore] polish code * [fix] revert only rank0 write kv_cache_status_signal * [fix] fix thread pool and prefix cache manager hang * [fix] add timeout for task_swapping_event * [fix] tolerate prefix cache manager error while prefix tree is cleared * [chore] add more log * [fix] fix test_prefix_cache_manager * [fix] fix prefix_cache_status_signal usage	2026-02-08 15:33:56 +08:00
jc	b1698a79cb	[RL] add version to the key of cache storage && refine raising error (#6160 ) * Waiting for cache transfer manager inited * up * up * up * up * up * fix according comments * fix unittest * fix * fix unittest * fix error * pass storage_backend to worker	2026-01-27 10:47:46 +08:00
Yonghua Li	8d27a523e7	[Feature] [KVCache] support attention_store kv cache backend (#5823 ) * [feat] support attention_store kv cache backend * [fix] fix codestyle * [chore] optimize log * [fix] fix write storage task * [fix] fix read storage * [fix] fix code conflict after merge develop * [fix] fix cache bytes and read task token ids * [chore] add model for cache transfer manager * [chore] add some log * [chore] remove launched_cache_manager_signal * [fix] fix write_back_storage_task match_block_num condition * [fix] fix swap_cost_time * [ci] fix ci * Update fastdeploy/engine/sched/resource_manager_v1.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/cache_transfer_manager.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-22 21:01:23 +08:00
kevin	eabd01cd21	[BugFix] fix eb5 prefix bug (#5879 ) * fix eb5 prefix bug * update ci test * update code * update code * update code * update code * update code * update code * update code	2026-01-06 23:50:39 -08:00
Yonghua Li	9445fbe054	[KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled (#5871 ) * [fix] temporarily forbid cpu cache in update/clear api * [fix] stop launching cache transfer manager unless hierarchical cache is enabled * [fix] fix no attr hierarchical cache * [fix] fix ci * [fix] fix test_prefix_cache_manager.py	2026-01-06 14:27:47 +08:00
jc	e911ac2ce7	[BugFix] Refine the preparation of cpu and storage cache (#5777 ) * Refine the preparation of cpu and storage cache * fix error * fix error * up * fix * up docs * fix unittest * remove debug info	2026-01-05 10:13:30 +08:00
kevin	52dc9a7b85	[BugFix] skip mm revert (#5848 ) * skip mm revert * update code * update test	2026-01-04 14:25:45 +08:00
周周周	7ae13b2326	[PD Disaggregation]remove unsed para in RDMACommManager (#5814 )	2025-12-30 11:38:30 +08:00
xunyoyo	7e39560a42	[CI] 【Hackathon 9th Sprint No.33】NO.33 功能模块单测补充 -new (#5726 ) * Add cache messager coverage tests * Add default_dtype parameter to test cache manager --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-29 13:42:27 +08:00
Juncai	412867fd99	[Feature] Support KV Cache Storage (#5571 ) * Support Mooncake Store * up * up * add op * fix conflict * fix error * up for comments * avoid thread lock * up * fix unittest * fix unittest * remove debug info * consider tp_size > 1 * add default rdma_nics * add utils * up * fix error --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-25 16:30:35 +08:00
xunyoyo	2d2619d300	[CI] 【Hackathon 9th Sprint No.36】NO.36 功能模块单测补充（修复） (#5609 ) * Implement unit tests for PrefixCacheManager * Update prefix cache manager tests * Handle get_all_visible_devices in prefix cache manager tests * Add repo root to prefix cache manager tests sys.path * Use pathlib for repo root in prefix cache manager tests * Refine repo root Path import in tests * Handle list-based visible device configuration * Refine PrefixCacheManager test stubs * Run pre-commit on prefix cache manager tests * Remove duplicate pytest import in cache manager tests * Add tests for visible device formatting * Revert * Simplify test stubs in prefix cache manager tests * Refine PrefixCacheManager tests * Adjust prefix cache manager tests per review * Remove ignored tests from coverage configuration * Make prefix cache manager tests runnable without paddle * Use real paddle import in prefix cache manager tests * Clean up imports in test_prefix_cache_manager.py Removed unnecessary import of 'os' and related path manipulation. * Update test_prefix_cache_manager.py * Replace pid_suffix with ipc_suffix in tests * Add local cache queue and RDMA ports to cache config	2025-12-18 16:08:42 +08:00
Yonghua Li	0c8c6369ed	[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415 ) * [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports * [fix] fix some bugs * [fix] fix rdma port for cache manager/messager * [fix] temporarily cancel port availability check to see if it can pass ci test * [feat] simplify args for multi api server * [fix] fix dp * [fix] fix port for xpu * [fix] add tests for ports post processing & fix ci * [test] fix test_multi_api_server * [fix] fix rdma_comm_ports args for multi_api_server * [fix] fix test_common_engine * [fix] fix test_cache_transfer_manager * [chore] automatically setting FD_ENABLE_MULTI_API_SERVER * [fix] avoid api server from creating engine_args twice * [fix] fix test_run_batch * [fix] fix test_metrics * [fix] fix splitwise connector init * [test] add test_rdma_transfer and test_expert_service * [fix] fix code syntax * [fix] fix test_rdma_transfer and build wheel with rdma script	2025-12-17 15:50:42 +08:00
xunyoyo	55609a51fc	[CI] 【Hackathon 9th Sprint No.36】NO.36 功能模块单测补充 (#5058 ) * Implement unit tests for PrefixCacheManager * Update prefix cache manager tests * Handle get_all_visible_devices in prefix cache manager tests * Add repo root to prefix cache manager tests sys.path * Use pathlib for repo root in prefix cache manager tests * Refine repo root Path import in tests * Handle list-based visible device configuration * Refine PrefixCacheManager test stubs * Run pre-commit on prefix cache manager tests * Remove duplicate pytest import in cache manager tests * Add tests for visible device formatting * Revert * Simplify test stubs in prefix cache manager tests * Refine PrefixCacheManager tests * Adjust prefix cache manager tests per review	2025-12-16 19:19:03 +08:00
kevin	c9b47f90ce	[BugFix] fix cpu prefix cache bug (#5544 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * fix_dy_c8_bug * add block_num check * fix test case * update ci case	2025-12-16 14:21:42 +08:00
Echo-Nie	1b1bfab341	[CI] Add unittest (#5328 ) * add test_worker_eplb * remove tesnsor_wise_fp8 * add copyright	2025-12-09 19:19:42 +08:00
Juncai	1a559c973f	Revert "[CI] 【Hackathon 9th Sprint No.33】NO.33 功能模块单测补充 (#5056 )" (#5286 ) This reverts commit `a12eaf9171`.	2025-11-28 10:48:35 +08:00
xunyoyo	a12eaf9171	[CI] 【Hackathon 9th Sprint No.33】NO.33 功能模块单测补充 (#5056 ) * Add cache messager unit tests * Refactor test_cache_messager.py with new stubs Updated copyright information and modified function names for clarity. * Add missing stubs for cache messager tests --------- Co-authored-by: Tao Luo <luotao02@baidu.com>	2025-11-27 11:05:50 +08:00
kevin	c068a4f642	[Feature] dyc8 support prefixcache (#5125 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details Publish Job / publish_pre_check (push) Has been cancelled Details Publish Job / print_publish_pre_check_outputs (push) Has been cancelled Details Publish Job / FD-Clone-Linux (push) Has been cancelled Details Publish Job / Show Code Archive Output (push) Has been cancelled Details Publish Job / BUILD_SM8090 (push) Has been cancelled Details Publish Job / BUILD_SM8689 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled Details Publish Job / Run FD Image Build (push) Has been cancelled Details Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled Details Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details Publish Job / Run Base Tests (push) Has been cancelled Details Publish Job / Run Accuracy Tests (push) Has been cancelled Details Publish Job / Run Stable Tests (push) Has been cancelled Details CI Images Build / FD-Clone-Linux (push) Has been cancelled Details CI Images Build / Show Code Archive Output (push) Has been cancelled Details CI Images Build / CI Images Build (push) Has been cancelled Details CI Images Build / BUILD_SM8090 (push) Has been cancelled Details CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled Details CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details CI Images Build / Run Base Tests (push) Has been cancelled Details CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled Details * dyc8 support prefixcache * fix cache_trans test case * update code	2025-11-21 19:46:26 +08:00
ltd0924	5bf48de999	[KVCache] support unified cache backend (#4903 ) * [Feature] support unified cache backend * fix * fix * fix * fix * Update metax_model_runner.py * fix * update * Update test_moba_attention_backend.py --------- Co-authored-by: ltd0924 <luotingdan@baidu.com>	2025-11-12 14:54:52 +08:00
xiaolei373	dbca63f862	[bugfix] kill cache_transfer_manager process (#4401 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details Publish Job / publish_pre_check (push) Has been cancelled Details Publish Job / print_publish_pre_check_outputs (push) Has been cancelled Details Publish Job / FD-Clone-Linux (push) Has been cancelled Details Publish Job / Show Code Archive Output (push) Has been cancelled Details Publish Job / BUILD_SM8090 (push) Has been cancelled Details Publish Job / BUILD_SM8689 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled Details Publish Job / Run FD Image Build (push) Has been cancelled Details Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled Details Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details Publish Job / Run Base Tests (push) Has been cancelled Details Publish Job / Run Accuracy Tests (push) Has been cancelled Details Publish Job / Run Stable Tests (push) Has been cancelled Details CI Images Build / FD-Clone-Linux (push) Has been cancelled Details CI Images Build / Show Code Archive Output (push) Has been cancelled Details CI Images Build / CI Images Build (push) Has been cancelled Details CI Images Build / BUILD_SM8090 (push) Has been cancelled Details CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled Details CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details CI Images Build / Run Base Tests (push) Has been cancelled Details CI Images Build / Run Accuracy Tests (push) Has been cancelled Details CI Images Build / Run Stable Tests (push) Has been cancelled Details CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled Details	2025-10-16 20:45:24 +08:00

32 Commits