FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-07 16:08:58 +08:00

Author	SHA1	Message	Date
jc	7b1d787b4b	[BugFix] Fix storage_backend_type comparison bug in cache_transfer_manager.py (#6514 ) Co-authored-by: root <root@tjzj-inf-sci-k8s-hzz1-h62ni7-2178.tjzj.baidu.com>	2026-02-26 19:32:24 +08:00
Yonghua Li	e2332a1112	[BugFix] fix num_cpu_blocks computation (#6438 ) * [BugFix] fix num_cpu_blocks computation * [fix] fix syntax and log * [fix] pre-commit * [fix] use getattr * [fix] ci test	2026-02-13 11:05:14 +08:00
CSWYF3634076	ec128068b7	[Others] Exit to ensure no residual processes (cpu cache & dp) (#6377 ) * [Others] good exit single dp * [Others] good exit cpu cache dp>1 * [Others] good exit cpu cache dp>1 unittest	2026-02-09 20:38:38 +08:00
Jiang-Jia-Jun	18e79dd660	[Metrics] Support cpu-cache-block-num (#6390 ) Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>	2026-02-09 10:27:56 +08:00
Yonghua Li	5ac5ecd0b0	[BugFix] fix cache transfer tasks failure after cache cleared (#6202 ) * [fix] fix cache transfer tasks failure after cache cleared * [fix] fix submit_task * [fix] fix cache manager hang when clearing prefix cache * [fix] fix list_proxy has no clear method * [fix] fix barrier * [fix] add barrier0 * [fix] add cache_task_is_paused_signal * [fix] fix condition * [fix] fix cache transfer sync and delay prefix cache tree clearing * [fix] fix typo * [chore] polish code * [fix] revert only rank0 write kv_cache_status_signal * [fix] fix thread pool and prefix cache manager hang * [fix] add timeout for task_swapping_event * [fix] tolerate prefix cache manager error while prefix tree is cleared * [chore] add more log * [fix] fix test_prefix_cache_manager * [fix] fix prefix_cache_status_signal usage	2026-02-08 15:33:56 +08:00
jc	d6b3c722c1	[KVCache] Storage cache supports c8 model (#6298 ) * Refine cache transfer manager * Storage cache supports c8 model	2026-02-06 12:01:17 +08:00
Moonchild1227	39dc4b0c2e	[Feature] [KVCache] support file_store kv cache backend (#6188 ) * fix(examples): comment out stop.sh to avoid error when script is missing * feat: add file_store support for cache manager * [fix] fix multi gpu transfer * [fix] fix global kvcache transfer * [Feature] [KVCache] support file_store kv cache backend * chore: update FileStore according to PR comments * fix: remove comments * fix: add swap_cache_layout for file store * fix: remove rank key * fix: Switch KV cache storage to pure file mode * Temporarily disable support for Tensor types * fix: remove args --kvcache_file_path & add envs FILE_BACKEND_STORAGE_DIR * fixx: Simplify cache_transfer_manager.py * fix: fix syntax bug * fix: Simplify file_store.py * fix: Use the key directly as the filename * fix: Simplify set() * fix: Simplify cache_transfer_manager.py & file_store.py * fix: Only support load to cpu buffer * feat: add FileStore backend for cache transfer * fix: guard zmq import	2026-02-03 14:37:58 +08:00
chenjian	af1b1d2d56	[Feature] Support report token index by attention store (#6285 ) * [Feature] Support report token index by attention store * fix format	2026-02-02 10:41:11 +08:00
chenjian	292bab7e6d	[BugFix] Fix bug for enable output caching (#6226 ) * [BugFix] Fix bug for enable output caching * fix * Fix * fix * fix ci	2026-01-30 10:55:36 +08:00
jc	b1698a79cb	[RL] add version to the key of cache storage && refine raising error (#6160 ) * Waiting for cache transfer manager inited * up * up * up * up * up * fix according comments * fix unittest * fix * fix unittest * fix error * pass storage_backend to worker	2026-01-27 10:47:46 +08:00
Yonghua Li	833d00e2d7	[BugFix] move cache creation back to cache transfer process and adapt clear/update (#6144 ) * [fix] move cache creation back to cache transfer process * [fix] fix clear cache * [chore] change some log level * [fix] fix clear cache * [fix] fix clear cache for blockwisefp8 and mtp * [fix] fix c8 * [fix] fix clear_mtp_cache args * [chore] update cache_transfer_manager * [fix] fix update mtp cache	2026-01-24 21:59:13 +08:00
Yonghua Li	8d27a523e7	[Feature] [KVCache] support attention_store kv cache backend (#5823 ) * [feat] support attention_store kv cache backend * [fix] fix codestyle * [chore] optimize log * [fix] fix write storage task * [fix] fix read storage * [fix] fix code conflict after merge develop * [fix] fix cache bytes and read task token ids * [chore] add model for cache transfer manager * [chore] add some log * [chore] remove launched_cache_manager_signal * [fix] fix write_back_storage_task match_block_num condition * [fix] fix swap_cost_time * [ci] fix ci * Update fastdeploy/engine/sched/resource_manager_v1.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/cache_transfer_manager.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-22 21:01:23 +08:00
qwes5s5	b2a2e11551	[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320 ) * request disconnect * request disconnect * fix bug * fix bug--amend --------- Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>	2026-01-16 11:46:13 +08:00
Daci	e10b51b8c6	[Feature] get_output_kv_signal blocking read mode & send_first_token (#5836 ) * get_output_kv_signal blocking read mode * send first token before recycle * xpu get_output_kv_signal blocking read mode --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-15 14:11:03 +08:00
Yonghua Li	456637002d	[BugFix] fix cache transfer manager updating/clearing (#5930 ) * [fix] fix cache transfer manager updating/clearing * [fix] fix code style * [fix] fix config * [fix] fix engine client * [fix] let worker update kv cache status signal * [fix] update worker process * [fix] fix clear/update for case if comm group is shutdown * [fix] update dynamic weight manager * [fix] fix port * [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting	2026-01-13 05:09:29 -08:00
Yonghua Li	60ee72f682	[BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935 ) * [fix] fix rdma script and add more error log for multi api server * [fix] log * [fix] fix test_multi_api_server * [fix] fix multi api server port check --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-12 10:38:52 +08:00
kevin	2d2b156252	[BugFix] fix dyc8 cache bug (#5958 ) * fix dyc8 cache bug * update code	2026-01-08 19:25:47 -08:00
kevin	eabd01cd21	[BugFix] fix eb5 prefix bug (#5879 ) * fix eb5 prefix bug * update ci test * update code * update code * update code * update code * update code * update code * update code	2026-01-06 23:50:39 -08:00
kevin	a76e8ae40c	[Feature] support rdma pd dy-c8 (#5788 ) * add rdma pd dy-c8 * update code	2026-01-07 14:55:25 +08:00
Yonghua Li	9445fbe054	[KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled (#5871 ) * [fix] temporarily forbid cpu cache in update/clear api * [fix] stop launching cache transfer manager unless hierarchical cache is enabled * [fix] fix no attr hierarchical cache * [fix] fix ci * [fix] fix test_prefix_cache_manager.py	2026-01-06 14:27:47 +08:00
jc	e9b25aa72f	[BugFix] Storage backend gets env params (#5892 ) * Storage backend gets env params * up * up * up	2026-01-06 14:14:17 +08:00
jc	e911ac2ce7	[BugFix] Refine the preparation of cpu and storage cache (#5777 ) * Refine the preparation of cpu and storage cache * fix error * fix error * up * fix * up docs * fix unittest * remove debug info	2026-01-05 10:13:30 +08:00
jc	95257c1dbd	[Feature] RDMACommunicator send key and value scale (#5737 ) * RDMACommunicator send key and value scale --------- Co-authored-by: kevin <chengyf112@gmail.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-05 10:04:24 +08:00
kevin	52dc9a7b85	[BugFix] skip mm revert (#5848 ) * skip mm revert * update code * update test	2026-01-04 14:25:45 +08:00
MingkunZhang	f732d7d2ad	[Metax] adapt prefix caching & cpu swap (#5844 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2025-12-31 17:02:48 +08:00
周周周	7ae13b2326	[PD Disaggregation]remove unsed para in RDMACommManager (#5814 )	2025-12-30 11:38:30 +08:00
kevin	5538dda3c8	[Feature] pd support dy-c8 ipc (#5750 ) * pd support dy-c8 ipc * update code * support v0 * update code	2025-12-25 21:22:34 +08:00
Juncai	412867fd99	[Feature] Support KV Cache Storage (#5571 ) * Support Mooncake Store * up * up * add op * fix conflict * fix error * up for comments * avoid thread lock * up * fix unittest * fix unittest * remove debug info * consider tp_size > 1 * add default rdma_nics * add utils * up * fix error --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-25 16:30:35 +08:00
Yonghua Li	0c8c6369ed	[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415 ) * [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports * [fix] fix some bugs * [fix] fix rdma port for cache manager/messager * [fix] temporarily cancel port availability check to see if it can pass ci test * [feat] simplify args for multi api server * [fix] fix dp * [fix] fix port for xpu * [fix] add tests for ports post processing & fix ci * [test] fix test_multi_api_server * [fix] fix rdma_comm_ports args for multi_api_server * [fix] fix test_common_engine * [fix] fix test_cache_transfer_manager * [chore] automatically setting FD_ENABLE_MULTI_API_SERVER * [fix] avoid api server from creating engine_args twice * [fix] fix test_run_batch * [fix] fix test_metrics * [fix] fix splitwise connector init * [test] add test_rdma_transfer and test_expert_service * [fix] fix code syntax * [fix] fix test_rdma_transfer and build wheel with rdma script	2025-12-17 15:50:42 +08:00
kevin	c9b47f90ce	[BugFix] fix cpu prefix cache bug (#5544 ) * fix_dy_c8_bug * add block_num check * fix test case * update ci case	2025-12-16 14:21:42 +08:00
kevin	954a145d57	[Optimization] support mm prefill batch (#5313 ) * support mm prefill batch * update code * update code * update code * update code * fix encoder cache bug * update code * update code * fix bug * fix paddle ocr bug * fix xpu bug * update code	2025-12-11 22:21:14 +08:00
Juncai	83ea9646f9	[PD Disaggregation] Unify the disaggregation info and the pd communication (#5438 ) * Unify the disaggregation info and the pd communication * up * up * fix * fix conflict * fix unittest	2025-12-09 14:44:59 +08:00
Daci	2f208db4e9	[Feature] Multimodal Model P / D Separation (#5323 ) * RouterArgs port str -> int * fix race condition [is_fetching] causing multiple fetch requests * bugfix: Delete duplicate input_ids tensor creation * mm pd splitwise json -> pickle5; multimodal_inputs only pos id; debuglog f to %s * fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ... * update cr * Apply suggestions from code review Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * pre-commit fix * rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0 --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-09 10:47:42 +08:00
Yonghua Li	f4119d51b4	[PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197 ) * [fix] support DP via v1 router and decouple DP and EP * [fix] fix scripts * [fix] reset model path * [fix] dp use get_output_ep, fix router port type, update scripts * [merge] merge with latest code * [chore] remove some debug log * [fix] fix code style check * [fix] fix test_multi_api_server for log_dir name * [chore] reduce logs * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-04 15:38:43 +08:00
K11OntheBoat	2e1680838f	[PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251 ) * Support deepseekv3 cache transfer for PD deploy * clean some log info --------- Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-12-02 14:11:50 +08:00
Juncai	0925d44f18	[PD Disaggregation] support different tp_size for prefill and decode (#5296 ) * up * up * up * fix	2025-12-01 17:50:20 +08:00
Yonghua Li	cead6b26fa	[Metrics] Update time_to_first_token to include tokenization & queue time, and remove redundant metrics (#4993 ) * [update] update time_to_first_tokens to include queue time, and remove first_token_latency and infer_latency * [doc] update docs * [ci] fix test * [chore] delete redundant code --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-11-26 14:42:17 +08:00
Yuanle Liu	f69e0839f7	dummy import fd (#5192 )	2025-11-24 20:23:07 +08:00
kevin	c068a4f642	[Feature] dyc8 support prefixcache (#5125 ) * dyc8 support prefixcache * fix cache_trans test case * update code	2025-11-21 19:46:26 +08:00
ddchenhao66	e70e2279ce	[PD Disaggregation][XPU] Add XPU support for PD disaggregation (#5113 ) * [XPU] xpu support PD disaggregation * [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards * [XPU] xpu support PD disaggregation in v1 scheduler --------- Co-authored-by: ddchenhao66 <dhaochen163.com>	2025-11-21 14:09:01 +08:00
Yonghua Li	43097a512a	[BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol (#5132 ) * [fix] fix v1 scheduler profile run for append attention in prefill node * [fix] skip send_signal if kv signal not inited for gpu and xpu * [fix] extend fix to flash_attn & mla_attn * [fix] fix v1 pd run in ipc transfer protocol * [ci] add test for v1 pd profile run using ipc transfer protocol * [style] fix code style check * [style] fix code style again * [fix] fix profile run * [update] remove --num-gpu-blocks-override in example script * [chore] rename forward_meta is_profiling to is_dummy_or_profile_run	2025-11-20 21:39:22 +08:00
Juncai	36822fa49c	[PD Disaggregation] remove splitwise deployment on single node and refine the code (#4891 ) * remove splitwise deployment on single node and refine the code * up * up * up * add test * up	2025-11-14 09:56:53 +08:00
ltd0924	5bf48de999	[KVCache] support unified cache backend (#4903 ) * [Feature] support unified cache backend * fix * fix * fix * fix * Update metax_model_runner.py * fix * update * Update test_moba_attention_backend.py --------- Co-authored-by: ltd0924 <luotingdan@baidu.com>	2025-11-12 14:54:52 +08:00
chenjian	78895e2c7d	[Bug Fix] fix bug for PD EP (#4823 ) * fix bug for PD EP * fix * optimize perf for engine worker queue * fix bug * fix internode ll two stage * fix for ci * fix bug	2025-11-10 15:33:29 +08:00
kevin	cc34487810	[Feature] support mm disable_chunked (#4803 ) * support mm disable_chunked * update code * update code * update code	2025-11-06 21:32:25 +08:00
Juncai	08ca0f6aea	[Feature] [PD] add simple router and refine splitwise deployment (#4709 ) * add simple router and refine splitwise deployment * fix	2025-11-06 14:56:02 +08:00
chenjian	25498efcf3	[Optimize] Support and robust for tpN for PD (#4595 ) * [Optimize] Support and robust for tpN for PD * fix * fix * support dpM tpN for cache messager * fix * fix token counter * fix bug for merge develop * fix bug * robust cache messager for v0	2025-11-03 15:38:31 +08:00
kevin	8aab4e367f	[Feature] mm support prefix cache (#4134 ) * support mm prefix caching * update code * fix mm_hashes * support encoder cache * add encoder cache * update code * update encoder cache * fix features bug * fix worker bug * support processor cache, need to optimize yet * refactor multimodal data cache * update code * update code * update v1 scheduler * update code * update code * update codestyle * support turn off processor cache and encoder cache * update pre-commit * fix code * solve review * update code * update code * update test case * set processor cache in GiB * update test case * support mm prefix caching for qwen model * fix code style check * update pre-commit * fix unit test * fix unit test * add ci test case * fix rescheduled bug * change text_after_process to prompt_tokens * fix unit test * fix chat template * change model path * [EP] fix adapter bugs (#4572) * Update expert_service.py * Update common_engine.py * Update expert_service.py * fix v1 hang bug (#4573) * fix import image_ops error on some platforms (#4559) * [CLI]Update parameters in bench latecy cli tool and fix collect-env cli tool (#4558) * add collect-env * del files * [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578) * add new branch for sot * reorder * fix batch bug * [XPU]Moe uses a new operator (#4585) * [XPU]Moe uses a new operator * [XPU]Moe uses a new operator * update response * [Feature] Support Paddle-OCR (#4396) * init * update code * fix code style & disable thinking * adapt for common_engine.update_mm_requests_chunk_size * use 3d rope * use flash_attn_unpadded * opt siglip * update to be compatible with the latest codebase * fix typo * optim OCR performance * fix bug * fix bug * fix bug * fix bug * normlize name * modify xpu rope * revert logger * fix bug * fix bug * fix bug * support default_v1 * optim performance * fix bug --------- Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com> 
Co-authored-by: zhangyue66 <zhangyue66@baidu.com> * [DataProcessor] add reasoning_tokens into usage info (#4520) * add reasoning_tokens into usage info initial commit * add unit tests * modify unit test * modify and add unit tests * fix unit test * move steam usage to processor * modify processor * modify test_logprobs * modify test_logprobs.py * modify stream reasoning tokens accumulation * fix unit test * perf: Optimize task queue communication from engine to worker (#4531) * perf: Optimize task queue communication from engine to worker * perf: get_tasks to numpy * perf: get_tasks remove to_numpy * fix: request & replace ENV * remove test_e2w_perf.py * fix code style --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Clean up ports after processing results (#4587) * [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593) * [Others] api server exits when worker process is dead (#3271) * [fix] fix terminal hangs when worker process is dead * [chore] change sleep time of monitor * [chore] remove redundant comments * update docs --------- Co-authored-by: ApplEOFDiscord <wwy640130@163.com> Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com> Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: yinwei <yinwei_hust@163.com> Co-authored-by: JYChen <zoooo0820@qq.com> Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com> Co-authored-by: Ryan <zihaohuang@aliyun.com> Co-authored-by: yyssys <atyangshuang@foxmail.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com> Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com> Co-authored-by: zhangyue66 <zhangyue66@baidu.com> Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com> Co-authored-by: SunLei <sunlei5788@gmail.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Zhang Yulong 
<35552275+ZhangYulongg@users.noreply.github.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com>	2025-10-27 17:39:51 +08:00
zhupengyang	3a6883ac1a	c++ code format (#4527 )	2025-10-22 17:59:50 +08:00
ltd0924	fb76cdfb4f	[Fearture] Support mm model close prefix cache (#4459 ) * [Feature] support prefix cache in DP * fix * Update common_engine.py * Update common_engine.py * Update common_engine.py * Update common_engine.py * [BugFix] fix workers more than 1 * fix * Update api_server.py * fix * Update api_server.py * fix * [Fearture] Support mm model close prefix cache * Update api_server.py * Update engine_client.py * Update engine_client.py * add test * Update test_chat.py * fix * fix * Update test_chat.py * Update test_chat.py --------- Co-authored-by: ltd0924 <luotingdan@baidu.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-10-21 15:37:59 +08:00


82 Commits