Commit Graph

301 Commits

Author SHA1 Message Date
Daci d8c6ba61f3 [BugFix] resource_manager_v1 lock PD (#5616)
* bugfix resource_manager_v1 lock PD

* with lock add_prefilled_request

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-08 10:02:54 +08:00
chenjian 925e7edd3c [Bug fix] Limit multi-modal request to 1 (#5901) 2026-01-07 20:25:07 +08:00
chenjian c883a2d3ec [Optimization] Reduce preemption occurrence when blocks not enough (#5696)
* [Optimize] Reduce preemption occurrence when blocks not enough for decoding

* fix

* fix

* fix spell

* optimize performance

* fix
2026-01-07 20:01:16 +08:00
kevin eabd01cd21 [BugFix] fix eb5 prefix bug (#5879)
* fix eb5 prefix bug

* update ci test

* update code

* update code

* update code

* update code

* update code

* update code

* update code
2026-01-06 23:50:39 -08:00
fmiao2372 1ee285c2d6 [Intel HPU] enable chunked prefill (#5903)
* [Intel HPU] enable chunked prefill

* fix bug by copilot comments
2026-01-06 21:01:50 +08:00
jc 8d384f9fd8 [PD Disaggregation] Update usage of pd disaggregation and data parallel (#5742)
* Update usage of pd disaggregation

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up dp docs

* up

* up

* up

* fix unittest
2026-01-05 17:51:29 +08:00
jc e911ac2ce7 [BugFix] Refine the preparation of cpu and storage cache (#5777)
* Refine the preparation of cpu and storage cache

* fix error

* fix error

* up

* fix

* up docs

* fix unittest

* remove debug info
2026-01-05 10:13:30 +08:00
kevin 52dc9a7b85 [BugFix] skip mm revert (#5848)
* skip mm revert

* update code

* update test
2026-01-04 14:25:45 +08:00
MingkunZhang f732d7d2ad [Metax] adapt prefix caching & cpu swap (#5844)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-31 17:02:48 +08:00
ddchenhao66 9e45ef7ca9 [XPU]MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL (#5831) 2025-12-31 09:49:12 +08:00
kevin 74e162697f eb5 mm skip prefix cache (#5838) 2025-12-30 05:30:48 -08:00
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
kevin 894f4e312b [FDConfig] disable chunked_mm_input in ernie5 (#5774)
* disable chunked_mm_input in ernie5

* update code

* update code

* update test case

* update testcase

* upate case
2025-12-26 15:31:27 +08:00
yzwu 7b6cc11952 [Iluvatar] Fix FD launch error when specifing CUDA_VISBLE_DEVICE (#5735) 2025-12-26 14:01:27 +08:00
RichardWooSJTU 01c18f328f rename need_block_num_signal (#5623) 2025-12-26 11:02:29 +08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
memoryCoderC be3be4913a [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM (#5195)
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM

* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
chenjian b90a922f98 [Bug fix] Set enable_cache_output as false by default (#5751) 2025-12-24 21:37:24 +08:00
GoldPancake 23d488c488 [Feature] Entropy calculation support (#5692)
* support entropy

* fix bug

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-23 21:19:47 +08:00
ming1753 85db9d5e56 [Others] reschedule preempt task support optional func (#5649)
* [Others] reschedule preempt task support optional func

* fix bug

* fix bug
2025-12-23 20:45:52 +08:00
ming1753 81384ef29e [BugFix] fix download feature bug (#5669) 2025-12-22 13:46:39 +08:00
Yonghua Li 4f830aa505 [RL] provide options for whether shutdown comm group after weights cleared (#5663)
* [rl] provide options for whether shutdown comm group after weights cleared

* [fix] fix args hardcode

* [fix] change args type

* [fix] add worker process args
2025-12-19 07:06:48 -08:00
kevin 807e404369 [BugFix] fix eb5 mm prefix cache bug (#5638)
* fix eb5 mm prefix cache bug

* update code

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-19 14:57:37 +08:00
fmiao2372 a8fce47195 [Intel HPU] enable kv cache scheduler v1 for hpu (#5648)
* [Intel HPU] enable kv cache scheduler v1 for hpu

* fix copilt comments
2025-12-19 12:03:39 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
Yonghua Li 0c8c6369ed [Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415)
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports

* [fix] fix some bugs

* [fix] fix rdma port for cache manager/messager

* [fix] temporarily cancel port availability check to see if it can pass ci test

* [feat] simplify args for multi api server

* [fix] fix dp

* [fix] fix port for xpu

* [fix] add tests for ports post processing & fix ci

* [test] fix test_multi_api_server

* [fix] fix rdma_comm_ports args for multi_api_server

* [fix] fix test_common_engine

* [fix] fix test_cache_transfer_manager

* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER

* [fix] avoid api server from creating engine_args twice

* [fix] fix test_run_batch

* [fix] fix test_metrics

* [fix] fix splitwise connector init

* [test] add test_rdma_transfer and test_expert_service

* [fix] fix code syntax

* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
Jiang-Jia-Jun 2ad3bff4ff [Optim] Optimize costtime in checking tasks in engine-worker-queue (#5580)
* [Optim] Optimize costtime in checking tasks in engine-worker-queue

* Update fastdeploy/engine/common_engine.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-16 19:27:31 +08:00
xiaolei373 a30b4da260 [Feature] Tracing: Fine-Grained Tracing for Request Latency Part1 (#5458) 2025-12-16 16:36:09 +08:00
Jiang-Jia-Jun 8b6395478a Revert "[BugFix] reschedule_preempt_task append waiting & PREEMPTED blocksize…" (#5575)
This reverts commit dbedb0797b.
2025-12-16 11:12:57 +08:00
Daci dbedb0797b [BugFix] reschedule_preempt_task append waiting & PREEMPTED blocksize (#5506)
* bugfix reschedule_preempt_task append waiting & PREEMPTED blocksize

* bugfix reschedule_preempt_task append waiting & PREEMPTED blocksize

* comments

* [bugfix] PREEMPTED task blocksize

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-12 17:43:29 +08:00
GoldPancake 909059c60a [Feature] Support for request-level speculative decoding metrics monitoring. (#5518)
* support spec metrics monitor per request

* fix bug

* remove debug log

* fix ut bugs
2025-12-12 12:22:18 +08:00
kevin 954a145d57 [Optimization] support mm prefill batch (#5313)
* support mm prefill batch

* update code

* update code

* update code

* update code

* fix encoder cache bug

* update code

* update code

* fix bug

* fix paddle ocr bug

* fix xpu bug

* update code
2025-12-11 22:21:14 +08:00
Jiang-Jia-Jun 4b3e41c665 [Optim] Improve task-checking performance in engine-worker-queue (#5376)
* [Optim] Optimize costtime in checking tasks in engine-worker-queue

* Update fastdeploy/engine/common_engine.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/inter_communicator/engine_worker_queue.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [Docs] Add docstring to set_exist_tasks method (#5382)

* Initial plan

* Add docstring to set_exist_tasks method

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* [Docs] Add docstring documentation to exist_tasks() method (#5381)

* Initial plan

* Add comprehensive docstring to exist_tasks() method

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* [Optimization] Conditionally initialize shared memory for single-node deployments only (#5383)

* Initial plan

* Conditionally initialize exist_tasks_intra_signal for single-node deployments

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Use is_single_node flag for consistent deployment type checking

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Remove redundant None checks in exist_tasks methods

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* format code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
2025-12-11 10:33:32 +08:00
freeliuzc 53460935ec fix attention bug in spec decoding (#5460) 2025-12-10 10:56:37 +08:00
Juncai 83ea9646f9 [PD Disaggregation] Unify the disaggregation info and the pd communication (#5438)
* Unify the disaggregation info and the pd communication

* up

* up

* fix

* fix conflict

* fix unittest
2025-12-09 14:44:59 +08:00
Nyakku Shigure e1c4a12e34 [Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part (#5223)
---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-09 14:37:00 +08:00
chen 76649b45c1 [Optimization] compulte real max_logprobs in batch (#5430) 2025-12-09 14:15:05 +08:00
zhouchong 5d9b5e4a5b [Engine] [Feature] Refactor async_llm:cross-process with EngineService,based on zmq communication (#4868)
* Refactor async_llm:cross-process with EngineService

* fix: async_llm output process

* fix: return prompt_token_ids and prompt_tokens in first res

* optimize common_engine start func
2025-12-09 10:53:40 +08:00
Daci 2f208db4e9 [Feature] Multimodal Model P / D Separation (#5323)
* RouterArgs port str -> int

* fix race condition [is_fetching] causing multiple fetch requests

* bugfix: Delete duplicate input_ids tensor creation

* mm pd splitwise json -> pickle5; multimodal_inputs only pos id;
debuglog f to %s

* fix ENABLE_V1_KVCACHE_SCHEDULER=0 mm model lack pos_id, ...

* update cr

* Apply suggestions from code review

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* pre-commit fix

* rm multimodal_inputs deepcopy & fix rdma_cache_transfer.py tpsize=0

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 10:47:42 +08:00
Juncai a8ffc22032 [BugFix] fix init RequestOutput (#5419)
* fix init RequestOutput

* up

* fix

* fix
2025-12-09 10:20:22 +08:00
Juncai 02df3c5097 FD registers to the Router only once. (#5431)
2025-12-08 22:07:11 +08:00
Juncai 80efe98f8d [PD Disaggregation] Add timestamp for analyzing splitwise deployment (#5317)
* Add timestamp for analyzing splitwise deployment

* up

* up

* up

* up

* up

* up

* fix format

* fix
2025-12-08 10:08:44 +08:00
RAM b2908b8e82 [New][RL] Support Rollout Routing Replay (#5405)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"

This reverts commit c45e064f3d.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 22:06:26 +08:00
Jiang-Jia-Jun c45e064f3d Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)
This reverts commit 96d2d4877b.
2025-12-05 20:19:39 +08:00
RAM 96d2d4877b [RL] Support Rollout Routing Replay (#5321)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 20:01:33 +08:00
kevin c9d7f9e7c3 [BugFix] fix async download bug (#5349)
* fix async download bug

* update log

* Revert "update log"

This reverts commit 5816e602f4.

* update code

* fix mtp bug
2025-12-05 18:59:12 +08:00
Yonghua Li 35846909c7 [fix] fix scheduler hang when input length is very close to max_model_len (#5393)
2025-12-05 18:23:42 +08:00
Ayakouji a8f8791668 [Optimization] Qwen2.5-VL support multi-batch prefill (#5269)
* update

* fix

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix dict access

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-12-05 18:22:39 +08:00
qwes5s5 1aefbef0b3 fix trace log (#5386) 2025-12-05 14:45:52 +08:00
chenjian 3878a99b69 [Fearture] Support cache kv cache for output tokens (#4535)
* [Fearture] Support cache kv cache for output tokens

* fix bug

* fix ci bug

* improve coverage

* enable output caching by default

* fix ci

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-12-04 20:53:08 +08:00