Files
FastDeploy/fastdeploy/multimodal/image.py
kevin 8aab4e367f [Feature] mm support prefix cache (#4134)
2025-10-27 17:39:51 +08:00


"""
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
import base64
from io import BytesIO
from typing import Any

import requests
from PIL import Image

from .base import MediaIO


class ImageMediaIO(MediaIO[Image.Image]):
    """Load images from bytes, base64 strings, files, or URLs, and encode them as base64."""

    def __init__(self, *, image_mode: str = "RGB") -> None:
        """
        Initialize the image loader.

        Args:
            image_mode (str, optional): Pillow mode that every loaded image is converted to,
                e.g. "L", "LA", "P", "RGB", "RGBA", "CMYK", or "YCbCr". Defaults to "RGB".

        Note:
            The mode is stored as-is and is not validated here; an unsupported mode
            only fails later, when an image is converted.
        """
        super().__init__()
        self.image_mode = image_mode
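
    def load_bytes(self, data: bytes) -> Image.Image:
        """
        Decode raw bytes into an image and return it.

        Calls Image.open and Image.load, then converts the image to the configured
        mode (RGB by default).

        Args:
            data (bytes): Bytes containing the encoded image data.

        Returns:
            Image.Image: The decoded image, converted to the configured mode.

        Raises:
            OSError: Raised by Pillow if the data cannot be identified or decoded as an image.
        """
        image = Image.open(BytesIO(data))
        # Force a full decode now so that errors surface here rather than on first access.
        image.load()
        return image.convert(self.image_mode)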
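
    def load_base64(self, media_type: str, data: str) -> Image.Image:
        """
        Decode a base64-encoded string into an image.

        Args:
            media_type (str): Media type, e.g. "image/jpeg". Not used by this implementation;
                Pillow detects the format from the decoded bytes.
            data (str): Base64-encoded image data.

        Returns:
            Image.Image: The decoded PIL image, converted to the configured mode.

        Raises:
            binascii.Error: If the base64 data is incorrectly padded.
        """
        return self.load_bytes(base64.b64decode(data))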

    def load_file(self, filepath: str) -> Image.Image:
        """
        Load an image from a file and convert it to the configured mode.

        Args:
            filepath (str): Path to the image file.

        Returns:
            Image.Image: The loaded image, converted to the configured mode.

        Raises:
            FileNotFoundError: If the file does not exist.
            OSError: Raised by Pillow if the file cannot be identified or decoded as an image.
        """
        image = Image.open(filepath)
        image.load()
        return image.convert(self.image_mode)
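
    def load_file_request(self, request: Any) -> Image.Image:
        """
        Download an image over HTTP and return it as a PIL image.

        The argument is passed directly to requests.get, so in practice it is the URL
        of the image. The image is converted to the configured mode (RGB by default);
        it is not resized.

        Args:
            request (Any): URL of the image to download.

        Returns:
            Image.Image: The downloaded image, converted to the configured mode.

        Note:
            No timeout is set and the HTTP status code is not checked, so a slow server
            can block the call and an error page surfaces as a Pillow decoding error.
        """
        # Read from the raw response stream; Image.load() below pulls the full body.
        image = Image.open(requests.get(request, stream=True).raw)
        image.load()
        return image.convert(self.image_mode)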

    def encode_base64(
        self,
        media: Image.Image,
        *,
        image_format: str = "JPEG",
    ) -> str:
        """
        Encode an image as a base64 string.

        Args:
            media (Image.Image): The PIL image to encode.
            image_format (str, optional): Output format, e.g. "PNG", "JPEG", "BMP", or "TIFF".
                Defaults to "JPEG". Any format Pillow can write is accepted, provided it
                supports the configured image mode (JPEG, for example, cannot store "RGBA").

        Returns:
            str: The base64-encoded image, suitable for embedding in HTML or JSON payloads.
        """
        # Convert first so the encoded bytes always use the configured mode.
        image = media.convert(self.image_mode)
        with BytesIO() as buffer:
            image.save(buffer, image_format)
            data = buffer.getvalue()
        return base64.b64encode(data).decode("utf-8")
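
The `__init__` docstring lists the expected image modes, but the constructor stores `image_mode` without checking it, so a typo such as "rgb" would only surface when Pillow's `convert` runs on the first image. Below is a minimal sketch of an up-front check, assuming the seven modes named in the docstring are the intended set. `SUPPORTED_IMAGE_MODES` and `validate_image_mode` are hypothetical names for illustration, not part of FastDeploy or Pillow.

```python
# Hypothetical helper, not part of FastDeploy: the set below is simply the
# list of modes named in the ImageMediaIO.__init__ docstring.
SUPPORTED_IMAGE_MODES = frozenset({"L", "LA", "P", "RGB", "RGBA", "CMYK", "YCbCr"})


def validate_image_mode(image_mode: str) -> str:
    """Return image_mode unchanged if it is in the assumed set, else raise ValueError."""
    if image_mode not in SUPPORTED_IMAGE_MODES:
        supported = ", ".join(sorted(SUPPORTED_IMAGE_MODES))
        raise ValueError(
            f"Unsupported image_mode {image_mode!r}; expected one of: {supported}"
        )
    return image_mode


if __name__ == "__main__":
    print(validate_image_mode("RGB"))
    try:
        # Mode names are case-sensitive, so the lowercase spelling is rejected.
        validate_image_mode("rgb")
    except ValueError as exc:
        print(f"rejected: {exc}")
```

If such a check were adopted, the constructor could assign `self.image_mode = validate_image_mode(image_mode)`, which would make a `ValueError` at construction time possible. Pillow supports more modes than these seven, so a real check might need a wider set.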