[Quantization] Update quantized model deployment examples and update readme. (#377)

* Add PaddleOCR Support * Add PaddleOCR Support * Add PaddleOCRv3 Support * Add PaddleOCRv3 Support * Update README.md * Update README.md * Update README.md * Update README.md * Add PaddleOCRv3 Support * Add PaddleOCRv3 Supports * Add PaddleOCRv3 Suport * Fix Rec diff * Remove useless functions * Remove useless comments * Add PaddleOCRv2 Support * Add PaddleOCRv3 & PaddleOCRv2 Support * remove useless parameters * Add utils of sorting det boxes * Fix code naming convention * Fix code naming convention * Fix code naming convention * Fix bug in the Classify process * Imporve OCR Readme * Fix diff in Cls model * Update Model Download Link in Readme * Fix diff in PPOCRv2 * Improve OCR readme * Imporve OCR readme * Improve OCR readme * Improve OCR readme * Imporve OCR readme * Improve OCR readme * Fix conflict * Add readme for OCRResult * Improve OCR readme * Add OCRResult readme * Improve OCR readme * Improve OCR readme * Add Model Quantization Demo * Fix Model Quantization Readme * Fix Model Quantization Readme * Add the function to do PTQ quantization * Improve quant tools readme * Improve quant tool readme * Improve quant tool readme * Add PaddleInference-GPU for OCR Rec model * Add QAT method to fastdeploy-quantization tool * Remove examples/slim for now * Move configs folder * Add Quantization Support for Classification Model * Imporve ways of importing preprocess * Upload YOLO Benchmark on readme * Upload YOLO Benchmark on readme * Upload YOLO Benchmark on readme * Improve Quantization configs and readme * Add support for multi-inputs model * Add backends and params file for YOLOv7 * Add quantized model deployment support for YOLO series * Fix YOLOv5 quantize readme * Fix YOLO quantize readme * Fix YOLO quantize readme * Improve quantize YOLO readme * Improve quantize YOLO readme * Improve quantize YOLO readme * Improve quantize YOLO readme * Improve quantize YOLO readme * Fix bug, change Fronted to ModelFormat * Change Fronted to ModelFormat * Add examples to deploy quantized paddleclas models * Fix readme * Add quantize Readme * Add quantize Readme * Add quantize Readme * Modify readme of quantization tools * Modify readme of quantization tools * Improve quantization tools readme * Improve quantization readme * Improve PaddleClas quantized model deployment readme * Add PPYOLOE-l quantized deployment examples * Improve quantization tools readme * Improve Quantize Readme * Fix conflicts * Fix conflicts * improve readme * Improve quantization tools and readme * Improve quantization tools and readme * Add quantized deployment examples for PaddleSeg model * Fix cpp readme * Fix memory leak of reader_wrapper function * Fix model file name in PaddleClas quantization examples * Update Runtime and E2E benchmark * Update Runtime and E2E benchmark * Rename quantization tools to auto compression tools * Remove PPYOLOE data when deployed on MKLDNN * Fix readme * Support PPYOLOE with OR without NMS and update readme * Update Readme * Update configs and readme * Update configs and readme * Add Paddle-TensorRT backend in quantized model deploy examples * Support PPYOLOE+ series
2026-04-24 17:49:42 +08:00 · 2022-11-02 20:29:29 +08:00
parent 9437dec9f5
commit a231c9e7f3
53 changed files with 1514 additions and 521 deletions
@@ -0,0 +1,28 @@
+# PaddleSeg 量化模型 Python部署示例
+本目录下提供的`infer.py`,可以帮助用户快速完成PaddleSeg量化模型在CPU/GPU上的部署推理加速.
+
+## 部署准备
+### FastDeploy环境准备
+- 1. 软硬件环境满足要求，参考[FastDeploy环境要求](../../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)  
+- 2. FastDeploy Python whl包安装，参考[FastDeploy Python安装](../../../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+
+### 量化模型准备
+- 1. 用户可以直接使用由FastDeploy提供的量化模型进行部署.
+- 2. 用户可以使用FastDeploy提供的[一键模型自动化压缩工具](../../tools/auto_compression/),自行进行模型量化, 并使用产出的量化模型进行部署.(注意: 推理量化后的分类模型仍然需要FP32模型文件夹下的deploy.yaml文件, 自行量化的模型文件夹内不包含此yaml文件, 用户从FP32模型文件夹下复制此yaml文件到量化后的模型文件夹内即可.)
+
+
+## 以量化后的PP_LiteSeg_T_STDC1_cityscapes模型为例, 进行部署
+```bash
+#下载部署示例代码
+git clone https://github.com/PaddlePaddle/FastDeploy.git
+cd examples/vision/segmentation/paddleseg/quantize/python
+
+#下载FastDeloy提供的PP_LiteSeg_T_STDC1_cityscapes量化模型文件和测试图片
+wget https://bj.bcebos.com/paddlehub/fastdeploy/PP_LiteSeg_T_STDC1_cityscapes_without_argmax_infer_PTQ.tar
+tar -xvf PP_LiteSeg_T_STDC1_cityscapes_without_argmax_infer_PTQ.tar
+wget https://paddleseg.bj.bcebos.com/dygraph/demo/cityscapes_demo.png
+
+# 在CPU上使用Paddle-Inference推理量化模型
+python infer.py --model PP_LiteSeg_T_STDC1_cityscapes_without_argmax_infer_QAT --image cityscapes_demo.png --device cpu --backend paddle
+
+```
@@ -0,0 +1,76 @@
+import fastdeploy as fd
+import cv2
+import os
+
+
+def parse_arguments():
+    import argparse
+    import ast
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model", required=True, help="Path of PaddleSeg model.")
+    parser.add_argument(
+        "--image", required=True, help="Path of test image file.")
+    parser.add_argument(
+        "--device",
+        type=str,
+        default='cpu',
+        help="Type of inference device, support 'cpu' or 'gpu'.")
+    parser.add_argument(
+        "--backend",
+        type=str,
+        default="default",
+        help="Type of inference backend, support ort/trt/paddle/openvino, default 'openvino' for cpu, 'tensorrt' for gpu"
+    )
+    parser.add_argument(
+        "--device_id",
+        type=int,
+        default=0,
+        help="Define which GPU card used to run model.")
+    parser.add_argument(
+        "--cpu_thread_num",
+        type=int,
+        default=9,
+        help="Number of threads while inference on CPU.")
+    return parser.parse_args()
+
+
+def build_option(args):
+    option = fd.RuntimeOption()
+    if args.device.lower() == "gpu":
+        option.use_gpu(0)
+
+    option.set_cpu_thread_num(args.cpu_thread_num)
+
+    if args.backend.lower() == "trt":
+        assert args.device.lower(
+        ) == "gpu", "TensorRT backend require inferences on device GPU."
+        option.use_trt_backend()
+        option.set_trt_cache_file(os.path.join(args.model, "model.trt"))
+        option.set_trt_input_shape("x", [1, 3, 256, 256], [1, 3, 1024, 1024],
+                                   [1, 3, 2048, 2048])
+    elif args.backend.lower() == "ort":
+        option.use_ort_backend()
+    elif args.backend.lower() == "paddle":
+        option.use_paddle_backend()
+    elif args.backend.lower() == "openvino":
+        assert args.device.lower(
+        ) == "cpu", "OpenVINO backend require inference on device CPU."
+        option.use_openvino_backend()
+    return option
+
+
+args = parse_arguments()
+
+# 配置runtime，加载模型
+runtime_option = build_option(args)
+model_file = os.path.join(args.model, "model.pdmodel")
+params_file = os.path.join(args.model, "model.pdiparams")
+config_file = os.path.join(args.model, "deploy.yaml")
+model = fd.vision.segmentation.PaddleSegModel(
+    model_file, params_file, config_file, runtime_option=runtime_option)
+
+# 预测图片检测结果
+im = cv2.imread(args.image)
+result = model.predict(im.copy())
+print(result)