From this tutorial you will learn:
How to export an SG object detection model to ONNX with a single line of code
How to work with the flat and batch prediction output formats
How to customize NMS parameters and confidence thresholds
How to export quantized (FP16 / INT8) models
How to choose the export backend (ONNX Runtime or TensorRT)
A new export API was introduced in SG 3.2.0. It aims to simplify the export process and allows end-to-end export of SG models to ONNX format with a single line of code.
!pip install -qq super_gradients==3.4.0
Let's start with the simplest example of exporting a model to ONNX format.
We will use the YoloNAS-S model in this example. All models that support the new export API now expose an export() method that can be used to export a model. There is one mandatory argument that should be passed to the export() method - the path to the output file. Currently, only the .onnx format is supported, but we may add support for CoreML and other formats in the future.
from super_gradients.common.object_names import Models
from super_gradients.training import models
model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco")
export_result = model.export("yolo_nas_s.onnx")
A lot of work just happened under the hood.
The returned value of the export() method is an instance of the ModelExportResult class.
First of all, it stores all the information about the exported model in a single place.
It also provides a convenient way to get an example of running the model and reading its output:
export_result
Model exported successfully to yolo_nas_s.onnx
Model expects input image of shape [1, 3, 640, 640]
Input image dtype is torch.uint8
Exported model already contains preprocessing (normalization) step, so you don't need to do it manually.
Preprocessing steps to be applied to input image are:
Sequential(
(0): CastTensorTo(dtype=torch.float32)
(1): ApplyMeanStd(mean=[0.], scale=[255.])
)
Exported model contains postprocessing (NMS) step with the following parameters:
num_pre_nms_predictions=1000
max_predictions_per_image=1000
nms_threshold=0.7
confidence_threshold=0.25
output_predictions_format=batch
Exported model is in ONNX format and can be used with ONNXRuntime
To run inference with ONNXRuntime, please use the following code snippet:
import onnxruntime
import numpy as np
session = onnxruntime.InferenceSession("yolo_nas_s.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = [o.name for o in session.get_inputs()]
outputs = [o.name for o in session.get_outputs()]
example_input_image = np.zeros((1, 3, 640, 640)).astype(np.uint8)
predictions = session.run(outputs, {inputs[0]: example_input_image})
Exported model has predictions in batch format:
num_detections, pred_boxes, pred_scores, pred_classes = predictions
for image_index in range(num_detections.shape[0]):
    for i in range(num_detections[image_index, 0]):
        class_id = pred_classes[image_index, i]
        confidence = pred_scores[image_index, i]
        x_min, y_min, x_max, y_max = pred_boxes[image_index, i]
        print(f"Detected object with class_id={class_id}, confidence={confidence}, x_min={x_min}, y_min={y_min}, x_max={x_max}, y_max={y_max}")
That's it. You can now use the exported model with any ONNX-compatible runtime or accelerator.
import cv2
import numpy as np
from super_gradients.training.utils.media.image import load_image
import onnxruntime
image = load_image("https://deci-pretrained-models.s3.amazonaws.com/sample_images/beatles-abbeyroad.jpg")
image = cv2.resize(image, (export_result.input_image_shape[1], export_result.input_image_shape[0]))
image_bchw = np.transpose(np.expand_dims(image, 0), (0, 3, 1, 2))
session = onnxruntime.InferenceSession(export_result.output, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = [o.name for o in session.get_inputs()]
outputs = [o.name for o in session.get_outputs()]
result = session.run(outputs, {inputs[0]: image_bchw})
result[0].shape, result[1].shape, result[2].shape, result[3].shape
((1, 1), (1, 1000, 4), (1, 1000), (1, 1000))
In the next section we unpack the prediction results and show how to use them.
If postprocessing=True (the default value), models are exported with the NMS step included. If postprocessing=False, models are exported without NMS and the raw model outputs are returned. In that case you will need to apply NMS yourself, which is useful if you want to use a custom NMS implementation that is not ONNX-compatible. In most cases you will want to keep the default postprocessing=True. It is also possible to pass a custom nn.Module as the postprocessing argument to the export() method. This module will be attached to the exported ONNX graph instead of the default NMS module. We encourage users to read the documentation of the export() method to learn more about the advanced options.
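As a minimal sketch of the raw-output option described above (the output file name is arbitrary), exporting without the bundled NMS could look like this:
# Minimal sketch: export raw model outputs without the NMS step.
# You will need to decode the raw outputs and apply NMS yourself.
export_result_raw = model.export("yolo_nas_s_raw.onnx", postprocessing=False)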
When exporting an object detection model with postprocessing enabled, the prediction format can be one of the following:
DetectionOutputFormatMode.FLAT_FORMAT
DetectionOutputFormatMode.BATCH_FORMAT
You can select the desired output format by setting export(..., output_predictions_format=DetectionOutputFormatMode.BATCH_FORMAT)
.
In FLAT_FORMAT, detection results are returned as a single tensor of shape [N, 7], where N is the total number of detected objects in the entire batch. Each row in the tensor represents a single detection result and has the following format:
[batch_index, x1, y1, x2, y2, class score, class index]
When exporting a model with a batch size of 1 (the default mode) you can ignore the first column, since all boxes belong to the single sample. If you export a model with batch size > 1 you have to iterate over this array like so:
for sample_index in range(export_result.batch_size):
    detections_for_sample_i = flat_predictions[flat_predictions[:, 0] == sample_index]
    for (x1, y1, x2, y2, class_score, class_index) in detections_for_sample_i[:, 1:]:
        class_index = int(class_index)  # convert from float to int
        # do something with the detection predictions
The second supported format is the so-called "batch" format. It matches the output format of TensorRT's NMS implementation. The return value in this case is a tuple of 4 tensors:
num_predictions - [B, 1] - the number of predictions per sample
pred_boxes - [B, N, 4] - the coordinates of the predicted boxes in X1, Y1, X2, Y2 format
pred_scores - [B, N] - the confidence scores of the predicted boxes
pred_classes - [B, N] - the class indices of the predicted boxes
Here B corresponds to the batch size and N is the maximum number of detected objects per image.
To get the actual number of detections per image, read the first element of the corresponding row of the num_predictions tensor.
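As a minimal sketch (the variable names here are illustrative), extracting the valid detections for a single image from the batch-format outputs looks like this:
num_predictions, pred_boxes, pred_scores, pred_classes = result  # the batch-format outputs obtained above
num_valid = int(num_predictions[0, 0])       # actual number of detections for image 0
valid_boxes = pred_boxes[0, :num_valid]      # [num_valid, 4] boxes in X1, Y1, X2, Y2 format
valid_scores = pred_scores[0, :num_valid]    # [num_valid] confidence scores
valid_classes = pred_classes[0, :num_valid]  # [num_valid] class indices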
Now that you're familiar with the output formats, let's see how to use them. To start, it's useful to take a look at the prediction values with the naked eye:
num_predictions, pred_boxes, pred_scores, pred_classes = result
num_predictions
array([[25]], dtype=int64)
np.set_printoptions(threshold=50, edgeitems=3)
pred_boxes, pred_boxes.shape
(array([[[439.55383, 253.22733, 577.5956 , 548.11975],
[ 35.71795, 249.40926, 176.62216, 544.69794],
[182.39618, 249.49301, 301.44122, 529.3324 ],
...,
[ -1. , -1. , -1. , -1. ],
[ -1. , -1. , -1. , -1. ],
[ -1. , -1. , -1. , -1. ]]], dtype=float32),
(1, 1000, 4))
np.set_printoptions(threshold=50, edgeitems=5)
pred_scores, pred_scores.shape
(array([[ 0.9694027, 0.9693378, 0.9665707, 0.9619047, 0.7538769, ...,
-1. , -1. , -1. , -1. , -1. ]],
dtype=float32),
(1, 1000))
np.set_printoptions(threshold=50, edgeitems=10)
pred_classes, pred_classes.shape
(array([[ 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, ..., -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1]], dtype=int64),
(1, 1000))
For the sake of this tutorial we will use a simple visualization function that is tailored for batch_size=1 only. You can use it as a starting point for your own visualization code.
from super_gradients.training.datasets.datasets_conf import COCO_DETECTION_CLASSES_LIST
from super_gradients.training.utils.detection_utils import DetectionVisualization
import matplotlib.pyplot as plt
def show_predictions_from_batch_format(image, predictions):
    num_predictions, pred_boxes, pred_scores, pred_classes = predictions
    assert num_predictions.shape[0] == 1, "Only batch size of 1 is supported by this function"

    num_predictions = int(num_predictions.item())
    pred_boxes = pred_boxes[0, :num_predictions]
    pred_scores = pred_scores[0, :num_predictions]
    pred_classes = pred_classes[0, :num_predictions]

    image = image.copy()
    class_names = COCO_DETECTION_CLASSES_LIST
    color_mapping = DetectionVisualization._generate_color_mapping(len(class_names))

    for (x1, y1, x2, y2, class_score, class_index) in zip(pred_boxes[:, 0], pred_boxes[:, 1], pred_boxes[:, 2], pred_boxes[:, 3], pred_scores, pred_classes):
        image = DetectionVisualization.draw_box_title(
            image_np=image,
            x1=int(x1),
            y1=int(y1),
            x2=int(x2),
            y2=int(y2),
            class_id=class_index,
            class_names=class_names,
            color_mapping=color_mapping,
            box_thickness=2,
            pred_conf=class_score,
        )

    plt.figure(figsize=(8, 8))
    plt.imshow(image)
    plt.tight_layout()
    plt.show()
show_predictions_from_batch_format(image, result)
You can explicitly specify the output format of the predictions by setting the output_predictions_format argument of the export() method. Let's see how it works:
from super_gradients.conversion import DetectionOutputFormatMode
export_result = model.export("yolo_nas_s.onnx", output_predictions_format=DetectionOutputFormatMode.FLAT_FORMAT)
export_result
Model exported successfully to yolo_nas_s.onnx
Model expects input image of shape [1, 3, 640, 640]
Input image dtype is torch.uint8
Exported model already contains preprocessing (normalization) step, so you don't need to do it manually.
Preprocessing steps to be applied to input image are:
Sequential(
(0): CastTensorTo(dtype=torch.float32)
(1): ApplyMeanStd(mean=[0.], scale=[255.])
)
Exported model contains postprocessing (NMS) step with the following parameters:
num_pre_nms_predictions=1000
max_predictions_per_image=1000
nms_threshold=0.7
confidence_threshold=0.25
output_predictions_format=flat
Exported model is in ONNX format and can be used with ONNXRuntime
To run inference with ONNXRuntime, please use the following code snippet:
import onnxruntime
import numpy as np
session = onnxruntime.InferenceSession("yolo_nas_s.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = [o.name for o in session.get_inputs()]
outputs = [o.name for o in session.get_outputs()]
example_input_image = np.zeros((1, 3, 640, 640)).astype(np.uint8)
predictions = session.run(outputs, {inputs[0]: example_input_image})
Exported model has predictions in flat format:
# flat_predictions is a 2D array of [N,7] shape
# Each row represents (image_index, x_min, y_min, x_max, y_max, confidence, class_id)
# Please note all values are floats, so you have to convert them to integers if needed
[flat_predictions] = predictions
for (_, x_min, y_min, x_max, y_max, confidence, class_id) in flat_predictions:
    class_id = int(class_id)
    print(f"Detected object with class_id={class_id}, confidence={confidence}, x_min={x_min}, y_min={y_min}, x_max={x_max}, y_max={y_max}")
We have now exported a model that produces predictions in the flat format. Let's run the model like before and see the result:
session = onnxruntime.InferenceSession(export_result.output, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = [o.name for o in session.get_inputs()]
outputs = [o.name for o in session.get_outputs()]
result = session.run(outputs, {inputs[0]: image_bchw})
result[0].shape
(25, 7)
def show_predictions_from_flat_format(image, predictions):
    [flat_predictions] = predictions

    image = image.copy()
    class_names = COCO_DETECTION_CLASSES_LIST
    color_mapping = DetectionVisualization._generate_color_mapping(len(class_names))

    for (sample_index, x1, y1, x2, y2, class_score, class_index) in flat_predictions[flat_predictions[:, 0] == 0]:
        class_index = int(class_index)
        image = DetectionVisualization.draw_box_title(
            image_np=image,
            x1=int(x1),
            y1=int(y1),
            x2=int(x2),
            y2=int(y2),
            class_id=class_index,
            class_names=class_names,
            color_mapping=color_mapping,
            box_thickness=2,
            pred_conf=class_score,
        )

    plt.figure(figsize=(8, 8))
    plt.imshow(image)
    plt.tight_layout()
    plt.show()
show_predictions_from_flat_format(image, result)
You can control a number of parameters of the NMS settings, as well as the maximum number of detections per image before and after the NMS step:
confidence_threshold - predictions with confidence below this value are discarded
nms_threshold - IoU threshold used by the NMS step
num_pre_nms_predictions - maximum number of predictions per image kept before the NMS step
max_predictions_per_image - maximum number of detections per image kept after the NMS step
For the sake of demonstration, let's export a model that produces at most one detection per image, with a confidence threshold of 0.8 and an NMS IoU threshold of 0.5, keeping at most 100 predictions per image before the NMS step:
export_result = model.export(
    "yolo_nas_s_top_1.onnx",
    confidence_threshold=0.8,
    nms_threshold=0.5,
    num_pre_nms_predictions=100,
    max_predictions_per_image=1,
    output_predictions_format=DetectionOutputFormatMode.FLAT_FORMAT,
)
session = onnxruntime.InferenceSession(export_result.output, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = [o.name for o in session.get_inputs()]
outputs = [o.name for o in session.get_outputs()]
result = session.run(outputs, {inputs[0]: image_bchw})
show_predictions_from_flat_format(image, result)
You can export a model with quantization to FP16 or INT8. To do so, you need to specify the quantization_mode argument of the export() method.
Let's see how it works:
from super_gradients.conversion.conversion_enums import ExportQuantizationMode
export_result = model.export(
    "yolo_nas_s_int8.onnx",
    output_predictions_format=DetectionOutputFormatMode.FLAT_FORMAT,
    quantization_mode=ExportQuantizationMode.INT8,  # or ExportQuantizationMode.FP16
)
session = onnxruntime.InferenceSession(export_result.output, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = [o.name for o in session.get_inputs()]
outputs = [o.name for o in session.get_outputs()]
result = session.run(outputs, {inputs[0]: image_bchw})
show_predictions_from_flat_format(image, result)
When quantizing a model with quantization_mode==ExportQuantizationMode.INT8, you can pass a DataLoader to the export() method to collect activation statistics and produce a more accurate quantized model.
The DataLoader is expected to return either a tuple of tensors or a single tensor. If a tuple of tensors is returned, the first element is used as the input image.
You can use existing data loaders from SG here as-is.
Important notes:
The calibration_loader should use the same image normalization parameters that were used during training.
In the example below we use a dummy data loader for the sake of showing how to use this feature. You should use your own data loader here.
import torch
from torch.utils.data import DataLoader
from super_gradients.conversion import ExportQuantizationMode
# THIS IS ONLY AN EXAMPLE. YOU SHOULD USE YOUR OWN DATA-LOADER HERE
dummy_calibration_dataset = [torch.randn((3, 640, 640), dtype=torch.float32) for _ in range(32)]
dummy_calibration_loader = DataLoader(dummy_calibration_dataset, batch_size=8, num_workers=0)
# THIS IS ONLY AN EXAMPLE. YOU SHOULD USE YOUR OWN DATA-LOADER HERE
export_result = model.export(
    "yolo_nas_s_int8_with_calibration.onnx",
    output_predictions_format=DetectionOutputFormatMode.FLAT_FORMAT,
    quantization_mode=ExportQuantizationMode.INT8,
    calibration_loader=dummy_calibration_loader,
)
session = onnxruntime.InferenceSession(export_result.output, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = [o.name for o in session.get_inputs()]
outputs = [o.name for o in session.get_outputs()]
result = session.run(outputs, {inputs[0]: image_bchw})
show_predictions_from_flat_format(image, result)
Currently, we support two backends for exporting models:
ExportTargetBackend.ONNXRUNTIME (ONNX Runtime)
ExportTargetBackend.TENSORRT (TensorRT)
The only difference between these two backends is which NMS implementation is used. ONNX Runtime uses the NMS implementation from the ONNX opset, while TensorRT uses its own NMS implementation, which is expected to be faster.
A disadvantage of the TensorRT backend is that a model exported for it cannot be run with ONNX Runtime. You can, however, run models exported for the ONNX Runtime backend inside TensorRT.
Therefore, the ONNX Runtime backend is recommended for most use-cases and is used by default.
You can specify the desired execution backend by setting the engine argument of the export() method:
from super_gradients.conversion import ExportTargetBackend
model.export(..., engine=ExportTargetBackend.ONNXRUNTIME)
from super_gradients.conversion import ExportTargetBackend
model.export(..., engine=ExportTargetBackend.TENSORRT)
The .export() API is a new high-level API that is recommended for most use-cases. However, the old low-level API is still available for advanced users.