TLDR: Getting 10FPS for YoloNAS on your 4090 and feeling cheated? Read this carefully!
YoloNAS is a leading object detection architecture that combines accuracy and efficiency. Using post-training quantization (PTQ) and quantization-aware training (QAT), YoloNAS models can be optimized for resource-constrained devices. However, to fully tap into their potential, it is crucial to know how to export the quantized model to an INT8 TensorRT (TRT) engine.
In this tutorial, we emphasize the significance of this step and provide a concise guide to efficiently exporting a quantized YoloNAS model to an INT8 TRT engine. Along the way, we learn how to properly benchmark YoloNAS and understand its full potential.
The first step is to export our YoloNAS model to ONNX correctly. Two actions must be taken before we export the model to ONNX:

1. We must call model.prep_model_for_conversion. This is essential because YoloNAS incorporates QARepVGG blocks; without this call, the RepVGG branches will not be fused, and our model's speed will drop significantly. This is true for the PyTorch model as well as for the compiled TRT engine. Nothing to worry about if you quantized your model with PTQ/QAT in SG, as this is done under the hood before the ONNX checkpoints are exported.

2. We need to replace our layers with "fake quantized" ones. This happens when we perform post-training quantization or quantization-aware training with SG. Again, nothing to worry about if you performed PTQ/QAT with SG and hold your newly exported ONNX checkpoint. Beware that inference time in PyTorch is slower with such blocks, but it will be faster once the model is converted to a TRT engine.
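To see why the missing fusion call matters, here is a minimal single-channel numpy sketch of RepVGG-style branch fusion. It is only an illustration of the arithmetic (the real blocks are multi-channel and also fold BatchNorm), not SG's actual implementation: the 1x1 branch and the identity branch can be embedded into the 3x3 kernel, so three branches collapse into one convolution with identical output.

```python
import numpy as np

def conv2d(x, k):
    """Naive single-channel 2-D convolution (cross-correlation), stride 1, 'same' zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))   # 3x3 branch
k1 = rng.standard_normal((1, 1))   # 1x1 branch

# Unfused RepVGG-style block: three parallel branches, summed
unfused = conv2d(x, k3) + conv2d(x, k1) + x

# Fusion: embed the 1x1 kernel and the identity into 3x3 kernels, then sum the kernels
k1_as_3 = np.zeros((3, 3)); k1_as_3[1, 1] = k1[0, 0]
identity_as_3 = np.zeros((3, 3)); identity_as_3[1, 1] = 1.0
fused_k = k3 + k1_as_3 + identity_as_3

# A single convolution with the fused kernel reproduces the three-branch output
fused = conv2d(x, fused_k)
assert np.allclose(unfused, fused)
```

One convolution instead of three is why skipping prep_model_for_conversion leaves so much speed on the table.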
There are plenty of guides on how to perform PTQ/QAT with SG.
Suppose we ran PTQ/QAT; our PTQ/QAT checkpoints have then been exported to our checkpoints directory. If we open them in Netron, we can see that new blocks that were not part of the original network have been introduced: the Quantize/Dequantize (Q/DQ) layers.
This is expected and an excellent way to verify that our model is ready to be converted to INT8 using Nvidia's TensorRT. As stated earlier, inference time in PyTorch is slower with such blocks, but will be faster once converted to a TRT engine.
First, please make sure to install Nvidia's TensorRT; version >= 8.4 is required.
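One quick way to confirm the install and version (assuming the TensorRT Python bindings are installed, and that trtexec, which ships with TensorRT, is on your PATH):

```shell
# Print the installed TensorRT version via the Python bindings
python -c "import tensorrt; print(tensorrt.__version__)"

# Confirm that trtexec is available
command -v trtexec && trtexec --help | head -n 5
```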
We can now use these ONNX files to deploy our newly trained YoloNAS models to production. When building the TRT engine, it is essential to specify that we are converting to INT8 (the fake-quantized layers in our model will be adapted accordingly); this can be done by running:
trtexec --fp16 --int8 --avgRuns=100 --onnx=your_yolonas_qat_model.onnx
After running this command, your screen will look somewhat similar to the screenshot below:
Command notes: --avgRuns=100 means this command runs the model 100 times, so that we get more "robust" results that are less affected by noise.

Benchmark breakdown:
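In practice you will usually also want to keep the built engine rather than rebuild it on every run. trtexec can serialize the engine with --saveEngine and reload it with --loadEngine (file names below are placeholders matching the earlier command):

```shell
# Build the INT8 engine once and serialize it to disk
trtexec --fp16 --int8 --avgRuns=100 \
        --onnx=your_yolonas_qat_model.onnx \
        --saveEngine=your_yolonas_qat_model.engine

# Later: benchmark or deploy the serialized engine without rebuilding it
trtexec --avgRuns=100 --loadEngine=your_yolonas_qat_model.engine
```

Note that a serialized engine is specific to the GPU and TensorRT version it was built with, so rebuild it when either changes.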