YOLO-NAS: A New SOTA Model For Object Detection By Deci


Object Detection May 03, 2023

A new object detection model is in town, and it claims the crown - meet YOLO-NAS.

The researchers at Deci have launched (today!) a new object detection model, YOLO-NAS, that outperforms SOTA object detection models (yes, we're looking at you, YOLOv8) in small-object detection, localization accuracy, and performance-per-compute ratio. Join us as we explore the exciting possibilities that YOLO-NAS opens up for the field of computer vision and beyond.

YOLO-NAS Compared to SOTA Object Detection Models

What is YOLO-NAS?

YOLO-NAS is a new object detection model developed by the researchers at Deci, with some help from automated machine learning. The team used Deci's AutoNAC (Automated Neural Architecture Construction) engine, a neural architecture search (NAS) technology, to explore and test new model architectures automatically using deep learning algorithms.

The research team began with an enormous search space of 10^14 potential architectures. Using the AutoNAC engine, they navigated through this space and identified the "efficiency frontier," an area where the algorithm can balance latency and throughput. The entire search process took the team 3800 GPU hours!
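The "efficiency frontier" idea can be sketched as Pareto filtering: among candidate architectures, keep only those that no other candidate beats on both latency and accuracy at once. The candidates and numbers below are invented purely for illustration and are not from the actual AutoNAC search.

```python
# Toy sketch of selecting the "efficiency frontier" from a NAS search:
# a candidate is kept only if no other candidate is simultaneously
# faster (lower latency) and more accurate (higher mAP).
# All numbers below are made up for illustration.

candidates = [
    {"name": "arch_a", "latency_ms": 3.2, "mAP": 44.1},
    {"name": "arch_b", "latency_ms": 2.1, "mAP": 42.0},
    {"name": "arch_c", "latency_ms": 4.0, "mAP": 43.5},  # dominated by arch_a
    {"name": "arch_d", "latency_ms": 1.5, "mAP": 40.2},
]

def efficiency_frontier(cands):
    """Return names of candidates not dominated by any other candidate."""
    frontier = []
    for c in cands:
        dominated = any(
            o is not c
            and o["latency_ms"] <= c["latency_ms"]
            and o["mAP"] >= c["mAP"]
            for o in cands
        )
        if not dominated:
            frontier.append(c["name"])
    return frontier

print(efficiency_frontier(candidates))  # ['arch_a', 'arch_b', 'arch_d']
```

A real NAS engine scores orders of magnitude more candidates (here, a space of 10^14) and uses predictors rather than measuring each one, but the trade-off logic is the same.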

YOLO-NAS is available under an open-source license, with pre-trained weights available for non-commercial use on SuperGradients, Deci's PyTorch-based, open-source computer vision training library. With SuperGradients, users can train models from scratch or fine-tune existing ones, leveraging advanced built-in training techniques like Distributed Data Parallel, Exponential Moving Average, Automatic Mixed Precision, and Quantization-Aware Training.

What is unique about the YOLO-NAS architecture?

YOLO-NAS's architecture employs quantization-aware blocks and selective quantization for optimized performance. The model's design features adaptive quantization, skipping quantization in specific layers based on the balance between latency/throughput improvement and accuracy loss. When converted to its INT8 quantized version, YOLO-NAS experiences a smaller precision drop (0.51, 0.65, and 0.45 points of mAP for the S, M, and L variants, respectively) than other models, which typically lose 1-2 mAP points during quantization. These techniques culminate in an innovative architecture with superior object detection capabilities and top-notch performance. The YOLO-NAS architecture and pre-trained weights define a new frontier in low-latency inference and an excellent starting point for fine-tuning on downstream tasks.
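The selective-quantization decision can be illustrated as a simple cost-benefit rule per layer: quantize only where the estimated latency saving outweighs the estimated accuracy cost. The layer names, numbers, and threshold below are invented for illustration, not taken from the YOLO-NAS implementation.

```python
# Toy illustration of selective quantization: quantize a layer only when
# its estimated latency gain outweighs its weighted mAP cost.
# All per-layer numbers are made up for illustration.

layers = [
    {"name": "stem",   "latency_gain_ms": 0.02, "mAP_drop": 0.30},
    {"name": "stage1", "latency_gain_ms": 0.40, "mAP_drop": 0.05},
    {"name": "head",   "latency_gain_ms": 0.10, "mAP_drop": 0.25},
]

def select_layers_to_quantize(layers, ms_per_map_point=1.0):
    """Keep a layer quantized if latency gain (ms) exceeds its mAP cost,
    with `ms_per_map_point` setting how many ms one mAP point is worth."""
    return [
        layer["name"]
        for layer in layers
        if layer["latency_gain_ms"] > ms_per_map_point * layer["mAP_drop"]
    ]

print(select_layers_to_quantize(layers))  # ['stage1']
```

Layers that fail the test stay in higher precision, which is how a "hybrid" quantized model keeps its accuracy drop small.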

How was YOLO-NAS trained?

YOLO-NAS undergoes a multi-phase training process that involves pre-training on the Objects365 dataset, utilizing the COCO dataset to generate pseudo-labeled data, and incorporating Knowledge Distillation (KD) and Distribution Focal Loss (DFL) techniques.

The pre-training on Objects365, which consists of 2 million images and 365 categories, takes 25-40 epochs on 8 NVIDIA RTX A5000 GPUs. The COCO dataset provides an additional 123k unlabeled images, which are used to generate pseudo-labeled data for training the model.

The KD technique is applied by adding a term to the loss function that lets the student network mimic both the classification and DFL predictions of the teacher network. DFL, in turn, discretizes box predictions into a finite set of values and learns box regression as a classification task: the network predicts a distribution over these values, which is then converted to a final prediction through a weighted sum.
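The DFL-style decoding step can be sketched in a few lines: the network emits per-bin logits for a coordinate, a softmax turns them into probabilities, and the expected value over the bin centers gives the final coordinate. The bin values and logits below are made-up numbers for illustration.

```python
import math

# Sketch of DFL-style box decoding: the network predicts logits over a
# discrete set of regression values; the final coordinate is the
# probability-weighted sum (expected value) over those bins.

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_coordinate(logits, bin_values):
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, bin_values))

bins = [0.0, 1.0, 2.0, 3.0]    # discretized regression targets
logits = [0.1, 2.0, 3.0, 0.5]  # per-bin scores from the network (made up)
coord = decode_coordinate(logits, bins)
print(round(coord, 3))
```

Learning a distribution over bins, rather than regressing one number directly, lets the model express uncertainty about box edges, which is part of why it helps localization accuracy.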

These training methods enable YOLO-NAS to achieve high accuracy and superior object detection capabilities.

How good is YOLO-NAS?

In terms of pure numbers, YOLO-NAS is ~0.5 mAP point more accurate and 10-20% faster than equivalent variants of YOLOv8 and YOLOv7.

YOLO-NAS Compared to SOTA Object Detection Models

Conclusions

Overall, the YOLO-NAS model is an excellent choice for researchers and developers seeking an efficient architecture with state-of-the-art object detection capabilities, achieving optimized performance while maintaining high accuracy during quantization.

Nir Barazida

MLOps Team Lead @ DagsHub
