---
comments: true
description: Discover YOLO12, featuring groundbreaking attention-centric architecture for state-of-the-art object detection with unmatched accuracy and efficiency.
keywords: YOLO12, attention-centric object detection, YOLO series, Ultralytics, computer vision, AI, machine learning, deep learning
---
YOLO12 introduces an attention-centric architecture that departs from the traditional CNN-based approaches used in previous YOLO models, yet retains the real-time inference speed essential for many applications. This model achieves state-of-the-art object detection accuracy through novel methodological innovations in attention mechanisms and overall network architecture, while maintaining real-time performance.
**Watch:** How to Use YOLO12 for Object Detection with the Ultralytics Package | Is YOLO12 Fast or Slow? 🚀
YOLO12 supports a variety of computer vision tasks. The table below shows task support and the operational modes (Inference, Validation, Training, and Export) enabled for each:
Model Type | Task | Inference | Validation | Training | Export |
---|---|---|---|---|---|
YOLO12 | Detection | ✅ | ✅ | ✅ | ✅ |
YOLO12-seg | Segmentation | ✅ | ✅ | ✅ | ✅ |
YOLO12-pose | Pose | ✅ | ✅ | ✅ | ✅ |
YOLO12-cls | Classification | ✅ | ✅ | ✅ | ✅ |
YOLO12-obb | OBB | ✅ | ✅ | ✅ | ✅ |
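As a minimal illustration of this shared interface, the sketch below loads one checkpoint per task and runs inference on the same image. It assumes the task-specific pretrained weights follow the standard Ultralytics naming (e.g., `yolo12n-seg.pt`); the image path is a placeholder.

```python
from ultralytics import YOLO

# Sketch: every YOLO12 task variant is driven through the same YOLO() interface.
# Checkpoint names assume the standard Ultralytics naming for the nano scale;
# substitute the scale and task you need.
for weights in ("yolo12n.pt", "yolo12n-seg.pt", "yolo12n-pose.pt", "yolo12n-cls.pt", "yolo12n-obb.pt"):
    model = YOLO(weights)  # Detection, Segmentation, Pose, Classification, OBB
    results = model("path/to/bus.jpg")  # the inference call is identical across tasks
```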
YOLO12 demonstrates significant accuracy improvements across all model scales, with some trade-offs in speed compared to the fastest prior YOLO models. Below are quantitative results for object detection on the COCO validation dataset:
!!! tip "Performance"
=== "Detection (COCO)"
| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>T4 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) | Comparison<br><sup>(mAP/Speed) |
| ------------------------------------------------------------------------------------ | --------------------- | -------------------- | ------------------------------ | --------------------------------- | ------------------ | ----------------- | ------------------------------ |
| [YOLO12n](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo12n.pt) | 640 | 40.6 | - | 1.64 | 2.6 | 6.5 | +2.1%/-9% (vs. YOLOv10n) |
| [YOLO12s](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo12s.pt) | 640 | 48.0 | - | 2.61 | 9.3 | 21.4 | +0.1%/+42% (vs. RT-DETRv2) |
| [YOLO12m](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo12m.pt) | 640 | 52.5 | - | 4.86 | 20.2 | 67.5 | +1.0%/-3% (vs. YOLO11m) |
| [YOLO12l](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo12l.pt) | 640 | 53.7 | - | 6.77 | 26.4 | 88.9 | +0.4%/-8% (vs. YOLO11l) |
| [YOLO12x](https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo12x.pt) | 640 | 55.2 | - | 11.79 | 59.1 | 199.0 | +0.6%/-4% (vs. YOLO11x) |
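To check these numbers on your own hardware, a validation run on COCO is enough. The sketch below assumes the standard `coco.yaml` dataset configuration shipped with Ultralytics (the dataset is downloaded on first use); reported speeds will differ from the table depending on your setup.

```python
from ultralytics import YOLO

# Sketch: validate a COCO-pretrained YOLO12 model to reproduce the mAP column above.
model = YOLO("yolo12n.pt")
metrics = model.val(data="coco.yaml", imgsz=640)
print(metrics.box.map)  # mAP50-95; inference speed varies with hardware
```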
This section provides examples for training and inference with YOLO12. For more comprehensive documentation on these and other modes (including Validation and Export), consult the dedicated Predict and Train pages.
The examples below focus on YOLO12 Detect models (for object detection). For other supported tasks (segmentation, classification, oriented object detection, and pose estimation), refer to the respective task-specific documentation: Segment, Classify, OBB, and Pose.
!!! example
=== "Python"
Pretrained `*.pt` models (using [PyTorch](https://pytorch.org/)) and configuration `*.yaml` files can be passed to the `YOLO()` class to create a model instance in Python:
```python
from ultralytics import YOLO
# Load a COCO-pretrained YOLO12n model
model = YOLO("yolo12n.pt")
# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
# Run inference with the YOLO12n model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```
=== "CLI"
Command Line Interface (CLI) commands are also available:
```bash
# Load a COCO-pretrained YOLO12n model and train on the COCO8 example dataset for 100 epochs
yolo train model=yolo12n.pt data=coco8.yaml epochs=100 imgsz=640
# Load a COCO-pretrained YOLO12n model and run inference on the 'bus.jpg' image
yolo predict model=yolo12n.pt source=path/to/bus.jpg
```
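Validation and Export follow the same pattern. As a brief sketch (ONNX is chosen here purely as an example target format):

```python
from ultralytics import YOLO

# Sketch: validate on the COCO8 example dataset, then export to ONNX.
model = YOLO("yolo12n.pt")
metrics = model.val(data="coco8.yaml", imgsz=640)  # quick sanity check on 8 images
model.export(format="onnx", imgsz=640)  # writes an .onnx file alongside the weights
```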
YOLO12's key improvements over previous YOLO versions include:

- **Enhanced Feature Extraction:** an Area Attention mechanism processes large receptive fields efficiently, while Residual Efficient Layer Aggregation Networks (R-ELAN) improve feature aggregation and ease optimization in larger attention-centric models.
- **Optimization Innovations:** an optimized attention architecture, including optional FlashAttention support and the removal of positional encoding, reduces memory access overhead without sacrificing accuracy.
- **Architectural Efficiency:** these changes deliver higher accuracy at comparable or lower parameter and FLOP counts, preserving the real-time inference speed expected of the YOLO series.
The Ultralytics YOLO12 implementation does not require FlashAttention by default. However, FlashAttention can optionally be compiled and used with YOLO12 to minimize memory access overhead. Compiling FlashAttention requires one of the following NVIDIA GPU families:

- Turing GPUs (e.g., T4, Quadro RTX series)
- Ampere GPUs (e.g., RTX30 series, A30/40/100)
- Ada Lovelace GPUs (e.g., RTX40 series)
- Hopper GPUs (e.g., H100/H200)
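As a quick way to check whether your GPU falls into one of these families, you can inspect its CUDA compute capability (Turing corresponds to 7.5, Ampere to 8.x, Ada Lovelace to 8.9, Hopper to 9.0). The snippet below is an illustrative check, not part of the Ultralytics API:

```python
import torch

# Sketch: FlashAttention compilation needs a Turing-or-newer NVIDIA GPU,
# i.e. CUDA compute capability 7.5 or higher (based on the list above).
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    ok = (major, minor) >= (7, 5)
    print(f"Compute capability {major}.{minor}: FlashAttention compile {'possible' if ok else 'not supported'}")
else:
    print("No CUDA GPU detected; YOLO12 runs with its default (non-FlashAttention) path.")
```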
If you use YOLO12 in your research, please cite the original work by the University at Buffalo and the University of Chinese Academy of Sciences:
!!! quote ""
=== "BibTeX"
```bibtex
@article{tian2025yolov12,
title={YOLOv12: Attention-Centric Real-Time Object Detectors},
author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
journal={arXiv preprint arXiv:2502.12524},
year={2025}
}
@software{yolo12,
author = {Tian, Yunjie and Ye, Qixiang and Doermann, David},
title = {YOLOv12: Attention-Centric Real-Time Object Detectors},
year = {2025},
url = {https://github.com/sunsmarterjie/yolov12},
license = {AGPL-3.0}
}
```
YOLO12 incorporates several key innovations to balance speed and accuracy. The Area Attention mechanism efficiently processes large receptive fields, reducing computational cost compared to standard self-attention. The Residual Efficient Layer Aggregation Networks (R-ELAN) improve feature aggregation, addressing optimization challenges in larger attention-centric models. Optimized Attention Architecture, including the use of FlashAttention and removal of positional encoding, further enhances efficiency. These features allow YOLO12 to achieve state-of-the-art accuracy while maintaining the real-time inference speed crucial for many applications.
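For intuition only, the sketch below shows the core idea behind area attention: split the spatial tokens into a few contiguous areas and apply standard multi-head self-attention within each, cutting the quadratic cost of global attention. This is a simplified illustration, not the Ultralytics implementation; the `num_areas` value and the reshape strategy are assumptions.

```python
import torch
import torch.nn as nn


class AreaAttentionSketch(nn.Module):
    """Illustrative area attention: attention is restricted to `num_areas`
    contiguous segments of the token sequence instead of the full feature map."""

    def __init__(self, dim: int, num_heads: int = 8, num_areas: int = 4):
        super().__init__()
        self.num_areas = num_areas
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, channels); assumes tokens divisible by num_areas
        b, n, c = x.shape
        x = x.reshape(b * self.num_areas, n // self.num_areas, c)
        out, _ = self.attn(x, x, x)  # self-attention within each area only
        return out.reshape(b, n, c)


# Example: a 64x64 feature map flattened to 4096 tokens of width 256
tokens = torch.randn(1, 4096, 256)
print(AreaAttentionSketch(dim=256)(tokens).shape)  # torch.Size([1, 4096, 256])
```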
YOLO12 is a versatile model that supports a wide range of core computer vision tasks. It excels in object detection, instance segmentation, image classification, pose estimation, and oriented object detection (OBB) (see details). This comprehensive task support makes YOLO12 a powerful tool for diverse applications, from robotics and autonomous driving to medical imaging and industrial inspection. Each of these tasks can be performed in Inference, Validation, Training, and Export modes.
YOLO12 demonstrates significant accuracy improvements across all model scales compared to prior YOLO models like YOLOv10 and YOLO11, with some trade-offs in speed compared to the fastest prior models. For example, YOLO12n achieves a +2.1% mAP improvement over YOLOv10n and +1.2% over YOLO11n on the COCO val2017 dataset. Compared to models like RT-DETR, YOLO12s offers a +1.5% mAP improvement and a substantial +42% speed increase. These metrics highlight YOLO12's strong balance between accuracy and efficiency. See the performance metrics section for detailed comparisons.
By default, the Ultralytics YOLO12 implementation does not require FlashAttention. However, FlashAttention can be optionally compiled and used with YOLO12 to minimize memory access overhead. To compile FlashAttention, one of the following NVIDIA GPUs is needed: Turing GPUs (e.g., T4, Quadro RTX series), Ampere GPUs (e.g., RTX30 series, A30/40/100), Ada Lovelace GPUs (e.g., RTX40 series), or Hopper GPUs (e.g., H100/H200). This flexibility allows users to leverage FlashAttention's benefits when hardware resources permit.
This page provides basic usage examples for training and inference. For comprehensive documentation on these and other modes, including Validation and Export, consult the dedicated Predict and Train pages. For task-specific information (segmentation, classification, oriented object detection, and pose estimation), refer to the respective documentation: Segment, Classify, OBB, and Pose. These resources provide in-depth guidance for effectively utilizing YOLO12 in various scenarios.