---
comments: true
description: Discover SAM 2, the next generation of Meta's Segment Anything Model, supporting real-time promptable segmentation in both images and videos with state-of-the-art performance. Learn about its key features, datasets, and how to use it.
keywords: SAM 2, SAM 2.1, Segment Anything, video segmentation, image segmentation, promptable segmentation, zero-shot performance, SA-V dataset, Ultralytics, real-time segmentation, AI, machine learning
---
!!! tip "SAM 2.1"

    We have just added support for the more accurate SAM 2.1 model. Please give it a try!
SAM 2, the successor to Meta's Segment Anything Model (SAM), is a cutting-edge tool designed for comprehensive object segmentation in both images and videos. It excels in handling complex visual data through a unified, promptable model architecture that supports real-time processing and zero-shot generalization.
**Watch:** How to Run Inference with Meta's SAM2 using Ultralytics | Step-by-Step Guide 🎉
SAM 2 combines the capabilities of image and video segmentation in a single model. This unification simplifies deployment and allows for consistent performance across different media types. It leverages a flexible prompt-based interface, enabling users to specify objects of interest through various prompt types, such as points, bounding boxes, or masks.
The model achieves real-time inference speeds, processing approximately 44 frames per second. This makes SAM 2 suitable for applications requiring immediate feedback, such as video editing and augmented reality.
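As a rough illustration, inference speed on your own hardware can be measured with a short timing script like the sketch below. The ~44 FPS figure reported by Meta was obtained on high-end GPU hardware, so local numbers will vary with device, image size, and prompt type; `bus.jpg` is one of the sample images bundled with the `ultralytics` package, and the point coordinates are arbitrary.

```python
import time

from ultralytics import ASSETS, SAM

# Timing sketch only; throughput depends heavily on hardware and prompt type
model = SAM("sam2.1_t.pt")
image = ASSETS / "bus.jpg"  # sample image shipped with the ultralytics package

# Warm-up run so model loading is not counted in the timing
model(image, points=[[400, 370]], labels=[1], verbose=False)

n_runs = 10
start = time.perf_counter()
for _ in range(n_runs):
    model(image, points=[[400, 370]], labels=[1], verbose=False)
print(f"~{n_runs / (time.perf_counter() - start):.1f} frames per second on this machine")
```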
SAM 2 can segment objects it has never encountered before, demonstrating strong zero-shot generalization. This is particularly useful in diverse or evolving visual domains where pre-defined categories may not cover all possible objects.
Users can iteratively refine the segmentation results by providing additional prompts, allowing for precise control over the output. This interactivity is essential for fine-tuning results in applications like video annotation or medical imaging.
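As a minimal sketch of this refinement loop (the image path and point coordinates below are placeholders), an initial point prompt can be followed by a second call that keeps the positive point and adds a negative point to exclude a wrongly included region:

```python
from ultralytics import SAM

model = SAM("sam2.1_b.pt")

# Initial coarse prompt: a single positive point on the object of interest
results = model("path/to/image.jpg", points=[[400, 370]], labels=[1])

# Refinement: keep the positive point and add a negative point (label 0)
# to push the mask away from a region that was wrongly included
results = model("path/to/image.jpg", points=[[[400, 370], [430, 500]]], labels=[[1, 0]])
```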
SAM 2 includes mechanisms to manage common video segmentation challenges, such as object occlusion and reappearance. It uses a sophisticated memory mechanism to keep track of objects across frames, ensuring continuity even when objects are temporarily obscured or exit and re-enter the scene.
For a deeper understanding of SAM 2's architecture and capabilities, explore the SAM 2 research paper.
SAM 2 sets a new benchmark in the field, outperforming previous models on various metrics:
Metric | SAM 2 | Previous SOTA |
---|---|---|
Interactive Video Segmentation | Best | - |
Human Interactions Required | 3x fewer | Baseline |
Image Segmentation Accuracy | Improved | SAM |
Inference Speed | 6x faster | SAM |
The memory mechanism allows SAM 2 to handle temporal dependencies and occlusions in video data. As objects move and interact, SAM 2 records their features in a memory bank. When an object becomes occluded, the model can rely on this memory to predict its position and appearance when it reappears. The occlusion head specifically handles scenarios where objects are not visible, predicting the likelihood of an object being occluded.
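The snippet below is a purely conceptual sketch of this idea, not SAM 2's actual implementation: a bounded memory bank stores features only from frames where the object was judged visible, and those stored features are what a real model would attend to when the object reappears.

```python
from collections import deque


class ObjectMemoryBank:
    """Toy memory bank: keeps features from recent frames where the object was visible."""

    def __init__(self, max_frames: int = 6):
        self.memory = deque(maxlen=max_frames)  # oldest entries are dropped automatically

    def update(self, frame_idx: int, features, occlusion_score: float) -> None:
        # Skip frames where the (hypothetical) occlusion head says the object is likely hidden
        if occlusion_score < 0.5:
            self.memory.append({"frame": frame_idx, "features": features})

    def recall(self) -> list:
        # Features a model would cross-attend to when re-identifying the object
        return [entry["features"] for entry in self.memory]


bank = ObjectMemoryBank()
bank.update(frame_idx=0, features=[0.12, 0.85], occlusion_score=0.1)  # visible: stored
bank.update(frame_idx=1, features=[0.10, 0.80], occlusion_score=0.9)  # occluded: skipped
print(len(bank.recall()))  # -> 1
```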
In situations with ambiguity (e.g., overlapping objects), SAM 2 can generate multiple mask predictions. This feature is crucial for accurately representing complex scenes where a single mask might not sufficiently describe the scene's nuances.
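In the Ultralytics wrapper, predictions come back as standard `Results` objects, so the number and shape of the returned masks can be inspected directly. The sketch below assumes a placeholder image path and point; how many masks are returned depends on the prompt and the scene.

```python
from ultralytics import SAM

model = SAM("sam2.1_b.pt")

# A point placed where objects overlap may yield more than one candidate mask
results = model("path/to/image.jpg", points=[[400, 370]], labels=[1])

# Inspect the masks through the standard Ultralytics Results API
for result in results:
    if result.masks is not None:
        print(f"{len(result.masks)} mask(s), tensor shape {tuple(result.masks.data.shape)}")
```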
The SA-V dataset, developed for SAM 2's training, is one of the largest and most diverse video segmentation datasets available. It includes approximately 51,000 videos captured across 47 countries and over 600,000 masklet (spatio-temporal mask) annotations covering both whole objects and object parts.
SAM 2 has demonstrated superior performance across major video segmentation benchmarks:
Dataset | J&F | J | F |
---|---|---|---|
DAVIS 2017 | 82.5 | 79.8 | 85.2 |
YouTube-VOS | 81.2 | 78.9 | 83.5 |
In interactive segmentation tasks, SAM 2 shows significant efficiency and accuracy:
Dataset | NoC@90 | AUC |
---|---|---|
DAVIS Interactive | 1.54 | 0.872 |
To install SAM 2, use the following command. All SAM 2 models will automatically download on first use.
```bash
pip install ultralytics
```
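After installation, you can optionally verify your environment with the built-in checks command:

```bash
yolo checks
```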
The following table details the available SAM 2 models, their pre-trained weights, supported tasks, and compatibility with different operating modes like Inference, Validation, Training, and Export.
Model Type | Pre-trained Weights | Tasks Supported | Inference | Validation | Training | Export |
---|---|---|---|---|---|---|
SAM 2 tiny | sam2_t.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
SAM 2 small | sam2_s.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
SAM 2 base | sam2_b.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
SAM 2 large | sam2_l.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
SAM 2.1 tiny | sam2.1_t.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
SAM 2.1 small | sam2.1_s.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
SAM 2.1 base | sam2.1_b.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
SAM 2.1 large | sam2.1_l.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
SAM 2 can be utilized across a broad spectrum of tasks, including real-time video editing, medical imaging, and autonomous systems. Its ability to segment both static and dynamic visual data makes it a versatile tool for researchers and developers.
!!! example "Segment with Prompts"

    Use prompts to segment specific objects in images or videos.

    === "Python"

        ```python
        from ultralytics import SAM

        # Load a model
        model = SAM("sam2.1_b.pt")

        # Display model information (optional)
        model.info()

        # Run inference with a bounding box prompt
        results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])

        # Run inference with a single point
        results = model("path/to/image.jpg", points=[900, 370], labels=[1])

        # Run inference with multiple points
        results = model("path/to/image.jpg", points=[[400, 370], [900, 370]], labels=[1, 1])

        # Run inference with a multiple-points prompt per object
        results = model("path/to/image.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 1]])

        # Run inference with a negative point prompt (label 0 excludes a region)
        results = model("path/to/image.jpg", points=[[[400, 370], [900, 370]]], labels=[[1, 0]])
        ```
!!! example "Segment Everything"

    Segment the entire image or video content without specific prompts.

    === "Python"

        ```python
        from ultralytics import SAM

        # Load a model
        model = SAM("sam2.1_b.pt")

        # Display model information (optional)
        model.info()

        # Run inference
        model("path/to/video.mp4")
        ```

    === "CLI"

        ```bash
        # Run inference with a SAM 2 model
        yolo predict model=sam2.1_b.pt source=path/to/video.mp4
        ```
!!! example "Segment Video"

    Segment the entire video content with specific prompts and track objects.

    === "Python"

        ```python
        from ultralytics.models.sam import SAM2VideoPredictor

        # Create SAM2VideoPredictor
        overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="sam2_b.pt")
        predictor = SAM2VideoPredictor(overrides=overrides)

        # Run inference with a single point
        results = predictor(source="test.mp4", points=[920, 470], labels=[1])

        # Run inference with multiple points
        results = predictor(source="test.mp4", points=[[920, 470], [909, 138]], labels=[1, 1])

        # Run inference with a multiple-points prompt per object
        results = predictor(source="test.mp4", points=[[[920, 470], [909, 138]]], labels=[[1, 1]])

        # Run inference with a negative point prompt
        results = predictor(source="test.mp4", points=[[[920, 470], [909, 138]]], labels=[[1, 0]])
        ```
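        The predictor call returns standard Ultralytics `Results` objects, which can be rendered with the usual API. Continuing from the call above (a minimal sketch):

        ```python
        # Render the predicted masks for each returned result
        for r in results:
            annotated = r.plot()  # annotated image as a BGR numpy array
            # save or display `annotated` with OpenCV, PIL, etc. as needed
        ```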
Here we compare Meta's SAM 2 models, including the smallest SAM2-t variant, with Ultralytics' smallest segmentation model, YOLO11n-seg:
Model | Size (MB) | Parameters (M) | Speed (CPU) (ms/im) |
---|---|---|---|
Meta SAM-b | 375 | 93.7 | 49401 |
Meta SAM2-b | 162 | 80.8 | 31901 |
Meta SAM2-t | 78.1 | 38.9 | 25997 |
MobileSAM | 40.7 | 10.1 | 25381 |
FastSAM-s with YOLOv8 backbone | 23.7 | 11.8 | 55.9 |
Ultralytics YOLOv8n-seg | 6.7 (11.7x smaller) | 3.4 (11.4x less) | 24.5 (1061x faster) |
Ultralytics YOLO11n-seg | 5.9 (13.2x smaller) | 2.9 (13.4x less) | 30.1 (864x faster) |
This comparison demonstrates the substantial differences in model sizes and speeds between SAM variants and YOLO segmentation models. While SAM provides unique automatic segmentation capabilities, YOLO models, particularly YOLOv8n-seg and YOLO11n-seg, are significantly smaller, faster, and more computationally efficient.
Tests run on a 2025 Apple M4 Pro with 24GB of RAM using `torch==2.6.0` and `ultralytics==8.3.90`. To reproduce this test:
!!! example

    === "Python"

        ```python
        from ultralytics import ASSETS, SAM, YOLO, FastSAM

        # Profile SAM2-t, SAM2-b, SAM-b, MobileSAM
        for file in ["sam_b.pt", "sam2_b.pt", "sam2_t.pt", "mobile_sam.pt"]:
            model = SAM(file)
            model.info()
            model(ASSETS)

        # Profile FastSAM-s
        model = FastSAM("FastSAM-s.pt")
        model.info()
        model(ASSETS)

        # Profile YOLO models
        for file_name in ["yolov8n-seg.pt", "yolo11n-seg.pt"]:
            model = YOLO(file_name)
            model.info()
            model(ASSETS)
        ```
Auto-annotation is a powerful feature of SAM 2, enabling users to generate segmentation datasets quickly and accurately by leveraging pre-trained models. This capability is particularly useful for creating large, high-quality datasets without extensive manual effort.
**Watch:** Auto Annotation with Meta's Segment Anything 2 Model using Ultralytics | Data Labeling
To auto-annotate your dataset using SAM 2, follow this example:
!!! example "Auto-Annotation Example"

    ```python
    from ultralytics.data.annotator import auto_annotate

    auto_annotate(data="path/to/images", det_model="yolo11x.pt", sam_model="sam2_b.pt")
    ```
{% include "macros/sam-auto-annotate.md" %}
This function facilitates the rapid creation of high-quality segmentation datasets, ideal for researchers and developers aiming to accelerate their projects.
Despite its strengths, SAM 2 has certain limitations: it can lose track of objects during long sequences, after extended occlusions, or across significant viewpoint changes; it may confuse similar-looking objects in crowded scenes; its efficiency drops when segmenting many objects simultaneously, since each object is processed independently; and fine details on fast-moving objects can be missed, sometimes requiring additional prompts to correct.
If SAM 2 is a crucial part of your research or development work, please cite it using the following reference:
!!! quote ""

    === "BibTeX"

        ```bibtex
        @article{ravi2024sam2,
          title={SAM 2: Segment Anything in Images and Videos},
          author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
          journal={arXiv preprint},
          year={2024}
        }
        ```
We extend our gratitude to Meta AI for their contributions to the AI community with this groundbreaking model and dataset.
As the successor to Meta's Segment Anything Model (SAM), SAM 2 is designed for comprehensive object segmentation in both images and videos and offers several improvements over the original SAM, including a unified architecture for image and video segmentation, real-time inference, stronger zero-shot generalization, interactive refinement through additional prompts, and a memory mechanism for handling occlusions and object reappearance.
For more details on SAM 2's architecture and capabilities, explore the SAM 2 research paper.
SAM 2 can be utilized for real-time video segmentation by leveraging its promptable interface and real-time inference capabilities. Here's a basic example:
!!! example "Segment with Prompts"

    Use prompts to segment specific objects in images or videos.

    === "Python"

        ```python
        from ultralytics import SAM

        # Load a model
        model = SAM("sam2_b.pt")

        # Display model information (optional)
        model.info()

        # Segment with a bounding box prompt
        results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])

        # Segment with a point prompt
        results = model("path/to/image.jpg", points=[150, 150], labels=[1])
        ```
For more comprehensive usage, refer to the How to Use SAM 2 section.
SAM 2 is trained on the SA-V dataset, one of the largest and most diverse video segmentation datasets available, comprising approximately 51,000 videos collected across 47 countries and over 600,000 masklet annotations.
This extensive dataset allows SAM 2 to achieve superior performance across major video segmentation benchmarks and enhances its zero-shot generalization capabilities. For more information, see the SA-V Dataset section.
SAM 2 includes a sophisticated memory mechanism to manage temporal dependencies and occlusions in video data. The memory mechanism consists of a memory encoder, a memory bank, and a memory attention module that conditions the current frame's features on stored observations of the target object.
This mechanism ensures continuity even when objects are temporarily obscured or exit and re-enter the scene. For more details, refer to the Memory Mechanism and Occlusion Handling section.
SAM 2 models, such as Meta's SAM2-t and SAM2-b, offer powerful zero-shot segmentation capabilities but are significantly larger and slower than YOLO11 models. For instance, YOLO11n-seg is approximately 13 times smaller and over 860 times faster than SAM2-t. While SAM 2 excels in versatile, prompt-based, and zero-shot segmentation scenarios, YOLO11 is optimized for speed, efficiency, and real-time applications, making it better suited for deployment in resource-constrained environments.