---
comments: true
description: Discover how to enhance Ultralytics YOLO model performance using Intel's OpenVINO toolkit. Boost latency and throughput efficiently.
keywords: Ultralytics YOLO, OpenVINO optimization, deep learning, model inference, throughput optimization, latency optimization, AI deployment, Intel's OpenVINO, performance tuning
---
When deploying deep learning models, particularly those for object detection such as Ultralytics YOLO models, achieving optimal performance is crucial. This guide delves into leveraging Intel's OpenVINO toolkit to optimize inference, focusing on latency and throughput. Whether you're working on consumer-grade applications or large-scale deployments, understanding and applying these optimization strategies will ensure your models run efficiently on various devices.
Latency optimization is vital for applications requiring immediate response from a single model given a single input, typical in consumer scenarios. The goal is to minimize the delay between input and inference result. However, achieving low latency involves careful consideration, especially when running concurrent inferences or managing multiple models.
Setting `ov::hint::PerformanceMode::LATENCY` for the `ov::hint::performance_mode` property during model compilation simplifies performance tuning, offering a device-agnostic and future-proof approach. For first-inference latency, note that OpenVINO memory-maps model files by default; pass `ov::enable_mmap(false)` to switch back to reading the file, which can be faster when the model resides on a removable or network drive.

Throughput optimization is crucial for scenarios serving numerous inference requests simultaneously, maximizing resource utilization without significantly sacrificing individual request performance.
OpenVINO Performance Hints: A high-level, future-proof method to enhance throughput across devices using performance hints.
```python
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("model.xml")  # path to an exported OpenVINO IR model

# THROUGHPUT hint: the device configures itself to maximize requests per second
config = {hints.performance_mode: hints.PerformanceMode.THROUGHPUT}
compiled_model = core.compile_model(model, "GPU", config)
```
Explicit Batching and Streams: A more granular approach involving explicit batching and the use of streams for advanced performance tuning.
To maximize throughput, applications should keep the target device saturated: submit inference requests in parallel through OpenVINO's asynchronous API rather than serial, blocking calls, and size the pool of in-flight requests to the device's reported optimal number.
OpenVINO's multi-device mode simplifies scaling throughput by automatically balancing inference requests across devices without requiring application-level device management.
Implementing OpenVINO optimizations with Ultralytics YOLO models can yield significant performance improvements. As demonstrated in benchmarks, users can experience up to 3x faster inference speeds on Intel CPUs, with even greater accelerations possible across Intel's hardware spectrum including integrated GPUs, dedicated GPUs, and VPUs.
For example, when running YOLOv8 models on Intel Xeon CPUs, the OpenVINO-optimized versions consistently outperform their PyTorch counterparts in terms of inference time per image, without compromising on accuracy.
To export and optimize your Ultralytics YOLO model for OpenVINO, you can use the export functionality:
```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")

# Export the model to OpenVINO format
model.export(format="openvino", half=True)  # export with FP16 precision
```
After exporting, you can run inference with the optimized model:
```python
from ultralytics import YOLO

# Load the exported OpenVINO model
ov_model = YOLO("yolov8n_openvino_model/")

# Run inference
results = ov_model("path/to/image.jpg", verbose=True)
```
Optimizing Ultralytics YOLO models for latency and throughput with OpenVINO can significantly enhance your application's performance. By carefully applying the strategies outlined in this guide, developers can ensure their models run efficiently, meeting the demands of various deployment scenarios. Remember, the choice between optimizing for latency or throughput depends on your specific application needs and the characteristics of the deployment environment.
For more detailed technical information and the latest updates, refer to the OpenVINO documentation and Ultralytics YOLO repository. These resources provide in-depth guides, tutorials, and community support to help you get the most out of your deep learning models.
Ensuring your models achieve optimal performance is not just about tweaking configurations; it's about understanding your application's needs and making informed decisions. Whether you're optimizing for real-time responses or maximizing throughput for large-scale processing, the combination of Ultralytics YOLO models and OpenVINO offers a powerful toolkit for developers to deploy high-performance AI solutions.
Optimizing Ultralytics YOLO models for low latency involves several key strategies, including running a single inference per device at a time and setting `ov::hint::PerformanceMode::LATENCY` during model compilation for simplified, device-agnostic tuning. For more practical tips on optimizing latency, check out the Latency Optimization section of our guide.
OpenVINO enhances Ultralytics YOLO model throughput by maximizing device resource utilization without sacrificing performance. Key benefits include:
Example configuration:
```python
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("model.xml")  # path to an exported OpenVINO IR model

config = {hints.performance_mode: hints.PerformanceMode.THROUGHPUT}
compiled_model = core.compile_model(model, "GPU", config)
```
Learn more about throughput optimization in the Throughput Optimization section of our detailed guide.
To reduce first-inference latency, consider these practices: rely on OpenVINO's default memory-mapping of model files (`ov::enable_mmap(true)`), but switch to reading (`ov::enable_mmap(false)`) if the model is on a removable or network drive. For detailed strategies on managing first-inference latency, refer to the Managing First-Inference Latency section.
Balancing latency and throughput optimization requires understanding your application needs:
Using OpenVINO's high-level performance hints and multi-device modes can help strike the right balance. Choose the appropriate OpenVINO performance hint based on your specific requirements.
Yes, Ultralytics YOLO models are highly versatile and can be integrated with various AI frameworks. Options include:
Explore more integrations on the Ultralytics Integrations page.