#21675 FROM pytorch/pytorch:2.8.0-cuda12.8-cudnn9-runtime

Merged

Ghost merged 1 commit into Ultralytics:main from ultralytics:glenn-jocher-patch-1

Argument	Type	Default	Description
`format`	`str`	`'torchscript'`	Target format for the exported model, such as `'onnx'`, `'torchscript'`, `'engine'` (TensorRT), or others. Each format enables compatibility with different deployment environments.
`imgsz`	`int` or `tuple`	`640`	Desired image size for the model input. Can be an integer for square images (e.g., `640` for 640×640) or a tuple `(height, width)` for specific dimensions.
`keras`	`bool`	`False`	Enables export to Keras format for TensorFlow SavedModel, providing compatibility with TensorFlow serving and APIs.
`optimize`	`bool`	`False`	Applies optimization for mobile devices when exporting to TorchScript, potentially reducing model size and improving inference performance. Not compatible with NCNN format or CUDA devices.
`half`	`bool`	`False`	Enables FP16 (half-precision) quantization, reducing model size and potentially speeding up inference on supported hardware. Not compatible with INT8 quantization or CPU-only exports for ONNX.
`int8`	`bool`	`False`	Activates INT8 quantization, further compressing the model and speeding up inference with minimal accuracy loss, primarily for edge devices. When used with TensorRT, performs post-training quantization (PTQ).
`dynamic`	`bool`	`False`	Allows dynamic input sizes for ONNX, TensorRT and OpenVINO exports, enhancing flexibility in handling varying image dimensions. Automatically set to `True` when using TensorRT with INT8.
`simplify`	`bool`	`True`	Simplifies the model graph for ONNX exports with `onnxslim`, potentially improving performance and compatibility with inference engines.
`opset`	`int`	`None`	Specifies the ONNX opset version for compatibility with different ONNX parsers and runtimes. If not set, uses the latest supported version.
`workspace`	`float` or `None`	`None`	Sets the maximum workspace size in GiB for TensorRT optimizations, balancing memory usage and performance. Use `None` for auto-allocation by TensorRT up to device maximum.
`nms`	`bool`	`False`	Adds Non-Maximum Suppression (NMS) to the exported model when supported (see Export Formats), improving detection post-processing efficiency. Not available for end2end models.
`batch`	`int`	`1`	Sets the batch size of the exported model, i.e. the maximum number of images it will process concurrently in `predict` mode. For Edge TPU exports, this is automatically set to 1.
`device`	`str`	`None`	Specifies the device for exporting: GPU (`device=0`), CPU (`device=cpu`), MPS for Apple silicon (`device=mps`) or DLA for NVIDIA Jetson (`device=dla:0` or `device=dla:1`). TensorRT exports automatically use GPU.
`data`	`str`	`'coco8.yaml'`	Path to the dataset configuration file, used for INT8 quantization calibration. If INT8 is enabled and `data` is not specified, a default dataset is assigned.
`fraction`	`float`	`1.0`	Specifies the fraction of the dataset to use for INT8 quantization calibration. Allows for calibrating on a subset of the full dataset, useful for experiments or when resources are limited. If not specified with INT8 enabled, the full dataset will be used.
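
The incompatibilities noted in the table can be checked before an export is attempted. The following is a minimal sketch, not part of the Ultralytics API: `build_export_kwargs` and `export_model` are hypothetical helper names, the CUDA-device test is a simplifying assumption, and the weights file name in the usage comment is only an example. The only library call used is `YOLO(...).export(**kwargs)`.

```python
def build_export_kwargs(format="torchscript", half=False, int8=False,
                        optimize=False, dynamic=False, device=None, **extra):
    """Hypothetical helper: collect export arguments and reject
    combinations that the table above describes as incompatible."""
    if half and int8:
        raise ValueError("half (FP16) is not compatible with int8 quantization")

    # Assumption: an integer, a digit string, or a 'cuda...' string means a CUDA device.
    is_cuda = (isinstance(device, int) or str(device).isdigit()
               or str(device).startswith("cuda"))
    if optimize and (format == "ncnn" or is_cuda):
        raise ValueError("optimize is not compatible with NCNN format or CUDA devices")

    # Per the table, dynamic is set to True when using TensorRT with INT8.
    if format == "engine" and int8:
        dynamic = True

    kwargs = dict(format=format, half=half, int8=int8, optimize=optimize,
                  dynamic=dynamic, device=device, **extra)
    # Drop unset values so the library's own defaults apply.
    return {k: v for k, v in kwargs.items() if v is not None}


def export_model(weights, **kwargs):
    """Hypothetical wrapper: validate the arguments, then call the real export."""
    from ultralytics import YOLO  # imported lazily; the checks above need no install
    return YOLO(weights).export(**build_export_kwargs(**kwargs))


if __name__ == "__main__":
    # Validation only; no model is loaded here.
    print(build_export_kwargs(format="engine", int8=True,
                              data="coco8.yaml", fraction=0.5))
    # An actual export would look like:
    # export_model("yolo11n.pt", format="onnx", imgsz=640, simplify=True)
```

Keeping the validation separate from the export call lets the argument rules be exercised quickly, without a GPU or model weights. The library performs its own checks as well, so this sketch only illustrates the constraints listed in the table.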
