Convert YOLO and COCO Annotations to DagsHub Format

Nir Barazida
3 min read
3 years ago

MLOps Team Lead @ DagsHub

DagsHub provides a free online data annotation tool that uses Label Studio under the hood. The annotations are saved in open formats, so you can version them and load existing labeling projects.

However, we know not all projects start with DagsHub (yet 😉), and some users join us with projects that already have annotations.

Now what?

To avoid re-annotating your data (heck no!) we developed a tool that converts annotations saved in two popular formats, YOLO and COCO, to DagsHub format.

You can now upload your data, view and modify the annotations on DagsHub, and use them for ML training - without writing a single line of code!

Let's learn how you can do that.

What are YOLO and COCO Annotations?

Annotation formats define how object annotations are structured and represented in datasets. Two popular annotation formats widely used in the computer vision community are YOLO (You Only Look Once) and COCO (Common Objects in Context). These formats provide a standardized way to label objects with bounding boxes and associated class labels.

YOLO annotations are commonly used in object detection tasks and provide a straightforward format for labeling objects. They consist of text files, where each line represents an object and contains the class label, followed by the coordinates of the bounding box normalized relative to the image size.
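To make the line format concrete, here is a minimal sketch that parses one YOLO bounding-box line and converts it to pixel coordinates. It assumes the common `class x_center y_center width height` layout with values normalized to the 0-1 range; the function name `parse_yolo_line` and the sample values are made up for illustration and are not part of any DagsHub API.

```python
def parse_yolo_line(line, img_w, img_h):
    """Parse one YOLO bbox line into (class_id, x_min, y_min, x_max, y_max) in pixels.

    Assumes the layout: class x_center y_center width height,
    with all four box values normalized by the image size.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")

    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])

    # Scale the normalized center/size values back to pixel corner coordinates.
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


# Hypothetical label line for a 640x480 image.
box = parse_yolo_line("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
print(box)
```

Because the values are normalized, the same label file stays valid if the image is resized, which is one reason the format is popular for training pipelines.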

The annotations in COCO are stored in a JSON file, which is a structured data format that allows for flexibility and extensibility. The JSON file contains a dictionary with several key-value pairs, each representing different elements of the annotation.

The "images" section holds information about the images.
The "categories" section defines object classes.
The "annotations" section contains details about individual object instances, including bounding box coordinates, category IDs, and segmentation information.
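As a rough illustration of how the three sections fit together, the sketch below loads a tiny hand-written COCO-style JSON and groups the bounding boxes by image. The file name, category, and box values are invented for the example; the `bbox` field follows the usual COCO convention of `[x_min, y_min, width, height]` in pixels.

```python
import json
from collections import defaultdict

# A minimal, made-up COCO-style document with one image and one object.
coco_json = """
{
  "images": [{"id": 1, "file_name": "street.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 3, "name": "car"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 3,
     "bbox": [240.0, 120.0, 160.0, 240.0], "segmentation": []}
  ]
}
"""


def boxes_by_image(coco):
    """Map each image file name to a list of (category_name, bbox) pairs."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    category_names = {cat["id"]: cat["name"] for cat in coco["categories"]}

    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        # Each annotation points back to its image and category by ID.
        grouped[file_names[ann["image_id"]]].append(
            (category_names[ann["category_id"]], ann["bbox"])
        )
    return dict(grouped)


grouped = boxes_by_image(json.loads(coco_json))
print(grouped)
```

Unlike YOLO, which keeps one text file per image, COCO keeps everything in a single file and links the sections through IDs.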

How to convert YOLO & COCO annotations to DagsHub format?

The DagsHub documentation has a full section on this, including the functions for importing annotations into DagsHub.

The high-level steps are:

Upload your images to DagsHub Storage using the DagsHub client:
dagshub upload <dagshub_user_name>/<dagshub_repo_name> <local/path/to/images> <remote/path/to/data> --bucket
Create a datasource from the folder you uploaded in the datasets tab.
Connect to the datasource locally by running:
from dagshub.data_engine import datasources
ds = datasources.get('<dagshub_user_name>/<dagshub_repo_name>', 'datasource_name')
Then run the following snippet in your local repo that contains the annotation files (adjust depending on YOLO/COCO format):

ds.import_annotations_from_files(
  annotation_type="yolo", # or 'coco'
  path="annotations.yaml",
  field="imported_annotations", # name of the imported field
  yolo_type="segmentation" # or bbox
)
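If you switch between the two formats often, one option is to build the keyword arguments in a small helper first. The sketch below is only an illustration: `build_import_kwargs` is a hypothetical helper, `annotations.json` is a placeholder file name, and it assumes that `yolo_type` is only relevant for YOLO imports. Check the DagsHub docs for the exact arguments each format expects.

```python
def build_import_kwargs(annotation_type, path, field="imported_annotations", yolo_type=None):
    """Build keyword arguments for ds.import_annotations_from_files (hypothetical helper).

    Assumption: yolo_type ("bbox" or "segmentation") is passed only for YOLO imports.
    """
    annotation_type = annotation_type.lower()
    if annotation_type not in ("yolo", "coco"):
        raise ValueError(f"unsupported annotation type: {annotation_type!r}")

    kwargs = {"annotation_type": annotation_type, "path": path, "field": field}

    if annotation_type == "yolo":
        if yolo_type not in ("bbox", "segmentation"):
            raise ValueError("yolo_type must be 'bbox' or 'segmentation' for YOLO imports")
        kwargs["yolo_type"] = yolo_type
    return kwargs


yolo_kwargs = build_import_kwargs("yolo", "annotations.yaml", yolo_type="segmentation")
coco_kwargs = build_import_kwargs("coco", "annotations.json")  # placeholder file name

# With a connected datasource `ds`, the call would then look like:
# ds.import_annotations_from_files(**yolo_kwargs)
```

Keeping the validation outside the import call means a typo in the format name fails fast, before anything is sent to DagsHub.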

How to view your annotations on DagsHub?

Simply run:

ds.visualize()

And click the link in the output. It will take you to your repo on DagsHub where you'll be able to see your images and annotations.

Why should I convert my Annotations to DagsHub Annotations?

By moving your annotations to DagsHub, you gain access to a fully configured and free annotation workspace, with Label Studio under the hood. This workspace has access to your DagsHub Storage and Git server, enabling you and your team to annotate the data without doing any MLOps work or moving to a third-party platform.

Additionally, you can visualize and review the annotations alongside their data directly on DagsHub, eliminating the need to write complex code for visualization.

Lastly, DagsHub provides version control for your data, annotations, experiments, and code, enabling you to reproduce results with the click of a button.

Looking to convert other formats? Let us know!

If you have other annotation formats you'd like us to support, please reach out on our community Discord and we'll take care of it!