
Label Studio

Label Studio is an open-source tool for labeling many structured and unstructured data types. It provides an intuitive UI with templates that you can customize. DagsHub Annotations, our integration with Label Studio, provides a fully configured labeling workspace with a built-in Label Studio instance that is ready to use.

How does the integration of DagsHub with Label Studio work?

Every repository on DagsHub is configured with a labeling workspace that has Label Studio installed. The workspace has full access to the project files, making them available to annotate directly from DagsHub's interface. To scale your work, DagsHub Annotations enables you to create multiple labeling projects on the workspace that are isolated from one another.

DagsHub Annotations provides a unique Git flow for labeling to ensure full reproducibility, scalability, and efficient version control of the labels and data. When creating a new labeling project, you associate it with the tip of an active branch, which simulates the branching action. DagsHub loads the annotations held on the branch and associates them with their tasks. You can version the annotations using Git and, once the labeling task is complete, create a pull request on DagsHub, where the reviewer can see and comment on every label.

How to create a new project on DagsHub Annotations?

To create a new labeling project for the first time, navigate to the Annotations tab and create a new workspace. This process can take 2-3 minutes as DagsHub spins up the Label Studio machine behind the scenes.

??? illustration "Create Label Studio workspace"

<br/>
<center>
  <video autoplay loop muted playsinline width="80%">
    <source src="../../tutorial/assets/create-workspace.webm" type="video/webm">
    <source src="../../tutorial/assets/create-workspace.mp4" type="video/mp4">
  </video>

 <sub>Create Label Studio workspace</sub></center>
<br/>

Once the workspace is ready, create a new project and associate it with an active branch. This marks the project's starting point and will make all the files hosted on DagsHub Storage, under the selected branch, available for labeling.

??? illustration "Create a Label Studio project"

<br/>
<center>
  <video autoplay loop muted playsinline width="60%">
    <source src="../../tutorial/assets/create-project.webm" type="video/webm">
    <source src="../../tutorial/assets/create-project.mp4" type="video/mp4">
  </video>

 <sub>Create Label Studio project</sub></center>
<br/>

How to choose files to label?

When you open the labeling project for the first time, you can select the files to annotate (also known as tasks). You can choose a specific file or an entire directory by checking the box next to its name.

??? illustration "Choose the files to annotate"

<br/>
<center>
  <video autoplay loop muted playsinline width="60%">
    <source src="../../tutorial/assets/choose-files.webm" type="video/webm">
    <source src="../../tutorial/assets/choose-files.mp4" type="video/mp4">
  </video>

 <sub>Choose the files to annotate</sub></center>
<br/>

How to version a Label Studio project?

DagsHub lets you version your annotations with Git and commit the changes to a remote branch directly from the UI. When you click the commit button, DagsHub Annotations saves your work in open-source formats to a .labelstudio directory and provides the following options:

  • Save the annotations in one of the commonly used formats (JSON, COCO, CSV, TSV, etc.).
  • Commit the changes to the remote branch associated with the labeling project or to a new one.
  • Add a commit message.
![Commit Annotations](../feature_guide/assets/commit-file.png) Commit Annotations

What is the .labelstudio directory?

The .labelstudio directory is the source of truth for DagsHub Annotations. DagsHub Annotations saves the annotations of each task to a JSON file under the .labelstudio directory. When creating a new labeling project, DagsHub parses the selected branch for this directory and loads the existing annotations to their associated tasks, enabling you to switch between the labeling versions easily.

Note: The JSON file name is the SHA1 hash of the task's path in the original project.

How to load labels from different projects?

When creating a new labeling project, DagsHub parses the selected branch for the .labelstudio directory, loads the annotations it holds, and associates them with their tasks.

Note: Currently, you can load only annotations that were created by DagsHub Annotations.

How to manually import labels into Label Studio

Currently, importing labels into Label Studio is a somewhat manual process. Here's an overview of what needs to be done:

  1. Create a .labelstudio directory at the root of your repo, if one does not already exist.
  2. Add a label_config.xml file to the .labelstudio directory, which includes information about your classes. See example below.
  3. Populate the .labelstudio directory with JSON files. See example below.
  4. Commit and push the .labelstudio directory with Git (NOT DVC) to the repo.
  5. Start an annotation project from the correct branch.
  6. Enjoy your imported labels.
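Steps 1 to 3 above can be scripted. The following is a minimal sketch, not an official DagsHub utility: the function name `write_labelstudio_files` and its parameters are hypothetical, and it assumes you have already built each annotation payload in the format Label Studio expects (for example by copying a file that Label Studio generated for you). The file names follow the SHA1 naming rule described in this document.

```python
import hashlib
import json
from pathlib import Path


def write_labelstudio_files(repo_root, label_config_xml, annotations):
    """Create <repo_root>/.labelstudio and fill it (hypothetical helper).

    annotations maps an image path, relative to the repo root, to an
    already-built, JSON-serializable annotation payload.
    """
    ls_dir = Path(repo_root) / ".labelstudio"
    # Step 1: create the directory if it does not exist yet.
    ls_dir.mkdir(parents=True, exist_ok=True)
    # Step 2: write the labeling configuration.
    (ls_dir / "label_config.xml").write_text(label_config_xml, encoding="utf-8")
    # Step 3: write one JSON file per image, named by the SHA1 of its path.
    written = []
    for image_path, payload in annotations.items():
        name = hashlib.sha1(image_path.encode("utf-8")).hexdigest() + ".json"
        target = ls_dir / name
        target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

After running a script like this, you still need to commit and push the .labelstudio directory with Git yourself (steps 4 to 6).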

Using Label Studio to generate file templates

The first time you try this, consider having Label Studio create templates for you to edit. This lets you see the format of the files and get a feel for how to structure the script that imports your annotations.

To do this:

  1. Create a new annotation project via the web interface, as if you are going to annotate your project from scratch.
  2. Set up Label Studio with your presets and classes. This will make sure your label_config.xml file is properly configured.
  3. Annotate one or two files manually. This will create some JSON files you can use as a template when importing your annotations.
  4. Save and commit the changes.
  5. Create a new branch from there to import your annotations. See below for more information and examples.

What should a label_config.xml file contain?

The label_config.xml file describes the classes available for labeling and the settings for the annotation project. For example:

<View>
    <Image name="image" value="$image" zoomControl="false" zoom="false"/>
    <RectangleLabels name="label" toName="image">
        <Label value="Baby-Yoda" background="#FFA39E"/>
        <Label value="Mando" background="#0d73d3"/>
    </RectangleLabels>
</View>

This defines two classes for an object detection model, Baby-Yoda and Mando, and sets the color of the annotations when viewed in Label Studio.
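If you have many classes, you may prefer to generate this file rather than write it by hand. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The function name `build_label_config` is hypothetical, and the attributes simply mirror the object detection example above; other project types need different tags.

```python
import xml.etree.ElementTree as ET


def build_label_config(class_colors):
    """Build a rectangle-labeling config like the example above.

    class_colors maps each class name to its background color.
    """
    view = ET.Element("View")
    ET.SubElement(view, "Image", {
        "name": "image",
        "value": "$image",
        "zoomControl": "false",
        "zoom": "false",
    })
    labels = ET.SubElement(view, "RectangleLabels",
                           {"name": "label", "toName": "image"})
    for value, background in class_colors.items():
        ET.SubElement(labels, "Label",
                      {"value": value, "background": background})
    # Pretty-print when available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(view, space="    ")
    return ET.tostring(view, encoding="unicode")


print(build_label_config({"Baby-Yoda": "#FFA39E", "Mando": "#0d73d3"}))
```

It is worth comparing the generated file with one created by Label Studio itself before committing it.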

You can find further examples here and here.

What should the JSON files contain?

The JSON files that live in the .labelstudio directory describe the annotations for the images. There will be one JSON file per image and they have a very specific format.

The name of the file should be the SHA1 hash of the path to the image, relative to the root of the repo.

For example:

import hashlib

image_file = 'data/images/train/backyard_squirrels_000000.jpg'
filename_hash = hashlib.sha1(image_file.encode("utf-8")).hexdigest()
json_file = filename_hash + '.json'

In this example, the json_file would be 56f38098ffea4d6937b855e7ec2f01246526ff0e.json

You can find this particular JSON file here.
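Because the file names are hashes, it is not obvious which JSON file belongs to which image. The sketch below walks an image directory and reports the images that already have an annotation file. The helper names and the list of image suffixes are assumptions made for illustration, and the paths are hashed with forward slashes, as in the example above.

```python
import hashlib
from pathlib import Path

# Assumption: adjust to the image types used in your project.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def annotation_name(image_path):
    """Return the .labelstudio file name for a repo-relative image path."""
    return hashlib.sha1(image_path.encode("utf-8")).hexdigest() + ".json"


def find_annotated_images(repo_root, image_dir):
    """Map each annotated image (repo-relative path) to its JSON file."""
    root = Path(repo_root)
    ls_dir = root / ".labelstudio"
    matches = {}
    for image in sorted((root / image_dir).rglob("*")):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Use forward slashes so the hash is the same on every platform.
        rel = image.relative_to(root).as_posix()
        candidate = ls_dir / annotation_name(rel)
        if candidate.is_file():
            matches[rel] = candidate
    return matches
```

For example, `find_annotated_images(".", "data/images/train")` run from the root of the repo returns a dictionary whose keys are the annotated image paths.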

A few things to note based on this object detection example:

  • The bounding box origin (x, y) is the upper-left corner of the bounding box. If you're converting from YOLO, which uses the center of the bounding box, you need to shift the coordinates accordingly.
  • All bounding box coordinates are percentages (not fractions) of the image dimensions, in the range 0.0 to 100.0. This means that if the bounding box width is half the width of the image, it should be set to 50.0 and NOT 0.5.
  • The repo reference (i.e. repo://9eabb902f1980a3215cf1d7ec90038b990a88a5d/data/images/train/backyard_squirrels_000000.jpg) can use any commit hash at which the image exists in the repo. This means you need to commit your data to DVC before importing annotations into Label Studio.
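The first two notes boil down to a small piece of arithmetic. The sketch below converts a YOLO-style box (center x, center y, width, height, all as fractions of the image size) into upper-left-origin percentages, and builds a repo reference string. The function names are hypothetical, and the key names `x`, `y`, `width` and `height` are an assumption; check them against a JSON file generated by Label Studio before relying on them.

```python
def yolo_to_label_studio_box(cx, cy, w, h):
    """Convert a YOLO box (fractions, center origin) to percentages
    measured from the upper-left corner of the box."""
    return {
        "x": (cx - w / 2) * 100.0,   # left edge, percent of image width
        "y": (cy - h / 2) * 100.0,   # top edge, percent of image height
        "width": w * 100.0,
        "height": h * 100.0,
    }


def repo_reference(commit_hash, image_path):
    """Build the repo:// reference for an image at a given commit."""
    return f"repo://{commit_hash}/{image_path}"


# A centered box covering half the image in each dimension.
print(yolo_to_label_studio_box(0.5, 0.5, 0.5, 0.5))
# {'x': 25.0, 'y': 25.0, 'width': 50.0, 'height': 50.0}
```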

To see an example of how to generate these JSON files, check out this script for creating Label Studio annotations from existing YOLO-style annotation files.

Starting project

When starting a Label Studio project using the process above, select the directory that contains the images, but not the annotations.

![Selecting Only Images When Starting a Project](assets/labelstudio_select_images.png) Selecting Only Images When Starting a Project

To learn more about using Label Studio with DagsHub, follow the end-to-end DagsHub Annotations tutorial.

Known Issues, Limitations & Restrictions

DagsHub currently supports labeling only in non-mirror repositories, but we might support mirrored repositories soon. Please contact us on our Discord server if this is important to you.
