Adding & Editing Metadata

After creating our dataset, we want to start curating it in preparation for training our first model. To do this, we should add relevant information, also called metadata or enrichments, to our dataset. Some metadata is generated automatically when DagsHub scans your data files, but DagsHub also makes it easy to upload your own, either via code or the UI.

Let's see how to upload metadata and edit it through DagsHub.

Step-by-Step Guide

UI Flow

Single File Metadata Management

  1. In your dataset view, click on the datapoint you'd like to add or edit metadata for.

    Selecting a Single File

  2. Then click on "Add New".

    Add New Metadata

  3. Now, you'll be able to choose the new metadata field's name and type (e.g. Boolean, String, Number, etc.) and input the value. You can't edit the auto-generated metadata, but once you add custom metadata, you can edit its values in the same way.

    Adding Values

  4. After finishing, click the save button at the bottom of the field to apply your changes. You're done.

Bulk Metadata Editing

When you need to edit the metadata of many files, doing it one by one can be a waste of time. Luckily, DagsHub offers a way to edit metadata in bulk. To do this, follow these steps:

  1. In your dataset view, select the relevant data points to edit.

    Select Data

  2. Now click the "Edit Metadata" button.

    Edit Metadata Button

  3. In the bulk metadata editing view, you'll be able to select the field name, field type, and the value to apply to all selected datapoints. If you want to edit a metadata value, you'll need to select one of the existing fields, then enter the updated value.

    Bulk Metadata Editing

  4. After you're done, click the "Save" button to apply your changes. You're done!

UI Upload Metadata CSV

If you already have your metadata stored in a .csv, .csv.zip, .gz, or .parquet file, you can upload all of it at once.

  1. Start by clicking the "⠇" at the right side of the dataset view menu, then select "import metadata":

    More Options Button

  2. Now drag your metadata file.

    Import Metadata Menu

  3. To start the import, you also need to specify which column holds the path to the files. DagsHub uses this column to associate each row of metadata with the correct datapoint.

  4. Now, click on "Upload". The metadata import process will begin, and you'll be notified when it's completed.

    Import Metadata Process

    When the process is complete, you'll see the new metadata fields appear on the metadata sidebar.
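If your metadata isn't in a file yet, a short script can produce one for this import. The following is a minimal sketch using only Python's standard library. The column name `path`, the fields `split` and `is_blurry`, and the file paths are all hypothetical, and the paths are assumed to match how your datapoints appear in the dataset view.

```python
import csv
import io

# Hypothetical records: one row per datapoint. The "path" column is the one
# you would select in the import dialog so each row can be matched to a file.
rows = [
    {"path": "images/001.jpg", "split": "train", "is_blurry": False},
    {"path": "images/002.jpg", "split": "train", "is_blurry": True},
    {"path": "images/003.jpg", "split": "test", "is_blurry": False},
]


def write_metadata_csv(records, out):
    """Write the records as CSV with a header row taken from the first record."""
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


# Write to an in-memory buffer here; pass an open file instead to save to disk.
buffer = io.StringIO()
write_metadata_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To save the result for upload, open a file with `open("metadata.csv", "w", newline="")` and pass it as `out` instead of the buffer.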

Python Client Flow

In more advanced use cases, you might want to add and update metadata programmatically. For example, let's say you have a function called get_dominant_colors() that calculates the most dominant color in each image, and you want to add its result as metadata.

  1. Start by installing the DagsHub client. Simply type in the following:

    $ pip3 install dagshub
    

  2. Retrieve the datasource you created with the following code:

    from dagshub.data_engine import datasources
    
    ds = datasources.get_datasource(
      repo="<user_name>/<repo_name>", # User name and repository name separated by a "/"
      name="<datasource_name>", # Name of your datasource
    ) 
    

  3. Now we can use ds.metadata_context() which is a way to go into "metadata editing mode":

    with ds.metadata_context() as ctx:
        for dp in ds.all():  # Iterate over all datapoints in our datasource
            path = dp.download_file()  # Retrieve raw data locally
            dp["dominant_color"] = get_dominant_colors(path)
    

  4. That's all. After running this code, we'll see the new metadata uploaded to our dataset. This method is the easiest way to add metadata; to explore the other mechanisms, check out the full data enrichment docs.
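The guide leaves get_dominant_colors() undefined, so here is a rough sketch of the counting logic such a function might use. It is an illustration, not DagsHub code. The name `dominant_color_hex` and the `step` parameter are hypothetical, and the function assumes the pixels have already been decoded into a list of (R, G, B) tuples; reading them from the downloaded file would need an imaging library, which is not shown here.

```python
from collections import Counter


def dominant_color_hex(pixels, step=32):
    """Return the most frequent quantized color as a hex string.

    `pixels` is a list of (r, g, b) tuples with channel values in 0..255.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    # Round each channel down to a multiple of `step` so that
    # near-identical shades are counted in the same bucket.
    buckets = Counter(
        (r // step * step, g // step * step, b // step * step)
        for r, g, b in pixels
    )
    (r, g, b), _count = buckets.most_common(1)[0]
    return f"#{r:02x}{g:02x}{b:02x}"


# Two reddish pixels and one bluish pixel: the red bucket wins.
sample = [(250, 10, 10), (245, 5, 12), (10, 10, 250)]
result = dominant_color_hex(sample)
```

Returning a single string keeps the value compatible with a String metadata field, one of the field types listed in the UI flow above.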

Next Steps

You now have an enriched datasource - it's time to query and annotate it.