Adding & Editing Metadata
After creating our dataset, we want to start curating it in preparation for training our first model. To do this, we add relevant information, also called metadata or enrichments, to the dataset. DagsHub generates some metadata automatically when it scans your data files, and it also makes it easy to add your own, either through the UI or via code.
Let's see how to upload and edit metadata with DagsHub.
Step-by-Step Guide
UI Flow
Single File Metadata Management
- In your dataset view, click the datapoint you'd like to add or edit metadata for.
- Click "Add New".
- Choose the new metadata field's name and type (e.g. Boolean, String, or Number), then enter its value. Auto-generated metadata can't be edited, but once you've added custom metadata, you can edit its values the same way.
- Click the save button at the bottom of the field to apply your changes.
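To make the rules above concrete, here is a toy Python model of one datapoint's metadata. It is purely illustrative and is not how DagsHub stores or validates metadata: the names ToyDatapoint, check_type, and FIELD_TYPES are hypothetical, and the mapping of the UI's type names to Python types is an assumption. Auto-generated fields are read-only, while custom fields keep the type they were created with and can be edited afterwards.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping from the UI's type names to Python types (illustration only).
FIELD_TYPES: Dict[str, Tuple[type, ...]] = {
    "Boolean": (bool,),
    "String": (str,),
    "Number": (int, float),
}


def check_type(type_name: str, value: Any) -> None:
    """Raise TypeError unless `value` fits the declared field type."""
    allowed = FIELD_TYPES[type_name]
    # bool is a subclass of int in Python, so reject True/False for Number fields explicitly.
    if isinstance(value, bool) and bool not in allowed:
        raise TypeError(f"{value!r} is not a valid {type_name}")
    if not isinstance(value, allowed):
        raise TypeError(f"{value!r} is not a valid {type_name}")


class ToyDatapoint:
    """Toy model: auto-generated fields are read-only, custom fields are typed and editable."""

    def __init__(self, path: str, auto_fields: Dict[str, Any]) -> None:
        self.path = path
        self._auto = dict(auto_fields)  # stands in for metadata produced by the file scan
        self._custom: Dict[str, Tuple[str, Any]] = {}  # field name -> (type name, value)

    def add_field(self, name: str, type_name: str, value: Any) -> None:
        """Like "Add New": choose a name and a type, then enter a value."""
        if name in self._auto or name in self._custom:
            raise KeyError(f"field {name!r} already exists")
        check_type(type_name, value)
        self._custom[name] = (type_name, value)

    def edit_field(self, name: str, value: Any) -> None:
        """Change the value of a custom field, keeping its original type."""
        if name in self._auto:
            raise PermissionError(f"{name!r} is auto-generated and cannot be edited")
        type_name, _ = self._custom[name]  # KeyError if the field was never added
        check_type(type_name, value)
        self._custom[name] = (type_name, value)

    def get(self, name: str) -> Any:
        """Read a field, whether auto-generated or custom."""
        if name in self._auto:
            return self._auto[name]
        return self._custom[name][1]
```

For example, after ToyDatapoint("images/001.jpg", {"size": 20480}), calling add_field("reviewed", "Boolean", False) and then edit_field("reviewed", True) succeeds, while edit_field("size", 1) raises PermissionError because "size" plays the role of an auto-generated field here.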
Bulk Metadata Editing
When you need to edit the metadata of many files, doing it one by one can waste a lot of time. Luckily, DagsHub offers a way to edit metadata in bulk:
- In your dataset view, select the datapoints you want to edit.
- Click the "Edit Metadata" button.
- In the bulk metadata editing view, select the field name, field type, and the value to apply to all selected datapoints. To change an existing metadata value instead, select one of the existing fields, then enter the updated value.
- Click the "Save" button to apply your changes.
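Conceptually, a bulk edit writes one field value onto every selected datapoint. The sketch below models that with plain dictionaries keyed by datapoint path; it is an illustration under that assumption, not the DagsHub API, and the function name bulk_set is hypothetical. In this model, adding a new field and overwriting an existing one are the same operation.

```python
from typing import Any, Dict, Iterable

Metadata = Dict[str, Dict[str, Any]]  # datapoint path -> {field name: value}


def bulk_set(metadata: Metadata, selected: Iterable[str], field: str, value: Any) -> Metadata:
    """Return a copy of `metadata` with `field` set to `value` on every selected datapoint.

    The value is written whether or not the field already exists, so this
    covers both adding a new field and editing an existing one.
    """
    updated = {path: dict(fields) for path, fields in metadata.items()}  # leave the input untouched
    for path in selected:
        if path not in updated:
            raise KeyError(f"unknown datapoint: {path}")
        updated[path][field] = value
    return updated
```

Returning a copy instead of mutating in place makes it easy to compare the before and after states, loosely like reviewing a selection before clicking "Save".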
Uploading a Metadata CSV via the UI
If you already have your metadata stored in a .csv, .csv.zip, .gz, or .parquet file, you can upload all of it at once.
- Click the "⠇" on the right side of the dataset view menu, then select "import metadata".
- Drag and drop your metadata file.
- To start the import, you also need to specify the column that holds the paths to the files. DagsHub uses this column to associate each row of metadata with the correct datapoint.
- Click "Upload". The metadata import process will begin, and you'll be notified when it's complete.
When the process is complete, the new metadata fields will appear in the metadata sidebar.
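If your metadata isn't in a file yet, a minimal sketch for producing one with Python's standard csv module is shown below. The column name "path", the example field names, and the function name write_metadata_csv are assumptions; the path values are assumed to match the paths your datapoints show in the dataset view. The source lists .gz as an accepted format, and the sketch assumes that means a gzip-compressed CSV.

```python
import csv
import gzip
from pathlib import Path
from typing import Any, Dict, List


def write_metadata_csv(rows: List[Dict[str, Any]], out_file: Path, path_column: str = "path") -> None:
    """Write one row per datapoint; `path_column` holds each file's path.

    A name ending in ".gz" is gzip-compressed; anything else is written as plain CSV.
    """
    # Put the path column first, followed by every other field seen in any row.
    fieldnames = [path_column] + sorted({key for row in rows for key in row} - {path_column})
    opener = gzip.open if out_file.suffix == ".gz" else open
    with opener(out_file, "wt", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing fields are written as ""
        writer.writeheader()
        for row in rows:
            if path_column not in row:
                raise ValueError(f"row is missing the {path_column!r} column: {row}")
            writer.writerow(row)
```

For example, rows [{"path": "images/001.jpg", "label": "cat", "reviewed": True}, {"path": "images/002.jpg", "label": "dog"}] produce a header of path,label,reviewed, with an empty reviewed cell for the second image. When importing, you would then pick "path" as the column that holds the file paths.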
Python Client Flow
In more advanced use cases, you might want to add and update metadata programmatically. For example, say you have a function called get_dominant_colors() that computes the most dominant color in each image, and you want to add the result as metadata.
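The document doesn't define get_dominant_colors(), so here is one hypothetical way to compute a single dominant color: a simple frequency count over coarsened RGB values (a histogram approach, not clustering such as k-means). The function name dominant_color and the default bucket size of 32 are assumptions. It takes RGB pixel tuples rather than a file path, so loading the image (for example with an image library such as Pillow) is left out.

```python
from collections import Counter
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def dominant_color(pixels: Iterable[RGB], bucket: int = 32) -> str:
    """Return the most common color as a hex string, after grouping similar shades.

    Each channel is rounded down to a multiple of `bucket`, so near-identical shades
    count as one color; the returned color is the lower corner of the winning bucket.
    """
    counts = Counter(
        (r // bucket * bucket, g // bucket * bucket, b // bucket * bucket)
        for r, g, b in pixels
    )
    if not counts:
        raise ValueError("image has no pixels")
    (r, g, b), _ = counts.most_common(1)[0]
    return f"#{r:02x}{g:02x}{b:02x}"
```

Your own get_dominant_colors(path) could load the image's RGB pixels and pass them to a function like this. Larger buckets make the result more robust to noise at the cost of color precision.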
- Start by installing the DagsHub client:
$ pip3 install dagshub
- Retrieve the datasource you created:
from dagshub.data_engine import datasources

ds = datasources.get_datasource(
    repo="<user_name>/<repo_name>",  # User name and repository name, separated by a "/"
    name="<datasource_name>",  # Name of your datasource
)
- Use ds.metadata_context() to enter "metadata editing mode", then set the new value on each datapoint:
with ds.metadata_context() as ctx:
    for dp in ds.all():  # Iterate over all datapoints in the datasource
        path = dp.download_file()  # Download the raw file locally and get its path
        dp["dominant_color"] = get_dominant_colors(path)  # Your own function from above
- That's it! After running this code, the new metadata will appear in your dataset. This is the easiest way to add metadata; to learn about other mechanisms, see the full data enrichment docs.
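One way to picture a "metadata editing mode" is a context manager that buffers changes and sends them together when the with block ends, rather than making one request per datapoint. The toy below illustrates that pattern only; it is not DagsHub's code, the names ToyMetadataContext, toy_metadata_context, and the upload callback are hypothetical, and the DagsHub client docs are the place to check how its context actually commits changes.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List, Tuple

Update = Tuple[str, str, Any]  # (datapoint path, field name, value)


class ToyMetadataContext:
    """Collects metadata updates in memory instead of sending each one immediately."""

    def __init__(self) -> None:
        self.pending: List[Update] = []

    def update(self, path: str, field: str, value: Any) -> None:
        self.pending.append((path, field, value))


@contextmanager
def toy_metadata_context(upload: Callable[[List[Update]], None]) -> Iterator[ToyMetadataContext]:
    """Yield a context that buffers updates, then upload them in one batch."""
    ctx = ToyMetadataContext()
    yield ctx
    # Only reached if the with block finished without an exception,
    # so a loop that fails halfway does not upload a partial batch.
    if ctx.pending:
        upload(ctx.pending)
```

With batches = [], running two ctx.update(...) calls inside `with toy_metadata_context(batches.append) as ctx:` leaves exactly one batch of two updates in batches once the block exits.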
Next Steps
You now have an enriched datasource, so it's time to query and annotate it.