Curating and Managing Datasets
DagsHub helps you manage large-scale datasets easily, so you can focus on improving your models. To support this, we built the Data Engine, which includes tools and APIs for querying, visualizing, and annotating data, and for generating dataloaders to train, evaluate, and debug your models.
The following use cases guide you through using the Data Engine end to end: from connecting and enriching your data, to querying and visualizing it, annotating it, and finally training and improving your model.
Before using the DagsHub Data Engine, make sure you have the DagsHub client installed:
pip install dagshub
Next, import the datasources module:
from dagshub.data_engine import datasources
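As a minimal sketch of the next step, the helper below connects to a datasource through the datasources module. It assumes the client's get_datasource and get_or_create functions behave as described in the comments (check their exact signatures in the client's API reference); the helper name connect_datasource is hypothetical, as are the repository, datasource name, and path you would pass to it.

```python
from typing import Optional


def connect_datasource(repo: str, name: str, path: Optional[str] = None):
    """Return a Data Engine datasource for `repo` (hypothetical helper).

    If `path` is given, the datasource is created from that folder of the
    repository when it does not exist yet; otherwise an existing datasource
    is fetched by name.
    """
    # Imported lazily so the helper can be defined even before the client is installed.
    from dagshub.data_engine import datasources

    if path is None:
        # Assumed: get_datasource(repo, name=...) fetches an existing datasource.
        return datasources.get_datasource(repo, name=name)
    # Assumed: get_or_create(repo, name, path) creates the datasource on first use.
    return datasources.get_or_create(repo, name, path)
```

For example, connect_datasource("<user>/<repo>", "my-datasource", "data/images") would create the datasource the first time and fetch it afterwards; calling .head().dataframe on the result should give a quick pandas view of the first datapoints.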
Start using the Data Engine by connecting your datasource, then work through the following use cases:
- Connect the data you want to work with
- Add custom metadata, predictions and labels to your data
- Query your data and generate new subsets to re-train your model
- Visualize data points and their enrichments
- Annotate relevant data points
- Train and improve your model with new datasets
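The workflow in the list above can be sketched as three small helpers that take an already connected datasource: one enriches datapoints with custom metadata, one queries a subset, and one turns that subset into a dataloader for re-training. This is a sketch under stated assumptions, not the documented implementation: the method names metadata_context, update_metadata, all, and as_ml_dataloader reflect our understanding of the dagshub client and should be verified against its API reference, and the helper names themselves are hypothetical.

```python
from typing import Any, Dict, List


def tag_datapoints(ds, paths: List[str], metadata: Dict[str, Any]) -> None:
    """Attach the same custom metadata fields to several datapoints (enrichment)."""
    # Assumed API: updates made inside metadata_context() are collected and
    # uploaded to the datasource when the context exits.
    with ds.metadata_context() as ctx:
        ctx.update_metadata(paths, metadata)


def query_subset(ds, column: str, threshold: float):
    """Select datapoints whose `column` value is greater than `threshold`."""
    # Assumed API: comparing a column builds a query; .all() runs it
    # and returns a query result holding the matching datapoints.
    return (ds[column] > threshold).all()


def subset_to_dataloader(ds, column: str, threshold: float, flavor: str = "torch"):
    """Turn a queried subset into a framework dataloader for (re-)training."""
    result = query_subset(ds, column, threshold)
    # Assumed API: `flavor` selects the target ML framework ("torch" for PyTorch).
    return result.as_ml_dataloader(flavor=flavor)
```

Keeping each stage in its own function mirrors the use cases above: you can enrich, re-query, and rebuild the dataloader independently as your metadata changes.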