Managing Datasets with Data Engine
DagsHub helps you manage large-scale datasets so you can focus on improving your models. To do this, we built Data Engine, which includes tools and APIs for querying, visualizing, and annotating your data, and for generating dataloaders to train, evaluate, and debug your models.
The following use cases guide you through using Data Engine end to end: connecting and enriching your data, querying and visualizing it, annotating it, and finally training and improving your model.
Before using DagsHub Data Engine, make sure you have the DagsHub Client installed:
pip install dagshub
Next, import the datasources module:
from dagshub.data_engine import datasources
Start using Data Engine by connecting your datasource
Connect Datasource
Connect the data you want to work with
Enrich Data
Add custom metadata, predictions and labels to your data
Query Data
Query your data and generate new subsets to re-train your model
Visualize your data
Visualize data points and their enrichments
Annotate your data
Annotate relevant data points
Train a model
Train and improve your model with new datasets
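As a rough illustration of the connect and enrich steps listed above, the sketch below derives simple metadata from file paths and attaches it to an existing datasource. The repository name, datasource name, file paths, and metadata fields are all hypothetical, and the calls to get_datasource, metadata_context, and update_metadata are assumed to behave as described in the Connect Datasource and Enrich Data guides; check those guides for the exact signatures before relying on this.

```python
from pathlib import PurePosixPath


def build_metadata(paths):
    """Derive illustrative metadata from paths relative to the datasource root.

    The "label" and "extension" fields are hypothetical examples; use whatever
    fields make sense for your data.
    """
    rows = []
    for p in paths:
        pp = PurePosixPath(p)
        rows.append(
            {
                "path": p,
                "label": pp.parent.name,  # parent folder name used as a label
                "extension": pp.suffix.lstrip("."),
            }
        )
    return rows


def enrich_datasource(repo, name, rows):
    """Attach each row's fields to the matching data point.

    Assumes the datasource already exists and that you are authenticated
    with DagsHub. The import is local so the helper above works without
    the client installed.
    """
    from dagshub.data_engine import datasources

    ds = datasources.get_datasource(repo, name)
    with ds.metadata_context() as ctx:
        for row in rows:
            fields = {k: v for k, v in row.items() if k != "path"}
            ctx.update_metadata(row["path"], fields)
    return ds


if __name__ == "__main__":
    # Hypothetical paths; replace with files that exist in your datasource.
    example_rows = build_metadata(["images/cat/001.jpg", "images/dog/002.png"])
    print(example_rows)
    # Requires network access and DagsHub credentials:
    # enrich_datasource("<user_name>/<repo_name>", "<datasource_name>", example_rows)
```

The metadata preparation is kept separate from the upload so it can be inspected or tested locally before anything is sent to DagsHub.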