Upload Data

Now that you've created your DagsHub project, or connected an existing one, the next step is to get your data onto DagsHub. The easiest way to do that is to upload it to DagsHub Storage, a hosted S3-compatible bucket that comes with every DagsHub repository. It's easy to use and highly scalable.

If you already have your data in a storage bucket, check out the guide on connecting external buckets. If you'd like to use DVC to version your data files, check out the guide for versioning data.

Step-by-Step Guide

Installation and Setup

Start by installing the DagsHub client. In your terminal, run:

$ pip3 install dagshub
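
To confirm the client installed correctly and is on your PATH, you can print its help text. This is just a quick sanity check; the --help flag is the CLI's standard help flag, not a step from this guide:

$ dagshub --help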

Uploading the actual data

The structure of the upload command is very simple:

$ dagshub upload --bucket <repo_owner>/<repo_name> <local_path> <remote_path>

Using the full functionality of DagsHub's data access tools requires authentication, which by default uses an interactive OAuth flow. If you want to use persistent tokens instead, read our short guide on authentication.
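
If you've already generated a token, here's a minimal sketch of storing it with the client so later calls skip the OAuth prompt (dagshub.auth.add_app_token is the call assumed here; see the authentication guide for specifics):

import dagshub.auth

# Store a persistent app token from your DagsHub user settings,
# so subsequent commands don't trigger the interactive OAuth flow.
dagshub.auth.add_app_token("<your_token>")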

Let's assume your data is in a folder named my_data/, and that you'd like to upload it to a folder called dataset/ in the DagsHub Storage bucket of a repo called my_project, owned by the user DagsHub. In your terminal, run:

$ dagshub upload --bucket DagsHub/my_project my_data/ dataset/

If you prefer to upload the data from Python instead of the terminal, you can do so by running the following:

import dagshub

dagshub.upload_files("<user_name>/<repo_name>", "<local_path>", remote_path="<remote_path>", bucket=True)
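
For example, the same upload as the CLI command above, with the repo DagsHub/my_project and the folders my_data/ and dataset/, would look like this:

import dagshub

# Upload the local my_data/ folder to the dataset/ folder of the
# DagsHub Storage bucket attached to DagsHub/my_project.
dagshub.upload_files("DagsHub/my_project", "my_data/", remote_path="dataset/", bucket=True)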

Next Steps

Now that you've uploaded your data files, you have your data alongside your code. That's a great first step, but to train a model, you'll need to create and curate a dataset. Learn how to actually turn your data into a dataset.