Connect External Storage

DagsHub lets you connect external storage to your DagsHub repositories, so you can access and interact with your data and large files without leaving the platform. Connecting an external bucket doesn't copy your data; DagsHub manages pointers to the data, enabling secure and fast data management at scale.

Video Tutorial

Here's a 2-minute video showing how to connect your external storage to DagsHub:

Step-by-Step Guide

This guide will walk you through connecting your bucket to DagsHub.

It assumes you have already created your bucket and set it up with the correct permissions.

This example uses AWS S3, but other providers work similarly; simply choose the relevant provider from the list.

Connection flow for external buckets

If you've already added code or connected your Git repo to your project, click the "Connect external storage bucket" button and select the storage you want to connect. Otherwise, if your project is entirely empty, go to the Settings tab and select the Integrations sub-section, where you'll see all available integrations. Some integrations also appear in the repo's empty state.

Connect a bucket section

Follow the connection wizard and add the relevant details.

Choose storage type

Note

Your bucket URL should include the relevant prefix: s3:// for AWS S3, gs:// for Google Cloud Storage, azure:// for Azure Blob Storage, and s3:// again for S3-compatible storage.
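If you want to sanity-check a bucket URL before pasting it into the wizard, here is a minimal sketch of a local prefix check. It is not part of DagsHub or its client library: the function name `parse_bucket_url`, the provider labels, and the example bucket names are all hypothetical. It only checks the prefixes listed in the note above.

```python
from urllib.parse import urlparse

# Prefixes listed in the note above. The labels are descriptive only
# and are an assumption of this sketch, not DagsHub terminology.
SUPPORTED_PREFIXES = {
    "s3": "AWS S3 or S3-compatible storage",
    "gs": "Google Cloud Storage",
    "azure": "Azure Blob Storage",
}


def parse_bucket_url(url):
    """Split a bucket URL into (prefix, first component, remaining path).

    Hypothetical helper: raises ValueError if the prefix is missing or is
    not one of s3://, gs://, azure://, or if nothing follows the prefix.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in SUPPORTED_PREFIXES:
        expected = ", ".join(f"{p}://" for p in SUPPORTED_PREFIXES)
        raise ValueError(f"{url!r} must start with one of: {expected}")
    if not parsed.netloc:
        raise ValueError(f"{url!r} has no bucket name after the prefix")
    # The first component after the prefix is treated as the bucket name;
    # anything after it is the path inside the bucket.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # "my-bucket" and "data/raw" are placeholder names.
    print(parse_bucket_url("s3://my-bucket/data/raw"))
```

A URL without a prefix, such as `my-bucket/data/raw`, raises a `ValueError`, which is the mistake the note above is warning about.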

Accessing and viewing your connected storage bucket

Once you have connected your bucket, it appears in the "Storage Buckets" section of your Files tab. You can browse its contents, download files, and create a Data Engine Datasource from it for convenient annotation and training.

See Files inside a connected bucket

Next Steps

Now that you've connected your bucket, your data lives alongside your code. That's a great first step, but to train a model, you'll need to create and curate a dataset. Learn how to turn your data into a dataset.