Connect External Storage¶
DagsHub lets you connect external storage to your repositories, so you can access and interact with your data and large files without leaving the platform. Connecting an external bucket doesn't copy your data: DagsHub manages pointers to it, enabling secure and fast data management at scale.
Video Tutorial¶
Here's a 2-minute video to show you how to connect your external storage to DagsHub:
Step-by-Step Guide¶
This guide will walk you through connecting your bucket to DagsHub.
It assumes you have already created your bucket and set it up with the correct permissions.
The example uses AWS S3, but the other options work similarly: simply choose the relevant provider from the list.
Connection flow for external buckets¶
If you've already added code or connected your Git repo to your project, click the "Connect external storage bucket" button and select the storage you want to connect. Otherwise, if your project is entirely empty, go to the Settings tab and select the Integrations sub-section, where you'll see all available integrations. Some integrations also appear on the repo's empty state.
Follow the connection wizard and add the relevant details.
Note
Your bucket URL should include the relevant prefix: s3://, gs://, azure://, or s3:// for S3-compatible storage.
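If you want to check a bucket URL before pasting it into the wizard, the prefix rule above can be sketched in a few lines of Python. This is only an illustration, not part of DagsHub's tooling: the function name `check_bucket_url`, the provider labels, and the example bucket names are assumptions made for this sketch, and the wizard may apply additional validation of its own.

```python
from urllib.parse import urlparse

# Prefixes listed in the note above. The provider labels are descriptive
# assumptions for this sketch, not values used by DagsHub itself.
SUPPORTED_SCHEMES = {
    "s3": "AWS S3 or S3-compatible storage",
    "gs": "Google Cloud Storage",
    "azure": "Azure Blob Storage",
}


def check_bucket_url(url: str) -> str:
    """Return the URL's scheme if it looks like a valid bucket URL.

    Raises ValueError when the prefix is missing or unsupported, or when
    no bucket name follows the prefix.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in SUPPORTED_SCHEMES:
        expected = ", ".join(f"{scheme}://" for scheme in SUPPORTED_SCHEMES)
        raise ValueError(f"{url!r} must start with one of: {expected}")
    if not parsed.netloc:
        raise ValueError(f"{url!r} has a prefix but no bucket name")
    return parsed.scheme


if __name__ == "__main__":
    # Hypothetical bucket names, used only to exercise the check.
    for candidate in ("s3://my-bucket/data", "my-bucket/data"):
        try:
            scheme = check_bucket_url(candidate)
            print(f"{candidate}: OK ({SUPPORTED_SCHEMES[scheme]})")
        except ValueError as err:
            print(f"{candidate}: {err}")
```

The sketch only inspects the shape of the URL. It does not contact the storage provider, so it cannot tell you whether the bucket exists or whether its permissions are set up correctly.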
Accessing and viewing your connected storage bucket¶
Once you have connected your bucket, it appears in the "Storage Buckets" section of your Files tab. There you can browse its contents, download files, and create a Data Engine Datasource from it for convenient annotation and training.
Next Steps¶
Now that you've connected your bucket, you have your data alongside your code. That's a great first step, but to train a model, you'll need to create and curate a dataset. Learn how to actually turn your data into a dataset.