DagsHub Storage

DagsHub automatically configures remote object storage for every repository, with 10 GB of free space. Anyone can use it without needing a degree in DevOps or a billing account with a cloud provider. The storage is managed with DVC and can easily be configured on any machine.

How does DagsHub Storage work?

It works the same way as the Git remote URL of your Git repository: when you create a repository, DagsHub automatically provides a remote URL for its Storage. When you push data to or pull data from this URL, you use your existing DagsHub credentials (via HTTPS basic authentication). Using the DVC pointer files (.dvc) and the dvc.lock file hosted in the Git commit, DagsHub parses the storage and displays the DVC-tracked files under the Files tab.
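To make the URL and authentication scheme concrete, here is a minimal shell sketch. It assumes the remote URL follows the `https://dagshub.com/<user>/<repo>.dvc` pattern used in the setup commands on this page, and that basic authentication pairs your DagsHub username with a token as the password. The helpers `dagshub_dvc_url` and `basic_auth_header` are hypothetical names for illustration, not part of DVC or DagsHub.

```shell
# Hypothetical helper: build the DVC remote URL for a repository,
# assuming the https://dagshub.com/<user>/<repo>.dvc pattern.
dagshub_dvc_url() {
  printf 'https://dagshub.com/%s/%s.dvc\n' "$1" "$2"
}

# Hypothetical helper: build the value of the HTTP Authorization header
# that basic authentication sends: "Basic " + base64("user:token").
basic_auth_header() {
  # tr removes any line wrapping that base64 may insert for long inputs.
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

dagshub_dvc_url alice hello-world   # prints https://dagshub.com/alice/hello-world.dvc
basic_auth_header alice abc123      # prints Basic YWxpY2U6YWJjMTIz
```

DVC builds this header for you once `auth basic`, `user`, and `password` are configured; the sketch only shows what travels over HTTPS.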

This means you automatically get the same access control as your normal Git code repository: data in a public repository is publicly readable, but only the project's maintainers can push data or read data from a private repository. Just set up your DagsHub Storage and start working!

How to set up DagsHub Storage locally?

Configure DagsHub Storage with your machine

  1. Go to your repository homepage, click the remote button, and then click the ? mark next to DVC remote.

  2. Copy the commands that configure your local machine to use DagsHub Storage.

    ![DVC remote](../integration_guide/assets/dvc_remote.png)
  3. Open a terminal in your project directory, paste the commands, and run them:

    dvc remote add origin https://dagshub.com/<DagsHub-user-name>/hello-world.dvc
    dvc remote modify origin --local auth basic
    dvc remote modify origin --local user <DagsHub-user-name>
    dvc remote modify origin --local password <Token>
    

    ??? info "Why --local?" Everything you configure without --local ends up in the .dvc/config file, which is tracked by Git and therefore appears in your repository. With --local, the setting is written to .dvc/config.local instead, which is not committed. Personal information such as authentication details should always be kept local.
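Because a forgotten `--local` puts credentials into the Git-tracked `.dvc/config`, a quick check before committing can catch the mistake. This is a minimal sketch: `check_tracked_dvc_config` is a hypothetical helper, and the key names it looks for (`password`, `access_key_id`, `secret_access_key`) are an assumption covering only the options used on this page.

```shell
# Hypothetical helper: fail if a secret-looking option appears in the
# Git-tracked DVC config instead of .dvc/config.local.
check_tracked_dvc_config() {
  config=${1:-.dvc/config}
  # A missing file is treated as "nothing leaked" (grep's error is silenced).
  if grep -Eq '^[[:space:]]*(password|access_key_id|secret_access_key)[[:space:]]*=' "$config" 2>/dev/null; then
    echo "Secret found in $config; re-run the command with --local" >&2
    return 1
  fi
  return 0
}

# Usage: check_tracked_dvc_config && git add .dvc/config
```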

That's it! You can now pull data from your remote cache.

Note: You need to be inside a directory that is both a Git repository and a DVC project for this process to succeed. To learn how to set one up, follow the first part of the Get Started section.
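If you script the setup, the condition in the note can be checked before running any `dvc` command. A minimal sketch, assuming the Git repository and the DVC project share the same root directory (DVC also supports projects initialized in a subdirectory of a Git repository, which this check does not handle); `find_project_root` is a hypothetical helper.

```shell
# Hypothetical helper: walk up from the current directory and print the
# first directory that contains both .git and .dvc; fail if none does.
find_project_root() {
  dir=$(pwd -P)
  while :; do
    # .git can be a file (e.g. in a Git worktree), so test it with -e.
    if [ -e "$dir/.git" ] && [ -d "$dir/.dvc" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    if [ "$dir" = "/" ]; then
      echo "Not inside a Git + DVC project" >&2
      return 1
    fi
    dir=$(dirname "$dir")
  done
}

# Usage: find_project_root >/dev/null && dvc pull -r origin
```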

Pull data

dvc pull -r origin

Push data

  1. First, make sure you are using DVC version 1.10 or later.

  2. Then you can run:

    dvc push -r origin
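Comparing versions as plain strings would wrongly rank 1.9 above 1.10, so a scripted version check has to compare each numeric field. A minimal sketch, assuming `dvc --version` prints a bare dotted version number; `version_at_least` is a hypothetical helper, and it ignores pre-release suffixes such as `rc1`.

```shell
# Hypothetical helper: succeed if dotted version $1 >= $2, comparing up to
# three numeric fields (major.minor.patch); missing fields count as 0.
version_at_least() {
  have=$1
  need=$2
  for field in 1 2 3; do
    h=${have%%.*}; n=${need%%.*}
    # Keep only the leading digits of each field (drops suffixes like "rc1").
    h=${h%%[!0-9]*}; n=${n%%[!0-9]*}
    h=${h:-0}; n=${n:-0}
    if [ "$h" -gt "$n" ]; then return 0; fi
    if [ "$h" -lt "$n" ]; then return 1; fi
    # Drop the field just compared; once no dot is left, the string is used up.
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $need in *.*) need=${need#*.} ;; *) need= ;; esac
  done
  return 0
}

# Usage: version_at_least "$(dvc --version)" 1.10 && dvc push -r origin
```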
    

Accessing DagsHub Storage as an S3 Bucket

You can also access the DVC storage of your repository as an S3 bucket. This can make transfers more reliable, since DVC supports retrying failed requests for S3 remotes.

Connecting to the bucket

Here's a list of variables you need to connect to the bucket:

  • Bucket name: s3://dvc
  • Endpoint URL: https://dagshub.com/<username>/<repo>.s3
  • Access Key ID: <Token>
  • Secret Access Key: <Token>

Note: You need to use your user token as both Access Key ID and Secret Access Key. Be sure to keep it secret!
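Many S3 clients, including the AWS CLI and boto3, read credentials from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, so one way to use the values above outside DVC is to export them. A minimal sketch: `dagshub_s3_env` and `DAGSHUB_S3_ENDPOINT` are hypothetical names, and the commented `aws` call assumes the AWS CLI is installed and that the endpoint supports listing objects, which this page does not confirm.

```shell
# Hypothetical helper: export the connection values listed above for S3
# clients that read the standard AWS credential environment variables.
dagshub_s3_env() {
  user=$1 repo=$2 token=$3
  # The same user token serves as both the access key ID and the secret.
  export AWS_ACCESS_KEY_ID="$token"
  export AWS_SECRET_ACCESS_KEY="$token"
  export DAGSHUB_S3_ENDPOINT="https://dagshub.com/${user}/${repo}.s3"
}

# Usage (DAGSHUB_TOKEN is assumed to be set already; keep it secret):
#   dagshub_s3_env alice hello-world "$DAGSHUB_TOKEN"
#   aws s3 ls s3://dvc --endpoint-url "$DAGSHUB_S3_ENDPOINT"
```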

As an example, here's how you can set up DVC to push and pull data through the S3-compatible endpoint:

dvc remote add origin-s3 s3://dvc
dvc remote modify origin-s3 endpointurl https://dagshub.com/<username>/<repo>.s3
dvc remote modify origin-s3 --local access_key_id <Token>
dvc remote modify origin-s3 --local secret_access_key <Token>

Available endpoints

The following S3 endpoints are supported:

Objects:

Multipart uploads:
