
# DVC

DVC is an open-source version control tool for machine learning projects, designed to handle large files, datasets, machine learning models, and metrics. It works on top of Git, so it integrates easily with your existing Git repositories. The DagsHub integration with DVC includes fully configured remote object storage that DVC can manage, viewing and diffing of DVC-tracked files hosted on DagsHub Storage or S3-compatible storage, and data pipeline visualization.

How does the DagsHub integration with DVC work?

DagsHub Storage

DagsHub automatically configures remote object storage for every repository, with 10 GB of free space. The storage can be managed by DVC and easily configured on any machine. Using the DVC pointer files (.dvc) and the dvc.lock file hosted in the Git commit, DagsHub parses the storage and displays the DVC-tracked files under the Files tab.

S3-compatible storage

As with DagsHub Storage, you can connect an existing AWS S3 bucket, Google Storage bucket, or other S3-compatible storage to DagsHub and view the DVC-tracked files under the Files tab.

Visualize DVC pipelines

DagsHub parses the dvc.lock and dvc.yaml files to create the interactive data pipeline. The pipeline is versioned and holds valuable information about the different files, metrics, and data steps.
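As a rough illustration of the kind of file DagsHub reads here, the sketch below writes a minimal `dvc.yaml` with a single stage. The function name, the stage name `prepare`, the script `prepare.py`, and the `data/` paths are all hypothetical placeholders rather than anything this guide prescribes.

```bash
# Write a minimal single-stage dvc.yaml into the given directory.
# The stage name, script, and data paths are hypothetical placeholders.
write_example_pipeline() {
  local target_dir="$1"
  mkdir -p "$target_dir"
  cat > "$target_dir/dvc.yaml" <<'EOF'
stages:
  prepare:
    cmd: python prepare.py data/raw.csv data/prepared.csv
    deps:
      - prepare.py
      - data/raw.csv
    outs:
      - data/prepared.csv
EOF
}
```

Assuming `prepare.py` and `data/raw.csv` exist, calling `write_example_pipeline .` inside a DVC project and then running `dvc repro` would execute the stage and record the resulting file hashes in `dvc.lock`.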

How to use DVC with DagsHub?

DagsHub Storage

Configure DagsHub Storage with your machine

  1. Go to your repository homepage, click the Remote button, and click the ? mark next to DVC remote.
  2. Copy the commands that set up your local machine with DagsHub Storage.

     ![DVC remote](assets/dvc_remote.png)

  3. Open a terminal in your project, paste the commands, and run them:
```bash 
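# Register the DagsHub Storage endpoint as a DVC remote named "origin"
dvc remote add origin https://dagshub.com/<DagsHub-user-name>/hello-world.dvc
# Store credentials with --local so they are not committed to Git
dvc remote modify origin --local auth basic
dvc remote modify origin --local user <DagsHub-user-name>
dvc remote modify origin --local password <Token>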
```

??? info "Why --local?"
    Everything you configure without `--local` ends up in the `.dvc/config` file, which is tracked by Git and appears in your repository.
    Personal info like authentication details should always be kept local.

That's it! You can now pull data from your remote cache.

Note: These commands only succeed inside a directory that has been initialized with both Git and DVC. To learn how to set that up, follow the first part of the Get Started section.
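If you would rather not type the token directly into a command, the same four commands can be wrapped in a small function that reads it from an environment variable. This is only a sketch: the function names and the `DAGSHUB_TOKEN` variable are hypothetical, and the remote name `origin` simply follows the commands copied from DagsHub.

```bash
# Build the DVC remote URL for a DagsHub repository (user, repo).
dagshub_remote_url() {
  printf 'https://dagshub.com/%s/%s.dvc\n' "$1" "$2"
}

# Configure the remote; --local keeps credentials out of the Git-tracked config.
# DAGSHUB_TOKEN is a hypothetical variable name; export it before calling.
configure_dagshub_remote() {
  local user="$1"
  local repo="$2"
  if [ -z "${DAGSHUB_TOKEN:-}" ]; then
    echo "DAGSHUB_TOKEN is not set" >&2
    return 1
  fi
  dvc remote add origin "$(dagshub_remote_url "$user" "$repo")"
  dvc remote modify origin --local auth basic
  dvc remote modify origin --local user "$user"
  dvc remote modify origin --local password "$DAGSHUB_TOKEN"
}
```

A call would look like `configure_dagshub_remote <DagsHub-user-name> hello-world`, run from the root of your project.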

Pull data

    dvc pull -r origin

Push data

  1. First, make sure you are using DVC version 1.10 or greater.

  2. Then you can run:

    dvc push -r origin
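
The version check in the first step can also be scripted. The sketch below compares the output of `dvc --version` against the 1.10 minimum before pushing; the function names are hypothetical, and the parsing assumes the version string starts with `MAJOR.MINOR`.

```bash
# Succeed when version $1 (e.g. "3.48.4") is at least MAJOR ($2) . MINOR ($3).
dvc_version_at_least() {
  local version="$1"
  local want_major="$2"
  local want_minor="$3"
  local major="${version%%.*}"   # text before the first dot
  local rest="${version#*.}"     # text after the first dot
  local minor="${rest%%.*}"      # text up to the next dot
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Push to the "origin" remote only if the installed DVC is new enough.
push_if_supported() {
  if ! command -v dvc >/dev/null 2>&1; then
    echo "dvc is not installed" >&2
    return 1
  fi
  if dvc_version_at_least "$(dvc --version)" 1 10; then
    dvc push -r origin
  else
    echo "DVC 1.10 or greater is required to push" >&2
    return 1
  fi
}
```

Running `push_if_supported` from the project root then either pushes or explains why it did not.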
    

Configure S3-compatible storage with your DagsHub repository

  1. Go to the settings of your repository and click on the integrations tab.
  2. Choose the object storage you're using (AWS S3, Google Storage, or S3-compatible storage) and follow the guide.

Note: The external storage documentation also shows how to set up remote object storage for your project.
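For orientation, pointing a local DVC project at an S3-compatible bucket typically looks something like the sketch below. The function name, the remote name, the bucket path, and the endpoint URL are all placeholders, and the exact settings depend on your provider, so treat the integration guide as the source of truth.

```bash
# Print the DVC commands for an S3-compatible remote so they can be reviewed
# before running. Arguments: remote name, bucket path, endpoint URL
# (all placeholders that you supply).
s3_remote_commands() {
  local name="$1"
  local bucket_path="$2"
  local endpoint="$3"
  printf 'dvc remote add %s s3://%s\n' "$name" "$bucket_path"
  printf 'dvc remote modify %s endpointurl %s\n' "$name" "$endpoint"
}
```

For example, `s3_remote_commands storage my-bucket/dvc https://s3.example.com` prints the two commands for a remote named `storage`; once they look right, you can run them by hand in your project.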

Visualize DVC pipelines:

  1. Run a DVC pipeline.
  2. Version the dvc.lock and dvc.yaml files with Git.
  3. Use Git to version the files that are not tracked by DVC.
  4. Push the Git-tracked and DVC-tracked files to DagsHub.

Note: You can follow the Pipeline tutorial to learn how to build a DVC pipeline.
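
The four steps above can be sketched as a single shell function. This is an illustration under assumptions rather than a prescribed workflow: the function names and the commit message are hypothetical, `git add .` assumes your `.gitignore` already excludes anything you do not want committed, and the DVC remote is assumed to be called `origin`, as in the DagsHub Storage setup.

```bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Reproduce the pipeline, commit its definition and results, and push both.
publish_pipeline() {
  run dvc repro &&                      # 1. run the pipeline
  run git add dvc.yaml dvc.lock &&      # 2. version the pipeline files
  run git add . &&                      # 3. version files not tracked by DVC
  run git commit -m "Run DVC pipeline" &&
  run git push &&                       # 4. push Git-tracked files
  run dvc push -r origin                #    and DVC-tracked files
}
```

Setting `DRY_RUN=1` before calling `publish_pipeline` prints the commands without executing them, which is a convenient way to review the sequence first.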