DVC¶
DVC is an open-source version control tool for machine learning projects, designed to handle large files, datasets, machine learning models, and metrics. It works on top of Git, so it integrates easily with your existing Git repositories. The DagsHub integration with DVC includes fully configured remote object storage that DVC can manage, viewing and diffing of DVC-tracked files hosted on DagsHub Storage or any S3-compatible storage, and data pipeline visualization.
How does the integration of DagsHub with DVC work?¶
DagsHub Storage¶
DagsHub automatically configures remote object storage for every repository, with 100 GB of free space. The storage can be managed by DVC and easily configured on any machine. Using the DVC pointer files (.dvc) and the dvc.lock file hosted in the Git commit, DagsHub parses the storage and displays the DVC-tracked files under the Files tab.
Learn how to use DagsHub Storage
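To see which files DagsHub has to work with, it can help to list the pointer files locally. The sketch below is a hypothetical helper (the name list_dvc_pointers is an illustration, not part of DVC) that searches a working tree for .dvc pointer files and dvc.lock, skipping DVC's internal .dvc directory. It inspects the local checkout rather than a specific Git commit, so uncommitted pointer files show up as well.

```shell
# Hypothetical helper: list DVC pointer files (*.dvc) and dvc.lock under a directory.
# Usage: list_dvc_pointers [root-directory]   (defaults to the current directory)
list_dvc_pointers() {
  root="${1:-.}"
  # Skip anything inside the internal .dvc/ directory (cache, config, tmp).
  find "$root" -type f \( -name '*.dvc' -o -name 'dvc.lock' \) ! -path '*/.dvc/*' | sort
}
```

Running list_dvc_pointers from the repository root prints one path per line; these are the small text files that Git versions in place of the data itself.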
External Storage Buckets¶
DagsHub supports visualizing and managing DVC data stored on AWS S3, Google Cloud Storage, Azure Blob Storage, or any S3-compatible storage, including MinIO.
Learn how to configure your external bucket.
Visualize DVC pipelines¶
DagsHub parses the dvc.lock and dvc.yaml files to create the interactive data pipeline. The pipeline is versioned and holds valuable information about the different files, metrics, and data steps.
How to use DVC with DagsHub?¶
DagsHub Storage¶
Configure DagsHub Storage with your machine¶
- Go to your repository homepage
- Click on the remote button, and select the Data tab.
- Select DVC
- Copy the commands to set up your local machine with DagsHub Storage
- Open a terminal in your project, paste the commands, and run them
dvc remote add origin s3://dvc
dvc remote modify origin endpointurl https://dagshub.com/<DagsHub-user-name>/<repo_name>.s3
dvc remote modify origin --local access_key_id <Token>
dvc remote modify origin --local secret_access_key <Token>
Why --local?
Everything you configure without --local ends up in the .dvc/config file, which is tracked by Git and appears in your repository. Personal information such as authentication details should always be kept local.
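As a safeguard against committing credentials by accident, a small check can be run before each commit. The sketch below is a hypothetical helper (check_dvc_config_has_no_secrets is an illustrative name, not a DVC command). It assumes the credential keys appear in the config file as access_key_id and secret_access_key, matching the option names used in the commands above.

```shell
# Hypothetical helper: fail if credentials ended up in the Git-tracked DVC config.
# Usage: check_dvc_config_has_no_secrets [path-to-config]   (defaults to .dvc/config)
check_dvc_config_has_no_secrets() {
  config_file="${1:-.dvc/config}"
  # No tracked config file means there is nothing that could leak.
  [ -f "$config_file" ] || return 0
  # Look for either credential key at the start of a config line.
  if grep -Eq '^[[:space:]]*(access_key_id|secret_access_key)[[:space:]]*=' "$config_file"; then
    echo "Credentials found in $config_file; set them again with --local" >&2
    return 1
  fi
  return 0
}
```

The function returns a non-zero status when a credential key is found, so it could be called from a Git pre-commit hook to block the commit.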
Note: This process only succeeds inside a directory that has been initialized with both Git and DVC. To learn how to do that, please follow the first part of the Get Started section.
Pulling data¶
dvc pull -r origin
Pushing data¶
dvc push -r origin
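Pushing usually happens as part of a slightly longer round trip: track the file with DVC, commit the pointer file with Git, then upload the data. The following is a minimal sketch of that sequence, assuming the remote is named origin as configured above. The function name push_data_version and the example path are hypothetical.

```shell
# Hypothetical helper: version one data file and upload it to the "origin" DVC remote.
# Usage: push_data_version <path-to-data-file> <commit-message>
push_data_version() {
  data_path="$1"
  message="$2"
  # dvc add writes <path>.dvc and updates the .gitignore next to the file.
  dvc add "$data_path" &&
  git add "$data_path.dvc" "$(dirname "$data_path")/.gitignore" &&
  git commit -m "$message" &&
  # Upload the data to DagsHub Storage, then publish the pointer file.
  dvc push -r origin &&
  git push
}
```

For example, push_data_version data/raw.csv "Add raw data" stops at the first step that fails, so the pointer file is not pushed unless the data upload succeeded.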
Visualize DVC pipelines¶
- Run a DVC pipeline
- Version the dvc.lock and dvc.yaml files using Git.
- Version with Git the files not tracked by DVC.
- Push the Git and DVC tracked files to DagsHub.
Note: You can follow the Pipeline tutorial to learn how to build a DVC pipeline.
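The steps above can be sketched as a single shell function. This is an illustration under a few assumptions: a dvc.yaml already defines the pipeline, the DVC remote is named origin, and the function name publish_pipeline is hypothetical. Any extra arguments are treated as additional files to version with Git, such as source code.

```shell
# Hypothetical helper: run the pipeline, version it with Git, and push everything.
# Usage: publish_pipeline <commit-message> [extra files to version with Git...]
publish_pipeline() {
  message="$1"
  shift
  # Run the pipeline; this updates dvc.lock with the resulting hashes.
  dvc repro &&
  # Version the pipeline definition, its lock file, and any other Git-tracked files.
  git add dvc.yaml dvc.lock "$@" &&
  git commit -m "$message" &&
  # Push the Git-tracked files, then the DVC-tracked outputs.
  git push &&
  dvc push -r origin
}
```

A call such as publish_pipeline "Run training pipeline" src/train.py covers all four steps; once both pushes finish, the pipeline graph should be available on the repository page.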