
Set Up DagsHub

This part of the Quick Start section focuses on the configuration process when creating a project on DagsHub. This guide covers the following topics:

  • How to create a DagsHub repository
  • How to configure DagsHub's Git and DVC remotes on your local computer

You can start the project from this point without any prior configuration.

!!! illustration "Video for this tutorial"
    Prefer to follow along with a video instead of reading? Check out the video for this section below:

<center>
<iframe width="400" height="225" src="https://www.youtube.com/embed/ECbVxGqS0f0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>

Create a DagsHub Repository

  • Now, we would like to create a new repository on DagsHub. Click on the 'Create' button and choose the 'New Repository' option.
[![create-repository](assets/0-create-repository.png){: style="padding-top:0.7em"}](assets/0-create-repository.png){target=_blank}
  • You'll be redirected to the repository settings dialog.
  • Fill in the name of the repository as 'hello-world' and add Python to the .gitignore file selector. Then click the 'Create Repository' button at the bottom.
[![new-repository-settings](assets/1-new-repo-settings.png){: style="height:60%;width:60%"}](assets/1-new-repo-settings.png){target=_blank}

Clone the Repository

Now, we'll clone the Git remote, which is stored on DagsHub, to our local computer.

  • Go to the repository page, click on the remote button and copy the Git remote link.
[![git-remote](assets/2-git-remote.png){: style="padding-top:0.7em"}](assets/2-git-remote.png){target=_blank}
  • From your CLI, change to the directory where you want to clone the repository and clone it using the copied link.
=== "Mac, Linux, Windows"
    ```bash
    cd path/to/folder
    git clone https://dagshub.com/<DagsHub-user-name>/hello-world.git .
    ```
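
If you script this step, the clone URL can be assembled from your user name and the repository name. The sketch below is only an illustration: `dagshub_git_url` is a hypothetical helper, not part of Git or DagsHub, and `alice` is a placeholder user name.

```shell
# Hypothetical helper: build the Git remote URL shown above
# from a DagsHub user name ($1) and a repository name ($2).
dagshub_git_url() {
    printf 'https://dagshub.com/%s/%s.git\n' "$1" "$2"
}

# Example with a placeholder user name.
dagshub_git_url alice hello-world
```

The result can be passed to `git clone "$(dagshub_git_url <DagsHub-user-name> hello-world)" .`. The trailing `.` tells Git to clone into the current directory, which must be empty.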

We recommend you create and activate a virtual environment before moving forward.

??? info "Recommended: Create and Activate a Virtual Environment"
    - Make sure you're in the project directory when following this.
    - If you're using Python 2, replace `venv` with `virtualenv` in the commands below.
    - The name of the virtual environment is for you to choose. The convention is 'env' or 'venv'.
    - We will add the virtual environment name to the .gitignore file, so Git will not track it.

    === "Mac, Linux"
        ```bash
        python3 -m venv <virtual-environment-name>
        echo <virtual-environment-name> >> .gitignore
        source <virtual-environment-name>/bin/activate
        ```
    === "Windows"
        ```shell
        py -m venv <virtual-environment-name>
        echo <virtual-environment-name> >> .gitignore
        <virtual-environment-name>\Scripts\activate.bat
        ```

- **<u>Note</u>**: *To verify that you activated the virtual environment, check that its name appears in parentheses at the start of your command prompt.*
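
Besides looking at the prompt, you can check activation from a script. The activation scripts created by Python's `venv` module set the `VIRTUAL_ENV` environment variable, so a minimal sketch for a POSIX shell (Mac, Linux) could look like this. `venv_status` is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a virtual environment is active,
# based on the VIRTUAL_ENV variable set by the activation script.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $(basename "$VIRTUAL_ENV")"
    else
        echo "inactive"
    fi
}

venv_status
```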

Set Up DVC

To use DVC, we have to initialize and configure it in our local repository. DagsHub makes this process easy: it takes only the following six commands.

  • We will start by installing DVC in the virtual environment and initializing it.

    === "Mac, Linux, Windows"
        ```bash
        pip3 install dvc
        dvc init
        ```
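
To confirm that the initialization worked, you can look for the files DVC creates. The sketch below assumes that `dvc init` leaves a `.dvc` directory containing a `config` file, as the status output later in this guide suggests. `is_dvc_initialized` is a hypothetical helper, not a DVC command.

```shell
# Hypothetical helper: succeed if the given directory (default: current one)
# contains the .dvc directory and its config file.
is_dvc_initialized() {
    dir="${1:-.}"
    [ -d "$dir/.dvc" ] && [ -f "$dir/.dvc/config" ]
}

if is_dvc_initialized .; then
    echo "DVC is initialized here"
else
    echo "DVC is not initialized here; run 'dvc init' first"
fi
```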

DagsHub Storage as a DVC Remote

Every project on DagsHub comes with a fully configured remote object storage managed by DVC, enabling you to view your data and models alongside your code, experiments, pipeline, and more.

There are two ways to configure your local repository.

Automatic Configuration Using the DagsHub Client

The easiest way by far is to use the DagsHub Client to configure your local repo to use DagsHub as a DVC remote.

First, install the DagsHub Client using pip:

=== "Mac, Linux, Windows"
    ```bash
    pip3 install dagshub
    ```

Then we use the client to configure the DagsHub storage as the DVC remote:

=== "Mac, Linux, Windows"
    ```bash
    cd path/to/local/repository
    dagshub setup dvc
    ```

The first time you run this command, it will initiate an OAuth process to authenticate your machine. It does this by opening a new browser window or tab, which looks like this:

[![oauth-request](assets/3a-oauth-request.png){: style="height:60%;width:60%"}](assets/3a-oauth-request.png){target=_blank}

After clicking accept, you should see the following authorization success screen:

[![authorization-success](assets/3b-authorization-success.png){: style="height:60%;width:60%"}](assets/3b-authorization-success.png){target=_blank}

Manual Configuration Using DVC

If you cannot or do not want to use the automatic configuration process, you can still use DVC to configure your repo locally. All you need to do is:

  1. Click the Remote button, which opens up the remotes menu

  2. Select the DVC tab

  3. Copy the four commands to your terminal

    [![dvc-commands](assets/3-copy-dvc-commands.png)](assets/3-copy-dvc-commands.png){target=_blank}

???+ info "DVC Commands"
    - The first command adds your DagsHub repository storage as the DVC remote.

        === "Mac, Linux, Windows"
            ```bash
            dvc remote add origin https://dagshub.com/<DagsHub-user-name>/hello-world.dvc
            ```
    - The next three commands set up your credentials for DVC.

        === "Mac, Linux, Windows"
            ```bash
            dvc remote modify origin --local auth basic
            dvc remote modify origin --local user <DagsHub-user-name>
            dvc remote modify origin --local password <Token>
            ```
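
If you prefer to review the four commands before running them, a dry-run helper can print them with your values filled in. This is only a sketch: `dvc_remote_commands` is a hypothetical function, and `DAGSHUB_TOKEN` is an assumed environment variable name for your token, not something DVC or DagsHub defines. The token reference is printed unexpanded so the secret never appears on screen.

```shell
# Hypothetical dry-run helper: print the four DVC remote commands
# for a DagsHub user name ($1) and repository name ($2).
dvc_remote_commands() {
    user="$1"
    repo="$2"
    printf 'dvc remote add origin https://dagshub.com/%s/%s.dvc\n' "$user" "$repo"
    printf 'dvc remote modify origin --local auth basic\n'
    printf 'dvc remote modify origin --local user %s\n' "$user"
    # Single quotes keep $DAGSHUB_TOKEN literal in the printed command.
    printf 'dvc remote modify origin --local password "$DAGSHUB_TOKEN"\n'
}

# Example with a placeholder user name.
dvc_remote_commands alice hello-world
```

The `--local` flag writes the credentials to `.dvc/config.local`, so they stay on your machine.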

??? checkpoint "Checkpoint"

Check that the current DVC configuration matches the following:

=== "Mac, Linux"
    ```bash
    cat .dvc/config.local
        ['remote "origin"']
            url = https://dagshub.com/<DagsHub-user-name>/hello-world.dvc
            auth = basic
            user = <DagsHub-user-name>
            ask_password = true
    ```
=== "Windows"
    ```bash
    type .dvc/config.local
        ['remote "origin"']
            url = https://dagshub.com/<DagsHub-user-name>/hello-world.dvc
            auth = basic
            user = <DagsHub-user-name>
            ask_password = true
    ```
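
The same check can be scripted instead of read by eye. The sketch below only verifies that the file exists, that `auth` is set to `basic`, and that some `user` value is present. `check_dvc_local_config` is a hypothetical helper name, and the patterns assume the `key = value` layout shown above.

```shell
# Hypothetical helper: inspect a DVC local config file
# (default: .dvc/config.local) for the expected credential settings.
check_dvc_local_config() {
    cfg="${1:-.dvc/config.local}"
    if [ ! -f "$cfg" ]; then
        echo "missing: $cfg"
        return 1
    fi
    if ! grep -Eq '^[[:space:]]*auth[[:space:]]*=[[:space:]]*basic[[:space:]]*$' "$cfg"; then
        echo "auth is not set to basic"
        return 1
    fi
    if ! grep -Eq '^[[:space:]]*user[[:space:]]*=[[:space:]]*[^[:space:]]+' "$cfg"; then
        echo "user is not set"
        return 1
    fi
    echo "ok"
}

# Run against the default location; "|| true" keeps the script going on failure.
check_dvc_local_config || true
```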

Version and push DVC Configurations

We've initialized and configured DVC in our local directory. These actions created and updated the .dvc directory and the .dvcignore file. These are configuration files for our project and should be tracked with Git. Rule of thumb: Git will track every file that ends with '.dvc'.

  • Check the local repository status

    === "Mac, Linux, Windows"
        ```bash
        git status -s
        A .dvc/.gitignore
        A .dvc/config
        A .dvcignore
        M .gitignore
        ```

  • Add and push the untracked and modified files using Git tracking

    === "Mac, Linux, Windows"
        ```bash
        git add .dvc .dvcignore .gitignore
        git commit -m "Initialize DVC"
        git push
        ```
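
After committing, you can confirm that nothing was left behind by checking that Git reports an empty status. The sketch below reads status output from standard input, so it can be tried without a repository. `status_is_clean` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if no status lines arrive on stdin.
status_is_clean() {
    # grep -q . succeeds when at least one non-empty line is present.
    if grep -q .; then
        return 1
    fi
    return 0
}

# Usage inside the repository (requires Git):
#   git status --porcelain | status_is_clean && echo "nothing left to commit"
```

An empty status only means everything is committed locally; the repository page on DagsHub shows whether the push arrived.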

  • Check the new status of the DagsHub repository

[![repo-stat-after-push](assets/4-repo-stat-after-push.png){: style="padding-top:0.7em"}](assets/4-repo-stat-after-push.png){target=_blank}

So far, we've created our very first DagsHub project, cloned it to our local computer, and configured our Git and DVC remotes. In the next part, we will learn how to version code and data with Git and DVC.
