Create a Project on DAGsHub¶
This part of the Get Started section focuses on the configuration process when creating a project on DAGsHub.
We will cover how to create a DAGsHub repository, connect it to your local computer, configure DVC,
and set DAGsHub storage as remote storage.
No prior configuration is required; you can start the project from this point.
- To create a new user on DAGsHub, we will use the DAGsHub Signup page.
- We recommend signing up with your GitHub account, but you can also use your good old email. If you sign up with GitHub, you will be redirected to set your DAGsHub password.
You will use your DAGsHub password frequently, so please choose one that you will remember.
Create a DAGsHub Repository¶
- Now, we would like to create a new repository on DAGsHub. Click on the 'Create' button and choose the 'New Repository' option.
- You'll be redirected to the repository settings dialog.
- Enter 'hello-world' as the repository name and choose Python in the .gitignore selector. Then click the 'Create Repository' button at the bottom.
- Congratulations - you created your first DAGsHub repository!
Clone the Repository¶
Now, we'll clone the Git remote, which is stored on DAGsHub, to our local computer.
- Go to the repository page, click on the remote button and copy the Git remote link.
From your CLI, change the directory to where you wish to clone the repository and git-clone it using the copied link.
    cd path/to/folder
    git clone https://dagshub.com/<DAGsHub-user-name>/hello-world.git .
Create and Activate a Virtual Environment
On Linux or macOS:
    python3 -m venv <virtual-environment-name>
    echo <virtual-environment-name> >> .gitignore
    source <virtual-environment-name>/bin/activate
On Windows:
    py -m venv <virtual-environment-name>
    echo <virtual-environment-name> >> .gitignore
    <virtual-environment-name>\Scripts\activate.bat
- Note: To verify that the virtual environment is active, check that its name appears in parentheses at the left of the command prompt.
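The prompt check above can also be done from inside Python. The following is a minimal sketch, assuming the environment was created with the standard `venv` module as shown earlier; the function name `in_virtual_environment` is illustrative and not part of DAGsHub or DVC.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```

Run this with the `python` executable that your shell resolves after activation; if it reports `False`, the activation script probably did not run in the current shell.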
To use DVC, we have to initialize and configure it in our local repository. DAGsHub makes this easy: the whole process takes only the following six commands.
We will start by installing DVC in the virtual environment and initializing it.
    pip install dvc
    dvc init
Configure DAGsHub as DVC Remote Storage¶
In order to host the data & models alongside our code, we need to create a DVC storage remote. What this usually means is signing up for a cloud account, creating a storage bucket, configuring permissions, etc. This process can be a hassle, even if you are familiar with it. To save you the trouble, we created a free, zero-configuration DVC remote called DAGsHub Storage!
When you create a DAGsHub project, it is automatically configured with its own DAGsHub Storage remote. To configure it locally, all you need to do is copy and paste four commands from your DAGsHub repository to your CLI.
Copy the commands from the DAGsHub repository to your CLI
    dvc remote add origin --local https://dagshub.com/<DAGsHub-user-name>/hello-world.dvc
    dvc remote modify origin --local auth basic
    dvc remote modify origin --local user <DAGsHub-user-name>
    dvc remote modify origin --local ask_password true
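If you prefer to generate these four commands rather than copy them, the pattern can be sketched in Python. This is only an illustration of how the commands are assembled from the user and repository names; the helper `dagshub_remote_commands` and the example user name `example-user` are hypothetical, and the commands are printed rather than executed.

```python
import shlex
from typing import List


def dagshub_remote_commands(user: str, repo: str, remote: str = "origin") -> List[str]:
    """Build the four DVC remote commands shown above for a given user and repository."""
    # The DVC remote URL mirrors the Git URL, with a .dvc suffix instead of .git.
    url = f"https://dagshub.com/{user}/{repo}.dvc"
    argv_list = [
        ["dvc", "remote", "add", remote, "--local", url],
        ["dvc", "remote", "modify", remote, "--local", "auth", "basic"],
        ["dvc", "remote", "modify", remote, "--local", "user", user],
        ["dvc", "remote", "modify", remote, "--local", "ask_password", "true"],
    ]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return [shlex.join(argv) for argv in argv_list]


if __name__ == "__main__":
    for command in dagshub_remote_commands("example-user", "hello-world"):
        print(command)
```

Because every command uses `--local`, the settings land in `.dvc/config.local` rather than in the shared `.dvc/config`.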
For more information about DAGsHub storage, visit the reference page.
- If you still want to set up your own cloud remote storage, please refer to our setup external remote storage page.
Check that the current DVC configuration matches the following:
On Linux or macOS:
    cat .dvc/config.local
On Windows:
    type .dvc/config.local
Either command should print:
    ['remote "origin"']
        url = https://dagshub.com/<DAGsHub-user-name>/hello-world.dvc
        auth = basic
        user = <DAGsHub-user-name>
        ask_password = true
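The same check can be scripted. The sketch below parses text in the layout shown above with Python's standard `configparser`, which is a stand-in chosen for illustration and not DVC's own configuration loader. The function `read_dvc_remote` and the user name `example-user` are hypothetical.

```python
import configparser
from typing import Dict

EXPECTED_KEYS = {"url", "auth", "user", "ask_password"}


def read_dvc_remote(config_text: str, remote: str = "origin") -> Dict[str, str]:
    """Return the settings of one remote from config.local-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    for section in parser.sections():
        # The header is written as ['remote "origin"'], so configparser
        # reports the section name with its surrounding single quotes.
        if section.strip("'") == f'remote "{remote}"':
            return dict(parser[section])
    raise KeyError(f"remote {remote!r} not found")


sample = """\
['remote "origin"']
    url = https://dagshub.com/example-user/hello-world.dvc
    auth = basic
    user = example-user
    ask_password = true
"""

settings = read_dvc_remote(sample)
missing = EXPECTED_KEYS - set(settings)
print("missing keys:", sorted(missing))
```

To check a real repository, pass the contents of `.dvc/config.local` instead of `sample`; an empty list of missing keys means all four settings are present.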
Version and push DVC Configurations¶
We've initialized and configured DVC in our local directory. These actions created and updated the .dvc directory and
.dvcignore file. These are configuration files for our project and should be tracked with Git.
Rule of thumb: use Git to track every file that ends with '.dvc'.
Check the local repository status
    git status -s
    A  .dvc/.gitignore
    A  .dvc/config
    A  .dvc/plots/confusion.json
    A  .dvc/plots/confusion_normalized.json
    A  .dvc/plots/default.json
    A  .dvc/plots/linear.json
    A  .dvc/plots/scatter.json
    A  .dvc/plots/smooth.json
    A  .dvcignore
     M .gitignore
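To read this output programmatically, note that each line of the short format starts with a two-character status code, then a space, then the path. The following sketch splits such lines into pairs; `parse_short_status` is an illustrative helper, and it deliberately ignores renames and quoted paths, which use a longer form.

```python
from typing import List, Tuple


def parse_short_status(output: str) -> List[Tuple[str, str]]:
    """Split `git status -s` lines into (status code, path) pairs."""
    entries = []
    for line in output.splitlines():
        # Two status characters, one space, then at least one path character.
        if len(line) < 4:
            continue
        entries.append((line[:2], line[3:]))
    return entries


# Sample text in the short format; the leading space in " M" is significant.
sample_status = "A  .dvc/config\nA  .dvcignore\n M .gitignore\n"

for status, path in parse_short_status(sample_status):
    print(repr(status), path)
```

In the status code, the first character describes the staging area and the second the working tree, so `A ` is a newly staged file and ` M` is a modified file that has not been staged yet.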
Add, commit, and push the new and modified files with Git
    git add .dvc .dvcignore .gitignore
    git commit -m "Initialize DVC"
    git push
Check the new status of the DAGsHub repository
So far, we've created our very first DAGsHub project, cloned it to our local computer, and configured our Git and DVC remotes. In the next parts, we will learn how to: