General:  tutorial Type:  model Task:  classification Data Domain:  tabular Framework:  scikit-learn

DVC-ML-Experiments

In this guide, we will learn about DVC and how DAGsHub makes it easy for machine learning engineers to track their experiments. We will train a model on a synthesized Titanic dataset and then run several experiments with different models. In the end, we will visualize and compare the results using DAGsHub's interactive dashboards. Before we dive into coding, I want to give you a brief introduction to DVC, FastDS, and DAGsHub.

Models and Experiments

In this section, we will run three experiments with three different machine learning models and learn how to commit and push both Git and DVC changes to DAGsHub. Finally, we will compare the results and explore the metric visualizations.

Experiment 1

We run our first experiment with the baseline code and a simple SGD classifier. Running the Python file also printed some train and test metrics, but I have removed them from the output to simplify the final comparisons.

python main.py
Loading data...
Engineering features...
Training model...
Saving trained model...
Evaluating model...
Creating Submission File...
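The run also produces the metrics.csv and params.yml files that get committed to Git in the next step. Here is a minimal sketch of how a script could write them by hand. It assumes the `Name,Value,Timestamp,Step` column layout for metrics.csv and a flat `key: value` params.yml; the helper names and the metric names `test_accuracy` and `test_f1` are hypothetical, and the actual main.py in this repository may do this differently.

```python
import csv
import time
from pathlib import Path


def write_metrics(path, metrics, step=1):
    """Write one row per metric using the assumed Name,Value,Timestamp,Step layout."""
    timestamp_ms = int(time.time() * 1000)  # milliseconds since the epoch
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Value", "Timestamp", "Step"])
        for name, value in metrics.items():
            writer.writerow([name, value, timestamp_ms, step])


def write_params(path, params):
    """Write flat hyperparameters as simple `key: value` YAML lines."""
    lines = [f"{key}: {value}" for key, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Values reported for Experiment 1 in this guide; the metric names are made up.
    write_metrics("metrics.csv", {"test_accuracy": 0.60, "test_f1": 0.17})
    write_params("params.yml", {"model": "SGDClassifier"})
```

Because both files are plain text, they are small enough to version with Git, while the model and submission files stay under DVC.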

After the initial run, we commit the changes to both DVC and Git to set the baseline. The commands below show how simple it is to commit the changes before pushing them to a remote server.

dvc commit -f Model.dvc Submission.dvc
git add Model.dvc Submission.dvc main.py metrics.csv params.yml
git commit -m "SGDClassifier"

With git push and dvc push, we send our code, data, and model to the remote server.

git push --all
dvc push -r origin
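Since the same commit and push commands are repeated for every experiment, they can be wrapped in a small helper. This is only a sketch: the function names are hypothetical, the file list is copied from the commands above, and the default is a dry run that prints the commands without executing them.

```python
import shlex
import subprocess

# Files tracked in this guide's commands.
DVC_FILES = ["Model.dvc", "Submission.dvc"]
GIT_FILES = DVC_FILES + ["main.py", "metrics.csv", "params.yml"]


def experiment_commands(message, remote="origin"):
    """Return the commit-and-push commands for one experiment, in order."""
    return [
        ["dvc", "commit", "-f", *DVC_FILES],
        ["git", "add", *GIT_FILES],
        ["git", "commit", "-m", message],
        ["git", "push", "--all"],
        ["dvc", "push", "-r", remote],
    ]


def publish_experiment(message, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for command in experiment_commands(message):
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


if __name__ == "__main__":
    publish_experiment("SGDClassifier")  # dry run: only prints the commands
```

Calling `publish_experiment("SGDClassifier", dry_run=False)` inside the repository would run the same five commands shown above.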

Next, we click on the experiment tab and explore our results. I have removed some additional columns, and next I will rename the experiment. You can also play around with the other options to make your results easier to understand.

Our test accuracy is 60 percent, which is quite bad, and the F1 score is even worse at 0.17. We need to select another model or try different techniques to get better results.
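To see how a model can reach 60 percent accuracy and still have a poor F1 score, here is a small worked example in plain Python. The labels are toy values made up for illustration, not the Titanic data from this guide, and the function names are hypothetical. The model below gets most of the negative class right but finds only one of the four positives.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels: 4 positives, 6 negatives (illustrative only).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
    print(f"f1:       {f1_score(y_true, y_pred):.2f}")
```

On these toy labels the accuracy is 0.60 while the F1 score is only about 0.33, because recall on the positive class is 1 out of 4. That is why it helps to track both metrics for every experiment.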

Experiment 2

In the second experiment, we change our model to DecisionTreeClassifier and run the entire process again.

1. Run main.py
2. Commit both Git and DVC
3. Push both Git and DVC
python main.py
dvc commit -f Model.dvc Submission.dvc
git add Model.dvc Submission.dvc main.py metrics.csv params.yml
git commit -m "DecisionTreeClassifier"
git push --all
dvc push -r origin
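The only code change between experiments is the model class. One way to make that a one-line change is a small lookup that builds the estimator by name. This is a sketch and not necessarily how main.py in this repository is organized: `MODEL_MODULES` and `build_model` are hypothetical names, while the three classes and their modules are the standard scikit-learn ones.

```python
import importlib

# scikit-learn module that provides each classifier used in this guide.
MODEL_MODULES = {
    "SGDClassifier": "sklearn.linear_model",
    "DecisionTreeClassifier": "sklearn.tree",
    "RandomForestClassifier": "sklearn.ensemble",
}


def build_model(name, **kwargs):
    """Import the named scikit-learn classifier and return a new instance."""
    if name not in MODEL_MODULES:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(MODEL_MODULES)}")
    module = importlib.import_module(MODEL_MODULES[name])
    return getattr(module, name)(**kwargs)


# Example (requires scikit-learn to be installed):
# model = build_model("DecisionTreeClassifier", max_depth=5)
```

With a helper like this, the model name could also be stored in params.yml so that each commit records which classifier produced its metrics.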

The results are now quite balanced, and we will improve on them by using an ensemble model.

Experiment 3

In the third experiment, we change our model to a random forest classifier and run the entire process again. By now we are experts at running experiments and pushing them to DAGsHub.

python main.py
dvc commit -f Model.dvc Submission.dvc
git add Model.dvc Submission.dvc main.py metrics.csv params.yml
git commit -m "RandomForestClassifier"
git push --all
dvc push -r origin

After pushing our third experiment, we compare all three results by selecting all three commits and clicking the compare button.
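The same comparison can be approximated locally, because every experiment commit contains its own metrics.csv. The sketch below reads that file from each Git revision with `git show <rev>:metrics.csv` and prints one row per commit. It assumes the `Name,Value,Timestamp,Step` column layout, and the function names are hypothetical; it is a command-line stand-in for the DAGsHub comparison view, not a replacement for it.

```python
import csv
import io
import subprocess
import sys


def parse_metrics(csv_text):
    """Return {metric name: last logged value} from Name,Value,Timestamp,Step rows."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        latest[row["Name"]] = float(row["Value"])
    return latest


def metrics_at(rev, path="metrics.csv"):
    """Read the metrics file as it was committed at the given Git revision."""
    result = subprocess.run(
        ["git", "show", f"{rev}:{path}"],
        check=True, capture_output=True, text=True,
    )
    return parse_metrics(result.stdout)


def comparison_table(runs):
    """Format {label: {metric: value}} as plain text, one row per run."""
    names = sorted({name for metrics in runs.values() for name in metrics})
    width = max([len(label) for label in runs] + [len("run")])
    lines = ["  ".join(["run".ljust(width)] + [n.rjust(max(len(n), 8)) for n in names])]
    for label, metrics in runs.items():
        cells = [label.ljust(width)]
        for n in names:
            value = metrics.get(n)
            text = "-" if value is None else f"{value:.3f}"
            cells.append(text.rjust(max(len(n), 8)))
        lines.append("  ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python compare.py <rev> [<rev> ...]
    revisions = sys.argv[1:]
    print(comparison_table({rev: metrics_at(rev) for rev in revisions}))
```

Run it from inside the repository with the three experiment commit hashes as arguments. A metric missing from one commit is shown as `-`.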


About

A Complete Guide to DVC and DAGsHub
