Arjun Vikram 42dfa0e2be
Update README.md to include experiment log command
3 years ago

DVC Demo Project

Demo project to test out DVC and DAGsHub

See this repository on DAGsHub and GitHub

Data Version Control (DVC) is a version control system built around the machine learning workflow. It lets you build and run pipelines of data and code, represented as directed acyclic (dependency) graphs, and it tracks large outputs through Git-controlled metafiles. DAGsHub is a fully featured Git and DVC remote: DAGsHub is to DVC as GitHub is to Git.
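To make the "pipeline as a dependency graph" idea concrete, here is a minimal Python sketch. It is not DVC's implementation. The stage names mirror the subcommands of main.py, but the edges between them are an assumption made for illustration, as are the helper names `execution_order` and `stages_to_rerun`.

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the set of stages it depends on.
# Stage names follow main.py's subcommands; the edges are assumed, not taken
# from this repository's actual pipeline definition.
STAGES = {
    "split": set(),
    "featurize": {"split"},
    "tfidf": {"featurize"},
    "train": {"tfidf"},
    "test": {"train"},
}


def execution_order(stages):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


def stages_to_rerun(stages, changed):
    """Return the changed stage plus everything downstream of it, in run order."""
    affected = {changed}
    order = execution_order(stages)
    for name in order:
        # A stage is affected if any of its dependencies is affected.
        if stages[name] & affected:
            affected.add(name)
    return [name for name in order if name in affected]


if __name__ == "__main__":
    print(execution_order(STAGES))
    print(stages_to_rerun(STAGES, "tfidf"))
```

Because the graph is acyclic, a valid run order always exists, and a change to one stage only invalidates the stages downstream of it.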

This repository implements a binary classifier that decides whether a question from Cross Validated (Stack Exchange) is about machine learning or not. The machine learning portion of this repository is unremarkable and uses standard techniques. The Python file main.py contains the code for every step of the ML pipeline.

Usage: python3 main.py [split|featurize|tfidf|train|test]
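A single-file, one-subcommand-per-step layout like the usage line above could be wired up roughly as follows. This is a sketch, not the contents of main.py: the step names come from the usage line, while `build_parser`, `run_step`, and the placeholder step bodies are hypothetical.

```python
import argparse

# Step names taken from the usage line; everything else here is illustrative.
STEPS = ("split", "featurize", "tfidf", "train", "test")


def build_parser():
    """Build a parser that accepts exactly one of the known pipeline steps."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("step", choices=STEPS, help="pipeline step to run")
    return parser


def run_step(step):
    """Dispatch to the handler for one step (placeholder bodies only)."""
    # A real handler would read its inputs and write its outputs to disk.
    handlers = {name: (lambda name=name: f"ran {name}") for name in STEPS}
    return handlers[step]()


def main(argv=None):
    """Parse the step name from argv and run it."""
    args = build_parser().parse_args(argv)
    result = run_step(args.step)
    print(result)
    return result


if __name__ == "__main__":
    # Demo invocation; pass sys.argv[1:] instead to read the real command line.
    main(["train"])
```

Keeping each step behind its own subcommand lets a pipeline tool call the steps one at a time, so a step can be re-run without repeating the ones before it.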

View Experiment Log

The following command transforms the experiment log produced by DVC into a human-readable format. It pipes the raw JSON from DVC into a jq program that converts it into TSV (tab-separated values) with one row of metrics per commit. The TSV is then piped to column, which pretty-prints it as an aligned table.

$ dvc metrics show --show-json --all-commits | jq -r '(["ID", "Train Accuracy", "Test Accuracy", "Train ROC AUC", "Test ROC AUC"] | ., map("=============")), (to_entries[] as {key: $id, value: {"metrics-train.yaml": $train, "metrics-test.yaml": $test}} | [$id[:10], $train.accuracy, $test.accuracy, $train.roc_auc, $test.roc_auc]) | @tsv' | column -tns$'\t'

Output:

ID              Train Accuracy      Test Accuracy   Train ROC AUC       Test ROC AUC
==============  ==============      ==============  ==============      ==============
workspace       0.9192533333333334  0.89608         0.9546657196819067  0.8611241864339614
8dbc9e261a      0.9192533333333334  0.89608         0.9546657196819067  0.8611241864339614
898fc3ba3a      0.9192533333333334  0.89608         0.9546657196819067  0.8611241864339614
1296fd461c      0.9185066666666667  0.8948          0.954678369966714   0.8550331715537685
75372d83ab      0.9195466666666666  0.89552         0.9541852826125495  0.8629513709508426
6a7e1866a3      0.9181866666666667  0.8964          0.9551873097990777  0.8595586812709163
ccceac7028      0.9192533333333334  0.89608         0.9546657196819067  0.8611241864339614
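For readers less familiar with jq, the same transformation can be sketched in Python. The JSON shape assumed here (a top-level object keyed by revision, each holding `metrics-train.yaml` and `metrics-test.yaml` objects with `accuracy` and `roc_auc` fields) is inferred from the jq program above; other DVC versions may emit a different layout. The function names are hypothetical, and the sample values are copied from the workspace row of the output.

```python
import json

HEADER = ["ID", "Train Accuracy", "Test Accuracy", "Train ROC AUC", "Test ROC AUC"]


def metrics_to_rows(report):
    """Flatten {revision: {metrics file: {metric: value}}} into table rows."""
    rows = [HEADER]
    for rev, files in report.items():
        train = files.get("metrics-train.yaml", {})
        test = files.get("metrics-test.yaml", {})
        rows.append([
            rev[:10],  # shorten commit hashes, like $id[:10] in the jq program
            train.get("accuracy"),
            test.get("accuracy"),
            train.get("roc_auc"),
            test.get("roc_auc"),
        ])
    return rows


def to_tsv(rows):
    """Join rows into tab-separated text; missing values become empty cells."""
    return "\n".join(
        "\t".join("" if cell is None else str(cell) for cell in row)
        for row in rows
    )


if __name__ == "__main__":
    # Sample input in the assumed shape, using the workspace row shown above.
    sample = json.loads(
        '{"workspace": {'
        '"metrics-train.yaml": {"accuracy": 0.9192533333333334, "roc_auc": 0.9546657196819067}, '
        '"metrics-test.yaml": {"accuracy": 0.89608, "roc_auc": 0.8611241864339614}}}'
    )
    print(to_tsv(metrics_to_rows(sample)))
```

Unlike the jq program, this sketch does not emit the separator row of equals signs. Its TSV output can still be piped to column for alignment.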

DAGsHub Features

Experiment Tracker

Pipeline DAG

DVC-tracked folder view (outputs/)

Credits


About

Demo repository to test out DVC and DAGsHub

https://github.com/arjvik/dvc-demo