
DVC Demo Project

Demo project to test out DVC and DAGsHub

See this repository on DAGsHub and GitHub

Data Version Control (DVC) is a version control system built around the machine learning workflow. It lets you build and run pipelines, represented as directed acyclic (dependency) graphs of data and code, and tracks large outputs through small metafiles committed to Git. DAGsHub is a fully-featured Git and DVC remote; in other words, DAGsHub is to DVC what GitHub is to Git.
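To make the DAG idea concrete, here is a small Python sketch, not DVC's own code, that orders pipeline stages with the standard library's `graphlib` (Python 3.9+) and finds which stages sit downstream of a change. That is roughly the question `dvc repro` answers when it skips stages whose dependencies are unchanged. The stage names come from the main.py usage line in this README; the dependency edges in the hypothetical `PIPELINE` dictionary are an assumption for illustration, not read from this repository's pipeline files.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency edges between the five steps of main.py: each stage
# maps to the set of stages whose outputs it consumes. This is an assumption
# for illustration, not the repository's real DVC pipeline definition.
PIPELINE = {
    "split": set(),
    "featurize": {"split"},
    "tfidf": {"featurize"},
    "train": {"tfidf"},
    "test": {"tfidf", "train"},
}


def execution_order(graph):
    """Return stages so that each one comes after all of its dependencies.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which is exactly what the "acyclic" in DAG rules out.
    """
    return list(TopologicalSorter(graph).static_order())


def stages_to_rerun(graph, changed):
    """Return the changed stages plus everything downstream, in run order."""
    dirty = set(changed)
    order = execution_order(graph)
    for stage in order:
        # A stage is stale if any stage it consumes is stale.
        if graph[stage] & dirty:
            dirty.add(stage)
    return [stage for stage in order if stage in dirty]
```

For instance, `stages_to_rerun(PIPELINE, {"tfidf"})` returns `["tfidf", "train", "test"]`: editing the TF-IDF step invalidates training and testing but not the earlier split and featurize steps. Real DVC decides staleness by comparing content hashes of each stage's dependencies rather than by being told what changed.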

This repository implements a binary classifier that labels questions from the CrossValidated Stack Exchange as being about machine learning or not. The machine learning itself is unremarkable and uses standard techniques. The Python file main.py contains the code for every step of the ML pipeline.

Usage: python3 main.py [split|featurize|tfidf|train|test]
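The usage line describes a single script that runs one named step per invocation. One common way to structure such an entry point is an argparse dispatch table, sketched below under assumptions: the step functions (`do_split`, `do_featurize`, and so on) are hypothetical placeholders, since this README does not show main.py's internals, and the real file may parse its arguments differently.

```python
import argparse


# Hypothetical placeholders for the five steps in main.py; the real bodies are
# not shown in this README. Each returns its step name so dispatch is visible.
def do_split():
    return "split"


def do_featurize():
    return "featurize"


def do_tfidf():
    return "tfidf"


def do_train():
    return "train"


def do_test():
    return "test"


# Dispatch table: command-line word -> function that runs that step.
STEPS = {
    "split": do_split,
    "featurize": do_featurize,
    "tfidf": do_tfidf,
    "train": do_train,
    "test": do_test,
}


def main(argv=None):
    parser = argparse.ArgumentParser(
        prog="main.py", description="Run one step of the ML pipeline."
    )
    # `choices` makes argparse reject unknown steps with a usage error (exit code 2).
    parser.add_argument("step", choices=list(STEPS), help="pipeline step to run")
    args = parser.parse_args(argv)
    return STEPS[args.step]()
```

Calling `main()` with no argument list makes argparse read `sys.argv`, so adding an `if __name__ == "__main__": main()` guard gives the `python3 main.py train` behavior; an unknown step such as `python3 main.py deploy` exits with a usage error instead of running anything.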

View Experiment Log

The following command turns the experiment log produced by DVC into a human-readable table. It pipes the raw JSON output by DVC into a jq program, which transforms it into TSV (tab-separated values) of metrics, one row per commit with the commit ID shortened to 10 characters. The TSV is then piped to column to pretty-print it.

$ dvc metrics show --show-json --all-commits | jq -r '(["ID", "Train Accuracy", "Test Accuracy", "Train ROC AUC", "Test ROC AUC"] | ., map("=============")), (to_entries[] as {key: $id, value: {"metrics-train.yaml": $train, "metrics-test.yaml": $test}} | [$id[:10], $train.accuracy, $test.accuracy, $train.roc_auc, $test.roc_auc]) | @tsv' | column -tns$'\t'

Output:

ID              Train Accuracy      Test Accuracy   Train ROC AUC       Test ROC AUC
==============  ==============      ==============  ==============      ==============
workspace       0.9192533333333334  0.89608         0.9546657196819067  0.8611241864339614
8dbc9e261a      0.9192533333333334  0.89608         0.9546657196819067  0.8611241864339614
898fc3ba3a      0.9192533333333334  0.89608         0.9546657196819067  0.8611241864339614
1296fd461c      0.9185066666666667  0.8948          0.954678369966714   0.8550331715537685
75372d83ab      0.9195466666666666  0.89552         0.9541852826125495  0.8629513709508426
6a7e1866a3      0.9181866666666667  0.8964          0.9551873097990777  0.8595586812709163
ccceac7028      0.9192533333333334  0.89608         0.9546657196819067  0.8611241864339614
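If jq is not available, the same transformation can be sketched in Python. This is an illustration, not part of the repository: it assumes the JSON shape that the jq program above destructures (a top-level object keyed by revision, each value mapping metrics-train.yaml and metrics-test.yaml to their accuracy and roc_auc values). Newer DVC releases may nest this output differently, and the helper names `metrics_rows` and `format_table` are hypothetical.

```python
import json

HEADER = ["ID", "Train Accuracy", "Test Accuracy", "Train ROC AUC", "Test ROC AUC"]


def metrics_rows(raw_json):
    """Return one row per revision, mirroring the jq program above."""
    data = json.loads(raw_json)
    rows = []
    for rev, files in data.items():
        # A missing metrics file or key becomes None, printed later as an
        # empty cell (jq's @tsv likewise renders null as an empty field).
        train = files.get("metrics-train.yaml") or {}
        test = files.get("metrics-test.yaml") or {}
        rows.append([
            rev[:10],  # short commit ID, like $id[:10] in jq
            train.get("accuracy"),
            test.get("accuracy"),
            train.get("roc_auc"),
            test.get("roc_auc"),
        ])
    return rows


def format_table(rows):
    """Left-align each column to its widest cell, similar to `column -t`."""
    cells = [["" if v is None else str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in [HEADER] + cells) for i in range(len(HEADER))]
    underline = ["=" * w for w in widths]  # one rule per column, sized to fit
    lines = []
    for row in [HEADER, underline] + cells:
        lines.append("  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip())
    return "\n".join(lines)
```

For example, after saving the DVC output with `dvc metrics show --show-json --all-commits > metrics.json`, running `print(format_table(metrics_rows(open("metrics.json").read())))` prints a table in the same layout as the one above, with rows in the order DVC emits them.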

DAGsHub Features

Experiment Tracker

Pipeline DAG

DVC-tracked folder view (outputs/)

Credits
