DVC Tutorial

A repo for the DVC tutorial shown on DVC.org.

Step 0

Initial git commit. Here we have downloaded the code from the DVC site.
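Downloading the code looked roughly like the following sketch. The URL is taken from the classic DVC NLP tutorial; if the tutorial has moved, treat it as an assumption.

```shell
# Fetch the tutorial code and commit it to git.
wget https://code.dvc.org/tutorial/nlp/code.zip
unzip code.zip -d code && rm -f code.zip
git add code
git commit -m "download code"
```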

Step 1

Initialized DVC and added a virtual environment.
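A minimal sketch of this step; the environment name and tool (venv vs. virtualenv) are assumptions:

```shell
# Create and activate a virtual environment, install DVC, and initialize it.
python -m venv .env
source .env/bin/activate
pip install dvc
dvc init
git commit -m "init DVC"
```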

Step 2

Retrieved the example data, which is about 41 MB in size. Because of how DVC works, this data is not committed to the git repo; instead it exists in the DVC cache.
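Roughly, this step looked as follows (the data URL is from the classic DVC tutorial and is an assumption here):

```shell
# Download the ~41 MB archive and put it under DVC control.
# `dvc add` moves the file's content into .dvc/cache and creates a small
# .dvc pointer file, which is what actually gets committed to git.
wget https://data.dvc.org/tutorial/nlp/25K/Posts.xml.zip
dvc add Posts.xml.zip
git add Posts.xml.zip.dvc .gitignore
git commit -m "add raw data"
```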

Step 3

Unzipped the data file. Because the command is run through DVC, DVC automatically adds the unzipped data file to .gitignore and to .dvc/cache.
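A sketch of the unzip stage, assuming the `dvc run` syntax of the DVC version used in this tutorial:

```shell
# -d declares a dependency, -o a DVC-managed output.
# DVC adds Posts.xml to .gitignore and stores its content in .dvc/cache.
dvc run -d Posts.xml.zip -o Posts.xml unzip Posts.xml.zip
git add . && git commit -m "extract data"
```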

Step 4

Converted the XML to TSV and performed the train/test split. These are two consecutive steps of the data pipeline, which goes to show that you can run multiple pipeline stages before committing without any problems.
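The two stages can be sketched like this. The script names follow the code/ directory of the tutorial; the split arguments are illustrative assumptions:

```shell
# Stage 1: XML -> TSV.
dvc run -d code/xml_to_tsv.py -d Posts.xml -o Posts.tsv \
    python code/xml_to_tsv.py

# Stage 2: split into training and test sets.
dvc run -d code/split_train_test.py -d Posts.tsv \
    -o Posts-train.tsv -o Posts-test.tsv \
    python code/split_train_test.py

git add . && git commit -m "Process to TSV and separate test and training data"
```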

Step 5

Performed the following DVC steps: featurization, training, and model evaluation. For the final step we create an eval.txt file, which includes an AUC metric for measuring the performance of the model.

Departing from the original DVC tutorial, we have created this file as a metric, using the -M flag instead of the -o flag that appears in the original tutorial.
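A sketch of these three stages; output names match the .dvc files in this repo, while script names and arguments are assumptions based on the tutorial:

```shell
# Featurization: build feature matrices from the TSV splits.
dvc run -d code/featurization.py -d Posts-train.tsv -d Posts-test.tsv \
    -o matrix-train.p -o matrix-test.p \
    python code/featurization.py

# Training: fit the model on the training matrix.
dvc run -d code/train_model.py -d matrix-train.p -o model.p \
    python code/train_model.py

# Evaluation: -M marks eval.txt as a metric file rather than a plain output,
# so DVC can track and display it across branches.
dvc run -d code/evaluate.py -d model.p -d matrix-test.p -M eval.txt \
    python code/evaluate.py

dvc metrics show   # display the AUC recorded in eval.txt
```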

Step 6

Created a new branch called bigram. As its name suggests, we have tried to use bigrams (features extracted from word pairs) in addition to the unigrams (single-word features) used earlier.

This step is performed in order to try to improve our AUC metric. It has indeed improved, but by a very small amount, which is not so exciting. We are logging this relatively unsuccessful attempt nonetheless.
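The branching workflow can be sketched as follows; the exact code change to enable bigrams is an assumption:

```shell
git checkout -b bigram
# Edit code/featurization.py to extract bigrams as well, e.g. something like
#   CountVectorizer(ngram_range=(1, 2), ...)
dvc repro            # re-runs only the stages affected by the change
git commit -am "Bigrams"
dvc metrics show -a  # compare the AUC metric across all branches
```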

Step 7

We have now created a new branch called tuning, which aims to improve the model's AUC by changing the parameters of the random forest classifier used in this project. Here we changed the number of estimators to 700 (an increase of 600) and the number of jobs to 6 (an increase of 4).

After running the dvc repro command, we achieve a model with ~0.64 AUC, which is a decent improvement.
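A sketch of this step; the exact classifier code is an assumption, with the parameter values taken from the text above:

```shell
git checkout master      # branch off the unigram baseline
git checkout -b tuning
# In code/train_model.py, change the classifier parameters, e.g.
#   RandomForestClassifier(n_estimators=700, n_jobs=6, ...)
dvc repro                # retrain and re-evaluate with the new parameters
git commit -am "tune the model"
dvc metrics show         # ~0.64 AUC in our run
```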

Step 8

Here we try to combine the modifications from the former two branches in order to get another improvement. In this case the metric has, in fact, not improved: the AUC is now ~0.638, as opposed to the case shown in the original tutorial, where a small improvement was made. Nonetheless, we continue the flow of the tutorial and will perform the next steps as if it had improved.
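Combining the branches can be sketched like this; branch names follow the repo history, and conflict resolution is assumed to be done by hand:

```shell
# From the tuning branch, merge in the bigram changes.
git merge bigram
dvc repro            # retrain with both modifications combined
dvc metrics show     # ~0.638 AUC in our run
git commit -am "Merge bigrams into the tuned model"
```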

Step 9 - Current Step

We have now merged our "improved" model back into the original branch.
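Based on the commit history (merge train_bigram into master), the final merge presumably looked something like:

```shell
git checkout master
git merge train_bigram   # branch name taken from the commit history
dvc checkout             # sync the workspace files with the merged .dvc files
```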