

dh-tutorial

Import dataset

dvc import-url https://dagshub-public.s3.us-east-2.amazonaws.com/tutorials/stackexchange/CrossValidated-Questions-Nov-2020.csv data/CrossValidated-Questions.csv
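As a quick sanity check after the import, the sketch below confirms that the downloaded file is non-empty and that a tracking file sits next to it (`dvc import-url` writes `<output>.dvc` alongside the output). The helper name `check_import` is hypothetical and not part of DVC or this project.

```shell
# Hypothetical helper (not part of DVC): verify an imported file and its .dvc tracking file.
check_import() {
  target=$1
  # -s is true only when the file exists and is not empty
  if [ ! -s "$target" ]; then
    echo "missing or empty: $target" >&2
    return 1
  fi
  # dvc import-url records the source URL and checksum in <output>.dvc
  if [ ! -f "$target.dvc" ]; then
    echo "no tracking file: $target.dvc" >&2
    return 1
  fi
  echo "ok: $target"
}
```

For this tutorial it would be called as `check_import data/CrossValidated-Questions.csv`.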

Define remote DVC repo

dvc remote add origin https://dagshub.com/martin-fabbri/dh-tutorial.dvc
dvc remote modify origin --local auth basic
dvc remote modify origin --local user martin-fabbri
dvc remote modify origin --local password "$DAGSHUB_PASS"
dvc push -r origin --all-commits

dvc remote add origin "https://dagshub.com/$DAGSHUB_USER/$DAGSHUB_REPO.dvc"
dvc remote default origin --local
dvc remote modify origin --local user "$DAGSHUB_USER"
dvc remote modify origin --local auth basic
dvc remote modify origin --local password "$DAGSHUB_PASS"
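The parameterized remote setup above reads `DAGSHUB_USER`, `DAGSHUB_REPO`, and `DAGSHUB_PASS` from the environment; if one is unset, the commands still run but store an empty value. A minimal sketch of a guard follows. The function name `require_env` is an assumption made for illustration, not something the project or DVC provides.

```shell
# Hypothetical helper: return non-zero if any named environment variable is unset or empty.
require_env() {
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name})
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}
```

Running `require_env DAGSHUB_USER DAGSHUB_REPO DAGSHUB_PASS` before the `dvc remote` commands stops early with a message naming the first missing variable.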

Define pipeline stages

Split

dvc run -n split \
    -d data/CrossValidated-Questions.csv \
    -d src/main.py \
    -o data/test.csv.zip \
    -o data/train.csv.zip \
    -p seed \
    python3 src/main.py split

Train

dvc run -n train \
    -d data/test.csv.zip \
    -d data/train.csv.zip \
    -d src/main.py \
    -o outputs/model.joblib \
    -o outputs/tfidf.joblib \
    -p max_features \
    -M metrics/eval.json \
    python3 src/main.py train
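Both stages declare parameters with `-p` (`seed` for split, `max_features` for train). When no file is named, DVC looks these keys up in `params.yaml` at the project root, so that file needs to exist before the stages run. The sketch below creates one only if it is missing. The helper name `init_params` and the values `42` and `5000` are placeholders chosen for illustration, not the project's actual settings.

```shell
# Hypothetical helper: write placeholder parameters unless the file already exists.
init_params() {
  file=${1:-params.yaml}
  if [ -f "$file" ]; then
    echo "keeping existing $file"
    return 0
  fi
  # Top-level keys match the names passed to -p; the values are illustrative only.
  cat > "$file" <<'EOF'
seed: 42
max_features: 5000
EOF
  echo "wrote $file"
}
```

Once both stages are defined, `dvc repro` re-runs only the stages whose dependencies or parameters changed, and `dvc metrics show` prints the contents of `metrics/eval.json`.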

