
dh-tutorial

Import dataset

dvc import-url https://dagshub-public.s3.us-east-2.amazonaws.com/tutorials/stackexchange/CrossValidated-Questions-Nov-2020.csv data/CrossValidated-Questions.csv

Define remote DVC repo

dvc remote add origin https://dagshub.com/martin-fabbri/dh-tutorial.dvc
dvc remote modify origin --local auth basic
dvc remote modify origin --local user martin-fabbri
dvc remote modify origin --local password "$DAGSHUB_PASS"
dvc push -r origin --all-commits

The same remote, with the user and repository taken from environment variables instead of hard-coded names:

dvc remote add origin "https://dagshub.com/$DAGSHUB_USER/$DAGSHUB_REPO.dvc"
dvc remote default origin --local
dvc remote modify origin --local user "$DAGSHUB_USER"
dvc remote modify origin --local auth basic
dvc remote modify origin --local password "$DAGSHUB_PASS"
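
The remote setup can also be wrapped in a small script that fails early when a variable is missing. This is only a sketch: the function names `dagshub_remote_url` and `configure_dagshub_remote` are hypothetical, and `DAGSHUB_USER`, `DAGSHUB_REPO` and `DAGSHUB_PASS` are assumed to be exported beforehand.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of DVC or DagsHub.

# Build the DVC remote URL for a DagsHub user ($1) and repository ($2).
dagshub_remote_url() {
  printf 'https://dagshub.com/%s/%s.dvc\n' "$1" "$2"
}

# Configure the "origin" remote. The --local flag stores the credentials in
# .dvc/config.local, which DVC keeps out of Git.
configure_dagshub_remote() {
  # Abort with a message if any required variable is unset or empty.
  : "${DAGSHUB_USER:?set DAGSHUB_USER first}"
  : "${DAGSHUB_REPO:?set DAGSHUB_REPO first}"
  : "${DAGSHUB_PASS:?set DAGSHUB_PASS first}"

  dvc remote add origin "$(dagshub_remote_url "$DAGSHUB_USER" "$DAGSHUB_REPO")"
  dvc remote modify origin --local auth basic
  dvc remote modify origin --local user "$DAGSHUB_USER"
  dvc remote modify origin --local password "$DAGSHUB_PASS"
}
```

After sourcing the script, `configure_dagshub_remote` followed by `dvc push -r origin` should upload the tracked data, provided the credentials are valid.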

Define pipeline stages

Split

dvc run -n split -d data/CrossValidated-Questions.csv -d src/main.py -o data/test.csv.zip -o data/train.csv.zip -p seed python3 src/main.py split

Train

dvc run -n train -d data/test.csv.zip -d data/train.csv.zip -d src/main.py -o outputs/model.joblib -o outputs/tfidf.joblib -p max_features -M metrics/eval.json python3 src/main.py train
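
Both stages declare parameters with `-p` (`seed` for split, `max_features` for train). By default DVC looks these keys up in a `params.yaml` file next to the pipeline definition, so the file needs to exist before the stages are created. A minimal sketch follows; the values are placeholder assumptions, not the ones used in this tutorial.

```shell
# Create params.yaml only if it is not already present, so an existing
# file is never overwritten.
# The values 42 and 5000 are placeholders, not taken from the tutorial.
if [ ! -f params.yaml ]; then
  cat > params.yaml <<'EOF'
seed: 42
max_features: 5000
EOF
fi
```

After editing a value, `dvc repro` reruns only the stages affected by the change, and `dvc metrics show` prints the contents of metrics/eval.json.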
