This repository contains the description and code for setting up DVC to use a remote compute server with Dask. Note that this use case relies on the original DVC tutorial and its code, found at https://dvc.org/doc/tutorial.
The use case has the following prerequisites:
A directory for the DVC cache on the remote server: /scratch/dvc_data_cache/
A directory for your data on the remote server: /scratch/dvc_users/[REMOTE_USERNAME]/
An SSH key pair (created with ssh-keygen), which has been copied to the remote server: ssh-copy-id [REMOTE_USERNAME]@[REMOTE_IP]
A local port forwarded to port 8786 on the remote server (the default Dask scheduler port): ssh -L 8786:localhost:8786 [REMOTE_USERNAME]@[REMOTE_IP]
A local DVC development installation, set up as follows:
git clone git@github.com:<GITHUB_USERNAME>/dvc.git
cd dvc
conda create -n py36_open_source_dvc python=3.6
conda activate py36_open_source_dvc
pip install -r requirements.txt
pip install -r tests/requirements.txt
pip install -e .
pip install pre-commit
pre-commit install
After installation, which dvc should print [HOME]/anaconda3/envs/py36_open_source_dvc/bin/dvc, and dvc --version should print the exact version available in your local DVC development repository.
conda activate py36_open_source_dvc
dvc remote add ahsoka ssh://[REMOTE_IP]/scratch/dvc_users/[REMOTE_USERNAME]/ --global
dvc remote modify ahsoka user [REMOTE_USERNAME] --global
dvc remote modify ahsoka port 22 --global
dvc remote modify ahsoka keyfile [PATH_TO_YOUR_PRIVATE_SSH_KEY] --global
dvc remote add ahsoka_cache ssh://[REMOTE_IP]/scratch/dvc_data_cache --global
dvc remote modify ahsoka_cache user [REMOTE_USERNAME] --global
dvc remote modify ahsoka_cache port 22 --global
dvc remote modify ahsoka_cache keyfile [PATH_TO_YOUR_PRIVATE_SSH_KEY] --global
dvc config cache.ssh ahsoka_cache --global
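Before running the use case, it can help to confirm that the forwarded port from the ssh -L prerequisite is actually open on your local machine. The following is a minimal sketch using only the Python standard library; the function name port_is_open is hypothetical and not part of this repository. Note that a successful connection only shows that something is listening on the local end of the tunnel; it does not prove that a Dask scheduler is running on the remote server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8786 is the local end of the tunnel opened with ssh -L (assumed here).
    if port_is_open("localhost", 8786):
        print("Something is listening on localhost:8786")
    else:
        print("Nothing is listening on localhost:8786; is the SSH tunnel open?")
```

If the port is open, a Dask client would typically connect with dask.distributed.Client("localhost:8786"); whether the scripts in this repository connect exactly this way is an assumption.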
This use case of DVC and Dask has been set up as follows.
On your remote server do the following:
cd /scratch/dvc_users/[REMOTE_USERNAME]
mkdir dvc_dask_use_case
cd dvc_dask_use_case
wget -P ./ https://s3-us-west-2.amazonaws.com/dvc-share/so/100K/Posts.xml.tgz
tar zxf ./Posts.xml.tgz -C ./
On your local machine do the following:
git clone git@github.com:PeterFogh/dvc_dask_use_case.git
conda env create -f conda_env.yml
which was created by the following commands (executed on 16-03-2019):
conda create --name py36_open_source_dvc_dask_use_case --clone py36_open_source_dvc
conda install -n py36_open_source_dvc_dask_use_case dask scikit-learn
conda env export -n py36_open_source_dvc_dask_use_case > conda_env.yml
Check which DVC executable and version each environment uses:
conda activate py36_open_source_dvc && which dvc && dvc --version
conda activate py36_open_source_dvc_dask_use_case && which dvc && dvc --version
dvc repro
The pipeline has been specified by the following DVC stages:
conda activate py36_open_source_dvc_dask_use_case
dvc run -d xml_to_tsv.py -d conf.py -d remote://ahsoka/dvc_dask_use_case/Posts.xml -o remote://ahsoka/dvc_dask_use_case/Posts.tsv -f xml_to_tsv.dvc python xml_to_tsv.py
dvc run -d split_train_test.py -d conf.py -d remote://ahsoka/dvc_dask_use_case/Posts.tsv -o remote://ahsoka/dvc_dask_use_case/Posts-test.tsv -o remote://ahsoka/dvc_dask_use_case/Posts-train.tsv -f split_train_test.dvc python split_train_test.py
dvc run -d featurization.py -d conf.py -d remote://ahsoka/dvc_dask_use_case/Posts-train.tsv -d remote://ahsoka/dvc_dask_use_case/Posts-test.tsv -o remote://ahsoka/dvc_dask_use_case/matrix-train.p -o remote://ahsoka/dvc_dask_use_case/matrix-test.p -f featurization.dvc python featurization.py
dvc run -d train_model.py -d conf.py -d remote://ahsoka/dvc_dask_use_case/matrix-train.p -o remote://ahsoka/dvc_dask_use_case/model.p -f train_model.dvc python train_model.py
dvc run -d evaluate.py -d conf.py -d remote://ahsoka/dvc_dask_use_case/model.p -d remote://ahsoka/dvc_dask_use_case/matrix-test.p -m eval.txt -f Dvcfile python evaluate.py
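The stages above form a dependency graph: each stage consumes files that an earlier stage produces, which is why dvc repro can rerun them in a valid order. The sketch below illustrates that idea with the dependencies and outputs copied from the dvc run commands; it is not DVC's own implementation. The stage names, the STAGES table and the stage_order function are hypothetical, and "evaluate" stands for the stage stored in Dvcfile.

```python
from graphlib import TopologicalSorter  # requires Python 3.9+

# Stage name -> (data dependencies, outputs), taken from the dvc run commands.
# Script and conf.py dependencies are left out because no stage produces them.
STAGES = {
    "xml_to_tsv": (["Posts.xml"], ["Posts.tsv"]),
    "split_train_test": (["Posts.tsv"], ["Posts-test.tsv", "Posts-train.tsv"]),
    "featurization": (
        ["Posts-train.tsv", "Posts-test.tsv"],
        ["matrix-train.p", "matrix-test.p"],
    ),
    "train_model": (["matrix-train.p"], ["model.p"]),
    "evaluate": (["model.p", "matrix-test.p"], ["eval.txt"]),
}


def stage_order(stages):
    """Return the stage names in an order that respects their dependencies."""
    # Map every output file to the stage that produces it.
    producer = {out: name for name, (_, outs) in stages.items() for out in outs}
    # A stage depends on the producers of its input files, if there are any.
    graph = {
        name: {producer[dep] for dep in deps if dep in producer}
        for name, (deps, _) in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(stage_order(STAGES))
```

Because every stage here depends on the one before it, only one valid order exists, and it matches the order in which the dvc run commands are listed.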
dvc metrics show -a