This repository contains the description and code for setting up DVC to use a remote computer server via Dask. Note that this use case relies on the original DVC tutorial and its code, found here: https://dvc.org/doc/tutorial.
The use case has the following prerequisites:

- A remote server with a DVC cache directory at `/scratch/dvc_project_cache/` and a DVC user workspace at `/scratch/dvc_users/[REMOTE_USERNAME]/`.
- An MLflow tracking server started on the remote server with `mlflow server --host 0.0.0.0 --file-store /projects/mlflow_runs/`.
- An SSH key pair (generated with `ssh-keygen`) whose public key has been copied to the remote server with `ssh-copy-id [REMOTE_USERNAME]@[REMOTE_IP]`.
- SSH port forwarding of the Dask scheduler (port 8786) and the MLflow server (port 5000): `ssh -L 8786:localhost:8786 -L 5000:localhost:5000 [REMOTE_USERNAME]@[REMOTE_IP]`.
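With the tunnel in place, a Dask client on the local machine can reach the remote scheduler at `tcp://localhost:8786`. A minimal sketch of that connection - here a `LocalCluster` stands in for the remote scheduler so the snippet runs anywhere:

```python
from dask.distributed import Client, LocalCluster

# In the real setup, connect through the SSH tunnel instead:
#   client = Client("tcp://localhost:8786")
cluster = LocalCluster(processes=False, n_workers=1, dashboard_address=None)
client = Client(cluster)

# Work submitted through the client executes on the cluster's workers.
result = client.submit(lambda x: x + 1, 41).result()
print(result)

client.close()
cluster.close()
```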
```shell
git clone git@github.com:<GITHUB_USERNAME>/dvc.git
cd dvc
conda create -n py36_open_source_dvc python=3.6
conda activate py36_open_source_dvc
pip install -r requirements.txt
pip install -r tests/requirements.txt
pip install -e .
pip install pre-commit
pre-commit install
```
Check the installation: `which dvc` should say `[HOME]/anaconda3/envs/py36_open_source_dvc/bin/dvc`, and `dvc --version` should say the exact version available in your local DVC development repository.

Next, set up the DVC remotes globally (using the `--global` flag) for your local machine - note that I call my remote server "ahsoka":
```shell
conda activate py36_open_source_dvc
dvc remote add ahsoka ssh://[REMOTE_IP]/ --global
dvc remote modify ahsoka user [REMOTE_USERNAME] --global
dvc remote modify ahsoka port 22 --global
dvc remote modify ahsoka keyfile [PATH_TO_YOUR_PRIVATE_SSH_KEY] --global
dvc remote add ahsoka_user_workspace remote://ahsoka/scratch/dvc_users/[REMOTE_USERNAME]/ --global
```

Note that DVC's `keyfile` option expects the path to the private key of the SSH key pair.
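After these commands, the global DVC config (typically `~/.config/dvc/config`; the exact path depends on your platform and DVC version) should contain sections along these lines:

```ini
['remote "ahsoka"']
url = ssh://[REMOTE_IP]/
user = [REMOTE_USERNAME]
port = 22
keyfile = [PATH_TO_YOUR_PRIVATE_SSH_KEY]
['remote "ahsoka_user_workspace"']
url = remote://ahsoka/scratch/dvc_users/[REMOTE_USERNAME]/
```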
The project-specific remotes for the DVC cache and the DVC data workspace are specified in each DVC project's `.dvc/config` file.

This use case of DVC and Dask has been set up as follows.
On your remote server do the following:
```shell
cd scratch/dvc_users/[REMOTE_USERNAME]
mkdir dvc_dask_use_case
cd dvc_dask_use_case
wget -P ./ https://s3-us-west-2.amazonaws.com/dvc-share/so/100K/Posts.xml.tgz
tar zxf ./Posts.xml.tgz -C ./
```
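The extraction step can also be scripted in Python. A sketch of the `tar zxf` half using only the standard library - it builds a tiny stand-in archive so it runs anywhere, whereas the real archive is the Posts.xml.tgz fetched above:

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "Posts.xml.tgz")

# Build a tiny stand-in archive; the real one is fetched with wget above.
with tarfile.open(archive, "w:gz") as tar:
    data = b"<posts/>"
    info = tarfile.TarInfo("Posts.xml")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Equivalent of `tar zxf ./Posts.xml.tgz -C ./`
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(workdir)

extracted = os.path.exists(os.path.join(workdir, "Posts.xml"))
print(extracted)  # True
```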
On your local machine do the following:
```shell
git clone git@github.com:PeterFogh/dvc_dask_use_case.git
conda env create -f conda_env.yml
```

The environment file `conda_env.yml` has been created by the following commands (executed on 26-04-2019):

```shell
conda create --name py36_open_source_dvc_dask_use_case --clone py36_open_source_dvc
conda install -n py36_open_source_dvc_dask_use_case dask scikit-learn
conda activate py36_open_source_dvc_dask_use_case && pip install mlflow matplotlib
conda env export -n py36_open_source_dvc_dask_use_case > conda_env.yml
```
Verify the DVC installation in both conda environments: `conda activate py36_open_source_dvc && which dvc && dvc --version` and `conda activate py36_open_source_dvc_dask_use_case && which dvc && dvc --version` should both report the version from your local DVC development repository.
Reproduce the full pipeline with `dvc repro` - the pipeline has been specified by the following DVC stages:
```shell
conda activate py36_open_source_dvc_dask_use_case
dvc run -d xml_to_tsv.py -d conf.py -d remote://ahsoka_project_data/Posts.xml -o remote://ahsoka_project_data/Posts.tsv -f xml_to_tsv.dvc python xml_to_tsv.py
dvc run -d split_train_test.py -d conf.py -d remote://ahsoka_project_data/Posts.tsv -o remote://ahsoka_project_data/Posts-test.tsv -o remote://ahsoka_project_data/Posts-train.tsv -f split_train_test.dvc python split_train_test.py
dvc run -d featurization.py -d conf.py -d remote://ahsoka_project_data/Posts-train.tsv -d remote://ahsoka_project_data/Posts-test.tsv -o remote://ahsoka_project_data/matrix-train.p -o remote://ahsoka_project_data/matrix-test.p -f featurization.dvc python featurization.py
dvc run -d train_model.py -d conf.py -d remote://ahsoka_project_data/matrix-train.p -o remote://ahsoka_project_data/model.p -f train_model.dvc python train_model.py
dvc run -d evaluate.py -d conf.py -d remote://ahsoka_project_data/model.p -d remote://ahsoka_project_data/matrix-test.p -m eval.txt -f Dvcfile python evaluate.py
```
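Chained through their `-d`/`-o` files, the stages above form a small DAG, which `dvc repro` resolves for you. A sketch that recovers a valid execution order from those dependencies (stage names as in the `-f` flags, edges read off the commands above):

```python
from graphlib import TopologicalSorter

# Edges read off the -d/-o flags of the dvc run commands above.
stages = {
    "xml_to_tsv": set(),
    "split_train_test": {"xml_to_tsv"},
    "featurization": {"split_train_test"},
    "train_model": {"featurization"},
    "evaluate": {"train_model", "featurization"},
}

order = list(TopologicalSorter(stages).static_order())
print(order)
```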
Finally, show the metric for all branches with `dvc metrics show -a`.
Note that `mlflow.log_artifacts()` does not support files saved on the remote server. Artifact files must be located in a directory shared by both the client machine and the tracking server, using the methods described here. Read https://github.com/mlflow/mlflow/issues/572#issuecomment-427718078 for more details on the problem. However, we can circumvent this problem by using Dask to execute the MLflow run on the remote server. Thereby, both the client and the MLflow tracking server have no problem reading and writing to the same folder, as they are executed on the same machine.
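The circumvention can be sketched as follows. The `train_and_log` function below is hypothetical, and the MLflow calls are left as comments so the sketch runs without a tracking server; the scheduler address assumes the SSH tunnel from the prerequisites:

```python
def train_and_log():
    """Executed on the remote server by a Dask worker, where the MLflow
    tracking server also runs, so artifact paths are local to both."""
    # import mlflow
    # mlflow.set_tracking_uri("http://localhost:5000")
    # with mlflow.start_run():
    #     mlflow.log_artifacts("artifacts/")  # hypothetical directory
    return "done"

# On the local machine, submit the function to the remote scheduler:
#   from dask.distributed import Client
#   client = Client("tcp://localhost:8786")
#   client.submit(train_and_log).result()
status = train_and_log()  # called directly here so the sketch runs
print(status)
```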