This repository attempts to follow the relevant parts of the DAGsHub flavor of the directory structure convention proposed by cookiecutter-data-science (https://drivendata.github.io/cookiecutter-data-science/#directory-structure and https://dagshub.com/DAGsHub-Official/Cookiecutter-DVC). The convention is being adopted incrementally.
├── LICENSE
├── Makefile             <- Makefile with commands like `make dirs` or `make clean`
├── README.md            <- The top-level README for developers using this project.
├── data
│   ├── interim          <- Intermediate data that has been transformed.
│   ├── processed        <- The final, canonical data sets for modeling.
│   ├── raw              <- The original, immutable data dump.
│   └── discarded        <- Data that can't be used because of acquisition issues.
│
├── eval.dvc             <- The end of the data pipeline - evaluates the trained model on the test dataset.
│
├── models               <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks            <- Jupyter notebooks. Naming convention is a number (for ordering),
│                           the creator's initials, and a short `-` delimited description, e.g.
│                           `1.0-jqp-initial-data-exploration`.
│
├── process_data.dvc     <- Process the raw data and prepare it for training.
├── raw_data.dvc         <- Keeps the raw data versioned.
│
├── references           <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports              <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures          <- Generated graphics and figures to be used in reporting
│
├── metrics.txt          <- Relevant metrics after evaluating the model.
├── training_metrics.txt <- Relevant metrics from training the model.
│
├── requirements.txt     <- The requirements file for reproducing the analysis environment, e.g.
│                           generated with `pip freeze > requirements.txt`
│
├── setup.py             <- Makes project pip installable (`pip install -e .`) so src can be imported
├── src                  <- Source code for use in this project.
│   ├── __init__.py      <- Makes src a Python module
│   │
│   ├── data             <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── models           <- Scripts to train models and then use trained models to make
│   │   │                   predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization    <- Scripts to create exploratory and results-oriented visualizations
│       └── visualize.py
│
├── tox.ini              <- tox file with settings for running tox; see tox.testrun.org
└── train.dvc            <- Train a model on the processed data.
Git is great for versioning and distributing code, but it doesn't deal well with large files: repositories become slow once more than a couple of GB of data are checked into them. Additionally, the hosting services commonly used for git (GitHub, GitLab, etc.) impose strict storage limits that don't fit large data collections.
Git-annex is an add-on for git that addresses this: it keeps large file content outside the regular git object store, so the repository itself stays small while the data remains versioned.
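Concretely, git-annex moves a file's content into .git/annex/objects and leaves only a small symlink in the working tree, so git never stores the large blob itself. A throwaway sketch (assuming git-annex is already installed; all paths here are examples):

```shell
# Illustrative only: create a scratch repository and annex a large file.
mkdir scratch-repo && cd scratch-repo
git init
git annex init "demo"

# Create a 100 MB dummy file to stand in for real data.
dd if=/dev/zero of=bigfile.dat bs=1M count=100

git annex add bigfile.dat    # moves content into .git/annex, leaves a symlink
git commit -m "Add bigfile via git-annex"

ls -l bigfile.dat            # now a symlink pointing into .git/annex/objects
```

Because git only ever tracks the symlink, cloning and branching stay fast regardless of how much data the annex holds.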
The easiest way to install git-annex is with conda. There is an official Ubuntu package that can be installed with sudo apt-get install git-annex, but it is too outdated.

conda create -n git-annex
conda activate git-annex
conda install git git-annex -c conda-forge

The last command installs the newest version of git-annex, and also git (important), from the conda-forge channel.
sudo apt-get install network-manager-openconnect-gnome
Open a terminal and change to the directory where you want to download the data. The following command will not download the data itself, just an index of the available data files.
git clone https://dagshub.com/michaelfsp/bigmaze.git
git config --local include.path ../.gitconfig
git annex init "alice"

Here "alice" is just an example description for the local repository; its only effect is to help you tell the different clones of the repository apart, so feel free to choose one that makes sense to you.
git annex sync
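After the sync, the clone knows about every annexed file, but the files appear as symlinks whose content is not yet present locally. Assuming a standard git-annex installation, these commands show what is and isn't available (the file path is a placeholder):

```shell
# List files whose content is NOT yet present in this clone:
git annex find --not --in here

# Show where a particular file's content is stored across
# all known repositories:
git annex whereis path/to/some/datafile
```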
git-annex offers the possibility of two clients connecting to each other using the tor network. This is handy because it lets clients connect even when they have changed IP address or can't get access to the target machine's local network.
Up-to-date information on how to install tor on Ubuntu can be found at https://support.torproject.org/apt/
sudo apt install apt-transport-https curl
sudo sh -c 'echo "deb https://deb.torproject.org/torproject.org/ bionic main" >> /etc/apt/sources.list.d/tor.list'
sudo sh -c 'echo "deb-src https://deb.torproject.org/torproject.org/ bionic main" >> /etc/apt/sources.list.d/tor.list'
curl https://deb.torproject.org/torproject.org/A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89.asc | gpg --import
gpg --export A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89 | sudo apt-key add -
sudo apt update
sudo apt install tor deb.torproject.org-keyring
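Once tor is installed, git-annex has to be told to use it. The commands below follow git-annex's documented tor support, but treat them as a sketch and consult `git annex help` for the version you installed:

```shell
# On each machine: register a tor hidden service for git-annex,
# tied to the current user (needs root once).
sudo git annex enable-tor $(id -u)

# In the repository on both machines: pair the two clones over tor
# (the pairing exchanges one-time codes between the machines).
git annex p2p --pair

# Afterwards the peer shows up as a normal remote and can be synced:
git annex sync
```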
git annex sync
git annex get /path/to/the/datafile/
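`git annex get` downloads content on demand, and the complementary commands let you free local disk space again while keeping the files tracked. A short example with placeholder paths:

```shell
# Download the content of one file or a whole directory tree:
git annex get data/raw/session01/

# Free the local copy again; git-annex refuses unless enough
# other repositories still hold the content:
git annex drop data/raw/session01/

# Download everything referenced in the repository:
git annex get .
```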
It is necessary to add the paths to MazeX and pyControl/tools to the $PYTHONPATH environment variable. The cleanest way is to use the small shell scripts that conda runs automatically whenever an environment is activated or deactivated; for a given environment these live in $CONDA_PREFIX/etc/conda/activate.d and $CONDA_PREFIX/etc/conda/deactivate.d. Create a file named pythonpath.sh in the activate.d directory with the following content:
export PREV_PYTHONPATH=$PYTHONPATH
export PYTHONPATH="/home/michael/code/mousemaze:/home/michael/src/pyControl/tools:$PYTHONPATH"
and a file named pythonpath.sh in the deactivate.d directory with the following content: