BOHR
----------------------------------
Big Old Heuristic Repository

Getting started
===========================================

Install Anaconda/Miniconda
~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. Install conda_ (skip this step if conda is already installed).
#. Create a virtual environment with ``conda create --name`` followed by the environment name and ``python==3.8.0``.
#. Activate the virtual environment with ``conda activate`` followed by the environment name.

Get started with BOHR
~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. Run ``git clone https://github.com/giganticode/bohr && cd bohr``
#. Run ``pip install -r requirements.txt`` (Python 3.8 or higher is required)

Running the code and reproducing the models
===========================================

Using DVC (Data Version Control) - preferred
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Install dvc_ for your OS

#. Install p7zip_ for your OS

#. Set up the data source. Ironspeed users should create the file ``.dvc/config.local``, which DVC reads in the next step to determine where to fetch the datasets from. The file contains sensitive data and must not be committed; it is gitignored by default. It should contain the following::

      [core]
          remote = ironspeed
      ['remote "ironspeed"']
          url = ssh://10.10.20.160/home/hbabii/.dvcstorage/bohr
          user =
          password =

#. Run ``dvc pull -r ironspeed data/test downloaded-data/b_b.7z``

#. Run ``dvc repro``
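The DVC steps above can also be scripted. The sketch below is not part of BOHR: the helper names ``write_config_local`` and ``run_pipeline`` are hypothetical, and it assumes it is run from the root of the cloned ``bohr`` repository with ``dvc`` on ``PATH``. It writes the ``.dvc/config.local`` contents listed above (never overwriting an existing file, since that file may already hold credentials) and then runs ``dvc pull`` and ``dvc repro``; by default it only lists the commands.

.. code-block:: python

    """Sketch: write .dvc/config.local and run the DVC steps from this README."""
    import subprocess
    from pathlib import Path
    from typing import List

    # Same contents as the README; user/password are intentionally left empty.
    CONFIG_LOCAL = """\
    [core]
        remote = ironspeed
    ['remote "ironspeed"']
        url = ssh://10.10.20.160/home/hbabii/.dvcstorage/bohr
        user =
        password =
    """

    # The two pipeline steps from the README, as argument lists for subprocess.
    COMMANDS: List[List[str]] = [
        ["dvc", "pull", "-r", "ironspeed", "data/test", "downloaded-data/b_b.7z"],
        ["dvc", "repro"],
    ]


    def write_config_local(repo_root: Path) -> Path:
        """Create .dvc/config.local unless it already exists (never overwrite secrets)."""
        path = repo_root / ".dvc" / "config.local"
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(CONFIG_LOCAL)
        return path


    def run_pipeline(repo_root: Path, dry_run: bool = True) -> List[str]:
        """Run the dvc pull / dvc repro steps, or only list them when dry_run is True."""
        executed = []
        for cmd in COMMANDS:
            executed.append(" ".join(cmd))
            if not dry_run:
                # check=True raises CalledProcessError at the first failing step,
                # e.g. when the ironspeed remote is unreachable.
                subprocess.run(cmd, cwd=repo_root, check=True)
        return executed

For example, ``run_pipeline(Path.cwd(), dry_run=False)`` executes both commands in order and stops at the first one that fails.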

.. _dvc: https://dvc.org/doc/install
.. _p7zip: https://www.7-zip.org
.. _conda: https://docs.anaconda.com/anaconda/install/

Without DVC
~~~~~~~~~~~
TBA



EDIT TEXT BELOW WITH UPDATED STEPS >>>

Contribute to the project by adding your first heuristic
===========================================================

#. Define a function in ``bohr/heuristics/templates.py`` and decorate it with ``@labeling_function()``. The heuristic can later be reused for different tasks.

#. To use the new function to label commits for a specific task, add it to the ``heuristics`` list defined in ``bohr/heuristics//.py`` (e.g., for the task of classifying bug-fix commits, the file is ``bohr/heuristics/bugs/bugs.py``).

#. Run ``dvc repro`` to recalculate the metrics or, if you are not using DVC, rerun the scripts in the ``bohr/pipeline`` package manually.

#. Commit and push your changes together with the updated metric files.

See this commit_ as an example of how a heuristic can be added.

.. _commit: https://github.com/giganticode/bohr/commit/6928dfd750d304ca4610dbba4216f6e94375e4a7
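A minimal sketch of the first two steps follows. BOHR's actual ``@labeling_function()`` decorator, label values and commit representation are not described in this README, so the sketch defines hypothetical stand-ins: a commit is a dict with a ``message`` key, the labels are ``BUGFIX`` and ``NON_BUGFIX``, and ``ABSTAIN = -1`` follows the common weak-supervision convention (used, for example, by Snorkel) of returning -1 when a heuristic has no opinion. The heuristic names are likewise made up for illustration.

.. code-block:: python

    import re
    from typing import Callable, Dict, List

    # Hypothetical label values; -1 means "abstain" (no opinion on this commit).
    ABSTAIN = -1
    NON_BUGFIX = 0
    BUGFIX = 1

    Commit = Dict[str, str]  # hypothetical: a commit as a dict with a "message" key
    LabelingFunction = Callable[[Commit], int]


    def labeling_function() -> Callable[[LabelingFunction], LabelingFunction]:
        """Hypothetical stand-in for BOHR's @labeling_function() decorator.

        It only tags the decorated function as a heuristic and returns it unchanged.
        """
        def decorator(fn: LabelingFunction) -> LabelingFunction:
            fn.is_heuristic = True  # type: ignore[attr-defined]
            return fn
        return decorator


    # --- would live in bohr/heuristics/templates.py (reusable across tasks) ---
    @labeling_function()
    def fix_keyword_in_message(commit: Commit) -> int:
        """Vote BUGFIX if the message contains the word fix, fixes or fixed."""
        if re.search(r"\bfix(e[sd])?\b", commit["message"], flags=re.IGNORECASE):
            return BUGFIX
        return ABSTAIN


    @labeling_function()
    def docs_only_message(commit: Commit) -> int:
        """Vote NON_BUGFIX if the message starts with doc, docs or documentation."""
        if re.match(r"\s*(docs?|documentation)\b", commit["message"], flags=re.IGNORECASE):
            return NON_BUGFIX
        return ABSTAIN


    # --- would be added to the task file, e.g. bohr/heuristics/bugs/bugs.py ---
    heuristics: List[LabelingFunction] = [fix_keyword_in_message, docs_only_message]


    def apply_heuristics(commit: Commit) -> List[int]:
        """Return one vote (a label or ABSTAIN) per heuristic for a single commit."""
        return [h(commit) for h in heuristics]

With these definitions, ``apply_heuristics({"message": "Fix NPE in parser"})`` returns ``[1, -1]``: the first heuristic votes for a bug fix and the second abstains. Keeping each heuristic narrow and letting it abstain otherwise is what allows the same function to be reused across tasks.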

Credits
=======

This project is based on the work of `@lalenzos `_ and partially uses his code.