Replication package for MSR'24 Mining Challenge
https://2024.msrconf.org/track/msr-2024-mining-challenge
You can set up the environment for this project, following the recommended practices described later in this document, by running the init.bash Bash script and following its instructions.
Note that this script assumes that it is run on Linux or a Linux-like system. On other operating systems, you are probably better off following the steps described in this document manually.
To avoid dependency conflicts, it is strongly recommended to create a virtual environment, for example with:
python3 -m venv venv
This needs to be done only once, from the top directory of the project. For each session, you should activate the environment:
source venv/bin/activate
This makes the command-line prompt include "(venv) " as a prefix, though this depends on the shell used.
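The two steps above (create once, activate per session) can be combined into a small helper. This is a minimal sketch, not part of this project: the function names ensure_venv and in_venv are hypothetical. The check in in_venv relies on the standard Python behaviour that sys.prefix differs from sys.base_prefix inside a virtual environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): create the virtual
# environment only if it does not exist yet, then activate it.
# It must be *sourced*, otherwise the activation is lost when it returns.
ensure_venv() {
    local dir="${1:-venv}"   # default matches "python3 -m venv venv" above
    if [ ! -f "$dir/bin/activate" ]; then
        python3 -m venv "$dir" || return 1
    fi
    # shellcheck disable=SC1091
    . "$dir/bin/activate"
}

# Hypothetical helper: print "yes" if python3 runs inside a virtual environment.
in_venv() {
    python3 -c 'import sys; print("yes" if sys.prefix != sys.base_prefix else "no")'
}
```

For example, put both functions in a file, source it, and run ensure_venv from the top directory of the project at the start of each session.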
Using a virtual environment, either directly as shown above or via pipx, might be required if you cannot install system packages and Python is configured as "externally managed" (PEP 668), in which case pip fails with:
error: externally-managed-environment
× This environment is externally managed
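To check in advance whether you will hit this error, you can look for the PEP 668 marker file. This is a sketch under the assumption that your distribution follows PEP 668, which places a file named EXTERNALLY-MANAGED in the standard library directory; pip ignores that marker inside a virtual environment, so the check does too. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if "pip install" with this interpreter
# would be refused as "externally managed" (PEP 668), 1 otherwise.
is_externally_managed() {
    local py="${1:-python3}"
    "$py" - <<'PY'
import os, sys, sysconfig
# PEP 668 marker file lives in the stdlib directory of the base install
marker = os.path.join(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED")
# the marker is not honoured inside a virtual environment
in_venv = sys.prefix != sys.base_prefix
sys.exit(0 if os.path.isfile(marker) and not in_venv else 1)
PY
}
```

If is_externally_managed succeeds, use a virtual environment (or pipx) rather than installing into the system Python.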
You can install the dependencies listed in the requirements.txt file with pip, using the following command:
python -m pip install -r requirements.txt
Note: the above assumes that you have activated the virtual environment (venv).
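Because installing outside the virtual environment is easy to do by accident, a guarded wrapper can enforce the note above. This is a minimal sketch with a hypothetical function name; it only relies on the VIRTUAL_ENV variable that the standard venv activate script sets.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: install requirements only inside an active venv.
install_requirements() {
    local req="${1:-requirements.txt}"
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "error: activate the virtual environment first (source venv/bin/activate)" >&2
        return 1
    fi
    if [ ! -f "$req" ]; then
        echo "error: $req not found; run this from the top directory of the project" >&2
        return 1
    fi
    python -m pip install -r "$req"
}
```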
You can re-run the whole computation pipeline with dvc repro, or at least those parts of it that were made to use the DVC (Data Version Control) tool.
You can also run experiments with dvc exp run.
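A sketch of a safe way to invoke the pipeline, assuming only the dvc status and dvc repro commands mentioned here and in the DVC documentation; the function name run_pipeline is hypothetical. It refuses to run when dvc is missing or when it is not started from the top directory of the project, and forwards any extra arguments to dvc repro.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run the out-of-date parts of the DVC pipeline.
run_pipeline() {
    if ! command -v dvc >/dev/null 2>&1; then
        echo "error: dvc not found; install the dependencies first" >&2
        return 1
    fi
    if [ ! -d .dvc ]; then
        echo "error: run this from the top directory of the project" >&2
        return 1
    fi
    dvc status        # show stages whose dependencies or outputs changed
    dvc repro "$@"    # re-run only the stages that are out of date
}
```

For example, run_pipeline --dry should only print the commands that would be executed, assuming your DVC version supports the --dry option of dvc repro.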
Because the initial external DevGPT dataset is quite large (650 MB as a *.zip file, and 3.9 GB when uncompressed into a directory), you might want to keep the DVC cache outside the repository in your home directory.
You can do that with the dvc cache dir command:
dvc cache dir --local /mnt/data/username/.dvc/cache
where you need to replace username with your login (on Linux you can find it with the whoami command).
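The substitution of username can be automated with whoami. This is a sketch with hypothetical function names; /mnt/data is the location used in this document and may differ on your machine. The df call is only there to show how much free space the target file system has, given the dataset size mentioned above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build /mnt/data/<login>/.dvc/cache for the current user.
dvc_cache_path() {
    local base="${1:-/mnt/data}"
    printf '%s/%s/.dvc/cache\n' "$base" "$(whoami)"
}

# Hypothetical helper: create the cache directory and point DVC at it
# (for this repository only, via --local).
set_dvc_cache() {
    local cache
    cache="$(dvc_cache_path "$@")"
    mkdir -p "$cache" || return 1
    df -h "$cache"    # check free space: the unpacked dataset is about 3.9 GB
    dvc cache dir --local "$cache"
}
```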
To avoid recomputing results, which takes time, you can configure local DVC remote storage, for example with:
cat <<EOF >>.dvc/config.local
[core]
remote = local
['remote "local"']
url = /mnt/data/dvcstore
EOF
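Since the command above appends to .dvc/config.local, running it twice would add duplicate [core] and remote sections. A minimal idempotent sketch follows; the function name is hypothetical, and it writes exactly the same lines as the command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add the "local" remote to .dvc/config.local only once.
add_local_remote() {
    local url="${1:-/mnt/data/dvcstore}"
    local cfg=".dvc/config.local"
    if [ ! -d .dvc ]; then
        echo "error: run this from the top directory of the project" >&2
        return 1
    fi
    if [ -f "$cfg" ] && grep -qF "'remote \"local\"'" "$cfg"; then
        echo "remote \"local\" is already configured in $cfg" >&2
        return 0
    fi
    if [ -f "$cfg" ] && grep -q '^\[core\]' "$cfg"; then
        # merging into an existing [core] section is left to DVC itself
        echo "$cfg already has [core]; use: dvc remote add --local -d local $url" >&2
        return 1
    fi
    cat <<EOF >>"$cfg"
[core]
remote = local
['remote "local"']
url = $url
EOF
}
```

Alternatively, dvc remote add --local -d local /mnt/data/dvcstore lets DVC write the same settings itself, with --local selecting config.local and -d making the remote the default.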
Then you will be able to download computed data with dvc pull, and upload your results for others on the team with dvc push.
This assumes that all of you have access to /mnt/data/dvcstore, either because you do the work on the same host (perhaps remotely), or because it is network storage available to everyone on the team.
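Before pushing or pulling, it can help to confirm that the shared store is actually mounted and accessible. This is a sketch with a hypothetical function name, defaulting to the /mnt/data/dvcstore path used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the shared DVC store exists and is readable
# and writable by the current user.
check_store() {
    local store="${1:-/mnt/data/dvcstore}"
    if [ ! -d "$store" ]; then
        echo "error: $store does not exist or is not mounted" >&2
        return 1
    fi
    if [ ! -r "$store" ] || [ ! -w "$store" ]; then
        echo "error: no read/write access to $store" >&2
        return 1
    fi
}
```

Usage: check_store && dvc push (or check_store && dvc pull).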
Code for the MSR'24 Mining Challenge paper "How I Learned to Stop Worrying and Love ChatGPT" (https://2024.msrconf.org/track/msr-2024-mining-challenge):
https://2024.msrconf.org/details/msr-2024-mining-challenge/6