This repository holds a sample application that bakeoff participants will implement in their respective frameworks.
We have three question/answer datasets from Indeed, as described below:
| | original.8.15.2019 | original.5.26.2020 | original.8.10.2020 |
|---|---|---|---|
| Number of questions | 65381 | 208559 | |
| Number of answers | 321287 | 919656 | |
All datasets are on server-san, in the /mnt/data/dataset/indeed_qa/{original.8.15.2019 | original.5.26.2020 | original.8.10.2020}
directories. To use these datasets in the project, copy them into the <repository>/data
directory. The datasets in original.8.15.2019 and original.5.26.2020 are in CSV format, whereas original.8.10.2020 contains one JSON file per question, holding the question together with its answers.
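A minimal sketch of reading the two layouts once they have been copied into <repository>/data. The helper names `iter_csv_rows` and `iter_json_questions` are hypothetical, and the sketch assumes that the CSV files end in `.csv`, that the per-question files end in `.json`, and that all files are UTF-8 encoded. Column and field names are not documented here, so the helpers return the raw rows and objects without interpreting them.

```python
import csv
import json
from pathlib import Path


def iter_csv_rows(dataset_dir):
    """Yield every row of every *.csv file in dataset_dir as a dict."""
    for csv_path in sorted(Path(dataset_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # DictReader uses the first line of each file as the column names.
            yield from csv.DictReader(handle)


def iter_json_questions(dataset_dir):
    """Yield the parsed content of every *.json file in dataset_dir."""
    for json_path in sorted(Path(dataset_dir).glob("*.json")):
        with json_path.open(encoding="utf-8") as handle:
            yield json.load(handle)
```

For example, `iter_csv_rows("data/original.8.15.2019")` would iterate over the rows of the first dataset, and `iter_json_questions("data/original.8.10.2020")` over the per-question objects of the third.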
Set up Python 3.7.4 and create a virtual environment for bakeoff-sample:
cd scripts
pyenv local 3.7.4
pyenv virtualenv 3.7.4 bakeoff-sample
Install the required libraries in the bakeoff-sample virtualenv:
cd scripts
pip install -r requirements.txt
The following command runs TextRank on questions containing the keywords interview and process, and writes a summary of at most 5 sentences and 100 words to the data/output directory, where each JSON file contains the summarizations.
cd scripts
python run_textrank.py \
--keywords interview,process \
--num_sentences 5 \
--num_words 100
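To illustrate what the sentence and word limits mean, here is a minimal TextRank sketch in plain Python. It is not the implementation in run_textrank.py: the function names, the tokenizer, the word-overlap similarity, and the reading of `num_words` as a budget on the total summary length are all assumptions made for illustration.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def similarity(tokens_a, tokens_b):
    """Shared-word count normalized by the log of both sentence lengths."""
    # Sentences with fewer than two tokens get similarity 0, which also
    # avoids log(0) and a zero denominator.
    if len(tokens_a) < 2 or len(tokens_b) < 2:
        return 0.0
    overlap = len(set(tokens_a) & set(tokens_b))
    return overlap / (math.log(len(tokens_a)) + math.log(len(tokens_b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, num_sentences=5, num_words=100):
    """Pick top-ranked sentences within both limits, in original order."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used_words = [], 0
    for i in ranked:
        if len(chosen) == num_sentences:
            break
        length = len(sentences[i].split())
        if used_words + length > num_words:
            continue  # skip sentences that would exceed the word budget
        chosen.append(i)
        used_words += length
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(answers, num_sentences=5, num_words=100)` on a list of answer sentences mirrors the limits passed on the command line above.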
Run TextRank on one or more specific topics:
python run_textrank.py \
--topics DRESS_CODE \
--num_sentences 5 \
--num_words 100
Run TextRank on specific questions:
python run_textrank.py \
--questions <questionids> \
--num_sentences 5 \
--num_words 100
Each summarization JSON file has the following format:
{
  "question_id": "...",
  "question_text": "...?",
  "question_code": "...",
  "question_topics": ["...", "..."],
  "company_id": 123,
  "company": "...",
  "num_answers": 7,
  "summary": [{
    "text": "...",
    "support": 2,
    "coverage": [{
      "answer_id": "...",
      "answer_text": "...",
      "similarity": 0.9,
      "matching_tokens": [{
        "token": "...",
        "spans": [{
          "text": "...",
          "start": 0,
          "end": 8
        }, ...]
      }, ...]
    }, {
      "answer_id": "...",
      "answer_text": "..."
    }],
    "tokens": ["...", "...", ...]
  }, ...],
  "stats": {
    "ROUGE-1": 0.9937106868240972,
    "ROUGE-2": 0.9852216698769686,
    "ROUGE-L": 0.9936320879085392
  }
}
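A minimal sketch of reading these output files back, assuming each file in data/output holds a single object in the format above and ends in `.json`. The helper names `load_summaries` and `describe` are hypothetical and not part of the repository. Only fields shown in the format are used.

```python
import json
from pathlib import Path


def load_summaries(output_dir):
    """Return the parsed content of every *.json file, sorted by file name."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(output_dir).glob("*.json"))
    ]


def describe(record):
    """Format one summarization record as readable text."""
    lines = [f"{record['question_id']}: {record['question_text']}"]
    for sentence in record["summary"]:
        # Collect the answer_id of each entry listed under "coverage".
        answer_ids = [entry["answer_id"] for entry in sentence.get("coverage", [])]
        lines.append(
            f"  [{sentence['support']}] {sentence['text']} "
            f"(answers: {', '.join(answer_ids)})"
        )
    for metric, value in record.get("stats", {}).items():
        lines.append(f"  {metric}: {value:.4f}")
    return "\n".join(lines)
```

For example, `for record in load_summaries("data/output"): print(describe(record))` prints each question followed by its summary sentences and ROUGE scores.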