This repository versions the source code required to generate the COVID-19 manuscripts dataset, which was published in the Data in Brief (DiB) journal.
The raw data were collected by the Jupyter Notebooks contained in the folder "notebooks/collect".
The data sources are:
The final dataset is a combination of the collected arXiv, bioRxiv, medRxiv, PubMed and Scopus datasets. It is generated by the DVC pipeline defined in this repository.
The features of the resulting dataset are:
The following steps assume that you have already cloned or downloaded this repository and that you run the commands from a shell/prompt inside the repository folder. An essential prerequisite is that DVC is already installed on your machine.
To reuse the raw data that I already collected and the pipeline that I created, follow these steps:
Download the raw data files, which are available on Google Drive, and put them in the data/raw folder. You can download these files from this link.
Execute the preprocessing pipeline with the following command:
dvc repro
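As a minimal sketch, the prerequisites described above (DVC installed, raw files placed in data/raw) could be checked from Python before the pipeline is run. The helper names `find_problems` and `reproduce` are hypothetical and not part of this repository. The only facts taken from this README are the data/raw folder and the `dvc repro` command.

```python
import shutil
import subprocess
from pathlib import Path


def find_problems(repo_dir="."):
    """Return a list of problems that would likely block `dvc repro`.

    Hypothetical helper: it only checks what the README states, namely
    that DVC is installed and that raw files were placed in data/raw.
    """
    problems = []
    raw_dir = Path(repo_dir) / "data" / "raw"
    if shutil.which("dvc") is None:
        problems.append("dvc executable not found on PATH")
    if not raw_dir.is_dir():
        problems.append(f"missing folder: {raw_dir}")
    elif not any(raw_dir.iterdir()):
        problems.append(f"no raw data files in: {raw_dir}")
    return problems


def reproduce(repo_dir="."):
    """Run `dvc repro` inside repo_dir once the checks above pass."""
    problems = find_problems(repo_dir)
    if problems:
        raise RuntimeError("; ".join(problems))
    # check=True raises CalledProcessError if the pipeline fails.
    subprocess.run(["dvc", "repro"], cwd=repo_dir, check=True)


if __name__ == "__main__":
    for problem in find_problems("."):
        print("Problem:", problem)
```

Calling `reproduce()` from the repository folder is then equivalent to running the command above by hand, with an early and readable error if the raw data are missing.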
Santos, Breno Santana; Silva, Ivanovitch; Ribeiro-Dantas, Marcel da Câmara; Alves, Gisliany; Endo, Patricia Takako; Lima, Luciana. COVID-19: A scholarly production dataset report for research analysis. Data in Brief, Volume 32, 2020, DOI:10.1016/j.dib.2020.106178.
You can download the article from this link.