This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Project Coswara by the Indian Institute of Science (IISc) Bangalore is an attempt to build a diagnostic tool for COVID-19 detection using audio recordings of an individual's breathing, cough and speech sounds. Currently, the project is in the data collection stage through crowdsourcing. To contribute your audio samples, please go to Project Coswara (https://coswara.iisc.ac.in/). The exercise takes 5-7 minutes.
What am I looking at? This GitHub repository contains the raw audio data collected through https://coswara.iisc.ac.in/. Every participant contributes nine sound samples. To learn more about the dataset, read the paper: Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis. Note that the dataset size has increased since this paper came out. We also maintain a (less frequently updated) blog here.
What is the structure of the repository?
Each folder contains metadata and audio recordings corresponding to contributors, and is stored in compressed form. To download and extract the data, run the script extract_data.py.
What are the different sound samples? Sound samples collected include breathing sounds (fast and slow), cough sounds (deep and shallow), phonation of sustained vowels (/a/ as in made, /i/,/o/), and counting numbers at slow and fast pace. Metadata information collected includes the participant's age, gender, location (country, state/ province), current health status (healthy/ exposed/ positive/recovered) and the presence of comorbidities (pre-existing medical conditions).
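As a rough illustration of the nine recordings per participant listed above, the following Python sketch checks whether an extracted participant folder has one audio file per category. The file stems in `EXPECTED_SAMPLES` and the helper `missing_samples` are hypothetical names chosen for this example; the actual file names in the repository may differ and should be checked after extraction.

```python
from pathlib import Path

# Hypothetical file stems, one per recording category named in the text.
# These are assumptions for illustration; adjust them to the real file
# names found in the extracted data.
EXPECTED_SAMPLES = (
    "breathing-fast", "breathing-slow",
    "cough-deep", "cough-shallow",
    "vowel-a", "vowel-i", "vowel-o",
    "counting-slow", "counting-fast",
)


def missing_samples(participant_dir, expected=EXPECTED_SAMPLES):
    """Return the expected sample stems with no matching file in participant_dir."""
    # Compare on the file stem so the audio extension does not matter.
    present = {p.stem for p in Path(participant_dir).iterdir() if p.is_file()}
    return [name for name in expected if name not in present]
```

Calling `missing_samples` on a participant folder returns an empty list when all nine recordings are present, which gives a quick way to flag incomplete contributions before analysis.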
Can I see the metadata before downloading the whole repository?
Yes. The file combined_data.csv contains a summary of metadata. The file csv_labels_legend.json contains information about the columns present in combined_data.csv.
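A minimal Python sketch for inspecting the metadata without downloading the audio might look like the following. It assumes that csv_labels_legend.json is a flat JSON object mapping column names to descriptions; that layout is an assumption, not something stated here, so check the file first. The helper names `load_metadata`, `describe_columns` and `count_values` are made up for this example.

```python
import csv
import json
from collections import Counter


def load_metadata(csv_path, legend_path):
    """Read the summary CSV into a list of dicts and load the legend JSON."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(legend_path, encoding="utf-8") as f:
        legend = json.load(f)
    return rows, legend


def describe_columns(rows, legend):
    """Map each CSV column to its legend entry (None if the legend lacks it).

    Assumes the legend is a flat {column_name: description} object.
    """
    columns = list(rows[0].keys()) if rows else []
    return {col: legend.get(col) for col in columns}


def count_values(rows, column):
    """Count how often each non-empty value occurs in the given column."""
    return Counter(row[column] for row in rows if row.get(column))


# Example usage, with both files in the working directory:
# rows, legend = load_metadata("combined_data.csv", "csv_labels_legend.json")
# for col, meaning in describe_columns(rows, legend).items():
#     print(col, "->", meaning)
```

Once the legend shows which column holds the health status, passing that column name to `count_values` gives a quick tally of participants per category.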
How to cite this dataset in your work? Great to know you found it useful. You can cite the paper: Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis (https://arxiv.org/abs/2005.10548)
Each folder also has a CSV file which contains metadata of each sample (that is, participant).
The word Coswara is an amalgamation of Co (from corona) and Swara (sound in Sanskrit). The project is being pursued in three stages:
We aim at creating a dataset composed of voice samples from healthy individuals, and those with COVID-19 infection. The data is collected using a web and mobile application. Voice samples collected include breathing sounds (fast and slow), cough sounds (deep and shallow), phonation of sustained vowels (/a/ as in made, /i/,/o/), and counting numbers at slow and fast pace. Metadata information collected includes the participant's age, gender, location (country, state/ province), current health status (healthy/ exposed/ cured/ infected) and the presence of comorbidities (pre-existing medical conditions).
No personally identifiable information is collected, and the data collection respects the privacy of the contributors. The data is also anonymized during storage itself. All the collected data will be made available here and updated on a daily basis.
The collected data will be analysed using signal processing and machine learning techniques. The goal is to build mathematical models aiding identification of ‘infection prints’ from voice samples. This stage is a work-in-progress while we create the dataset.
We also aim to release the collected dataset openly, in a structured form, on GitHub. This is to pool effort from the larger research community toward making point-of-care diagnosis a reality soon.
We aim to release the diagnosis tool as a web/mobile application. Similar to the dataset creation stage, the application asks the user to record voice samples, and preferably provides a score indicating the probability of COVID-19 infection. The final deployment of the tool is subject to validation with clinical findings, and authorization/approval from competent authorities.
Given the simple and cost-effective nature of this diagnosis approach, we hypothesize that even a partial success of the tool would enable a massive deployment as a first line of diagnosis for the pandemic. The potential diagnostic tool will not replace chemical testing but merely supplement the existing testing methods.
Yes, we are a team of Professors, PostDocs, Engineers, and Research Scholars affiliated with the Indian Institute of Science, Bangalore (India). Sriram Ganapathy, Assistant Professor, Dept. of Electrical Engineering, IISc, is the Principal Investigator of this project.
Current Members: Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma, Prasanta Kumar Ghosh, Srikanth Raj Chetupalli, Sriram Ganapathy
Past Members: Anand Mohan, Ananya Muguli, Prashant Krishnan, Rohit Kumar, Shreyas Ramoji
The DagsHub repository is here. This open source contribution is part of DagsHub x Hacktoberfest.