
Voice Gender Detection

1. General information

A cleaned dataset for voice gender detection, built from the VoxCeleb dataset (7000+ unique speakers and utterances, 3683 males / 2312 females). VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. It contains speech from speakers spanning a wide range of ethnicities, accents, professions, and ages.

A similar version of the dataset is uploaded to DagsHub, which lets you preview the dataset before downloading it.

2. Data Preprocessing

The author downloaded all the files from VoxCeleb2 and then cleaned the data to separate the male speakers from the female speakers. One voice file was taken at random for each male and each female speaker, so that every file comes from a unique speaker.

To prepare the dataset, he put the 'males' and 'females' folders in the data directory of this repository. This allows us to featurize the files and train machine learning models via the provided training scripts.
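The random selection described above could be sketched roughly as follows. This is not the author's cleaning code: the function name `pick_one_file_per_speaker`, the one-subfolder-per-speaker layout, and the fixed seed are all assumptions made for illustration.

```python
import random
from pathlib import Path


def pick_one_file_per_speaker(root, pattern="*.m4a", seed=0):
    """Return {speaker_id: Path} with one randomly chosen file per speaker.

    Assumption (hypothetical layout): each immediate subdirectory of `root`
    holds the recordings of exactly one speaker.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    chosen = {}
    for speaker_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # rglob also finds files nested in deeper subfolders
        files = sorted(speaker_dir.rglob(pattern))
        if files:  # speakers without matching files are skipped
            chosen[speaker_dir.name] = rng.choice(files)
    return chosen
```

Seeding the generator means that rerunning the selection on the same folder yields the same files, which makes the resulting subset easier to reproduce.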

3. Audio File Conversion

The original files that I downloaded were in .m4a format, which is not recognized by the DAGsHub audio visualization, so I used the Python script to convert m4a files to wav files (github.com) to convert my dataset into .wav format. I ran the code for the males and females folders separately.
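A minimal sketch of such a conversion is shown below. It is not the script referenced above or the repository's fileconvert.py: it assumes the `ffmpeg` command-line tool is installed and on the PATH, and the helper names `build_ffmpeg_command`, `plan_conversions`, and `convert_folder` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build the ffmpeg argument list that converts `src` to `dst`."""
    # -y overwrites an existing output file; ffmpeg picks the WAV
    # container from the .wav extension of the output path.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def plan_conversions(folder):
    """Pair every .m4a file in `folder` with a .wav path of the same name."""
    return [(src, src.with_suffix(".wav"))
            for src in sorted(Path(folder).glob("*.m4a"))]


def convert_folder(folder):
    """Convert every .m4a file in `folder` to .wav (needs ffmpeg on PATH)."""
    for src, dst in plan_conversions(folder):
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Calling `convert_folder("males")` and then `convert_folder("females")` would mirror the separate runs described above. The original .m4a files are left in place, so they would need to be removed in a separate step if only .wav files should remain.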

4. Organization of the dataset

The dataset is large (1.26 GB) but simple to navigate, as it has 2 folders based on binary gender. The males folder contains 3682 .wav audio files from unique speakers all over the world. Similarly, the females folder contains 2312 .wav files from unique female speakers. The audio clips range from 5 to 30 seconds in duration, with file sizes of approximately 194 KB. The following ASCII diagram depicts the directory structure.

<root directory>
    |
    .- README.md
    |
    .- fileconvert.py
    |
    .- females/
    |
    .- males/
          |
          .- 0.wav
          |
          .- 1.wav
          |
          .- 2.wav
          | ...
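Given this layout, the files can be enumerated together with their labels using only the Python standard library. The sketch below is illustrative rather than part of the repository: the function names are hypothetical, and `wav_duration_seconds` assumes plain PCM .wav files that the `wave` module can read.

```python
import wave
from pathlib import Path


def list_labelled_files(root):
    """Return (path, label) pairs for every .wav file under males/ and females/."""
    samples = []
    for label in ("males", "females"):
        for wav_path in sorted((Path(root) / label).glob("*.wav")):
            samples.append((wav_path, label))
    return samples


def count_by_label(samples):
    """Count how many files carry each label."""
    counts = {}
    for _, label in samples:
        counts[label] = counts.get(label, 0) + 1
    return counts


def wav_duration_seconds(path):
    """Duration of a PCM .wav file: number of frames divided by sample rate."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

Running `count_by_label(list_labelled_files("."))` from the root directory is a quick way to check that a download contains the expected number of files per folder.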

5. Use Case & Results

The dataset is used to train a machine learning model to distinguish male from female voices in audio files (90.7% +/- 1.3% accuracy). You can find more about the code and results here.

Decision tree accuracy: 0.7398596519424567 (+/- 0.007327676542764603)
Gaussian NB accuracy: 0.8682797740896762 (+/- 0.016660391044338484)
SKlearn classifier accuracy: 0.5157270607408913 (+/- 0.00079538963465451)
Adaboost classifier accuracy: 0.8892763651333413 (+/- 0.013940745120583124)
Gradient boosting accuracy: 0.8669747415791165 (+/- 0.01950292233912751)
Logistic regression accuracy: 0.894515837971657 (+/- 0.012678238150779661)
Hard voting accuracy: 0.9076178049591996 (+/- 0.013226860908589952)
K Nearest Neighbors accuracy: 0.731352177051436 (+/- 0.017244722910655787)
Random forest accuracy: 0.8079923672086033 (+/- 0.02258623279374182)
SVM accuracy: 0.8781480823563248 (+/- 0.022841304608332974)
The most accurate classifier is Hard Voting with audio features (MFCC coefficients).
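For readers unfamiliar with the term, hard voting combines several classifiers by taking the majority label for each sample. The sketch below illustrates only that voting step in plain Python; it is not the training script behind the numbers above, and the function names and example labels are made up for illustration.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine label predictions from several models by majority vote.

    `predictions_per_model` holds one list of labels per model; all inner
    lists must have the same length (one label per sample).
    """
    voted = []
    for labels in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count;
        # on a tie, the label encountered first wins.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Using an odd number of models avoids ties in a two-class problem such as this one. With three hypothetical models predicting `["male", "female"]`, `["male", "male"]`, and `["female", "female"]`, the vote for the two samples is `["male", "female"]`.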

Acknowledgments

First, I would like to thank Jim Schwoebel for publishing the dataset on GitHub and explaining in depth how to use it. Secondly, I would like to thank VoxCeleb for providing an amazing open source dataset.

VoxCeleb is supported by the EPSRC programme grant Seebibyte EP/M013774/1: Visual Search for the Era of Big Data.

License

The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License.
The Voice Gender Detection dataset is released under the Apache License, Version 2.0, by Jim Schwoebel, 2020.

Jim Schwoebel, Aug 8, 2020

Original Dataset: Voice Gender Detection

DAGsHub Dataset: kingabzpro/voice_gender_detection

This open source contribution is part of DagsHub x Hacktoberfest

DagsHub / audio-datasets connected to https://github.com/DAGsHub/audio-datasets.git
