Type:  dataset Data Domain:  audio
Voice Gender Detection

1. General information

A cleaned dataset for voice gender detection, built from the VoxCeleb dataset (7000+ unique speakers) and containing one utterance each from 3683 male and 2312 female speakers. VoxCeleb is an audio-visual dataset of short clips of human speech extracted from interview videos uploaded to YouTube. It contains speech from speakers spanning a wide range of ethnicities, accents, professions and ages.

A similar version of the dataset is hosted on DagsHub, so you can preview it before downloading.

2. Data Preprocessing

The author downloaded all the files from VoxCeleb2 and then cleaned the data to separate the male speakers from the female speakers. He then took one voice file at random for each male and female speaker so that every file comes from a unique speaker.
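The selection step above can be sketched as follows. This is a minimal illustration, not the author's actual script: it assumes the VoxCeleb2 layout `<root>/<speaker_id>/<video_id>/<clip>.m4a` and a hypothetical `genders` dictionary (speaker ID to "m" or "f") that you would build from the VoxCeleb metadata yourself. The function name `select_one_per_speaker` is also hypothetical.

```python
import random
import shutil
from collections import defaultdict
from pathlib import Path


def select_one_per_speaker(source_root, genders, out_root, seed=0):
    """Copy one randomly chosen clip per speaker into males/ or females/.

    Assumes a VoxCeleb2-style layout <source_root>/<speaker_id>/<video_id>/<clip>.m4a
    and a hypothetical `genders` dict mapping speaker_id -> "m" or "f".
    """
    source_root = Path(source_root)
    out_root = Path(out_root)

    # Group every clip by speaker (the first directory below the source root).
    clips = defaultdict(list)
    for clip in sorted(source_root.glob("*/*/*.m4a")):
        speaker = clip.relative_to(source_root).parts[0]
        clips[speaker].append(clip)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    counters = {"males": 0, "females": 0}
    for speaker in sorted(clips):
        gender = genders.get(speaker)
        if gender not in ("m", "f"):
            continue  # skip speakers without a usable gender label
        folder = "males" if gender == "m" else "females"
        chosen = rng.choice(clips[speaker])
        dest_dir = out_root / folder
        dest_dir.mkdir(parents=True, exist_ok=True)
        # Name files 0.m4a, 1.m4a, ... to mirror the numbered files in the dataset.
        shutil.copy2(chosen, dest_dir / f"{counters[folder]}{chosen.suffix}")
        counters[folder] += 1
    return counters
```

Seeding the random generator means rerunning the script picks the same clip for every speaker, which keeps the dataset reproducible.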

[img](https://github.com/jim-schwoebel/gender-detection/blob/master/data/Screen Shot 2019-07-22 at 11.16.14 AM.png)

To prepare the dataset, he put the 'males' and 'females' folders in the data directory of his repository. This allows the files to be featurized and machine learning models to be trained with the provided training scripts.
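Because the labels come entirely from the folder names, a training script only needs to walk the two folders. A minimal sketch, assuming a hypothetical helper `list_labeled_files` and an arbitrary 0 = male, 1 = female encoding (the provided training scripts may encode labels differently):

```python
from pathlib import Path

# Label encoding is an assumption for illustration: 0 = male, 1 = female.
LABELS = {"males": 0, "females": 1}


def list_labeled_files(data_dir, pattern="*.wav"):
    """Return (path, label) pairs for every clip in <data_dir>/males and <data_dir>/females."""
    data_dir = Path(data_dir)
    samples = []
    for folder, label in LABELS.items():
        # The folder name is the only source of the label.
        for path in sorted((data_dir / folder).glob(pattern)):
            samples.append((path, label))
    return samples
```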

[img](https://github.com/jim-schwoebel/gender-detection/blob/master/data/Screen Shot 2019-07-22 at 12.25.49 PM.png)

3. Audio File Conversion

The original files I downloaded were in .m4a format, which DagsHub's audio visualization does not support, so I converted the dataset to .wav format using a Python script that converts m4a files to wav files (github.com). I ran the script separately on the males and females folders.
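The same conversion can be reproduced with the ffmpeg command-line tool. This sketch is not the contents of `fileconvert.py` or of the linked script: it assumes ffmpeg is installed and on PATH, and the helpers `ffmpeg_command` and `convert_folder` are hypothetical. The subprocess runner is injectable so the folder walk can be checked without ffmpeg.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst):
    """Build an ffmpeg command line that decodes src and writes dst."""
    # -y overwrites an existing output file; ffmpeg picks WAV from the .wav extension.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_folder(folder, run=subprocess.run):
    """Convert every .m4a file in `folder` to a .wav file next to it.

    Assumes the ffmpeg CLI is on PATH. `run` defaults to subprocess.run but can be
    replaced (e.g. in tests) to avoid calling ffmpeg.
    """
    converted = []
    for src in sorted(Path(folder).glob("*.m4a")):
        dst = src.with_suffix(".wav")
        run(ffmpeg_command(src, dst), check=True)  # raise if ffmpeg fails
        converted.append(dst)
    return converted
```

As in the original workflow, run it once per folder, for example `convert_folder("males")` followed by `convert_folder("females")`.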

4. Organization of the dataset

The dataset is fairly large (1.26 GB) but simple to navigate: it has two folders, one for each binary gender label. The males folder contains 3682 .wav audio files from unique speakers around the world, and the females folder contains 2312 .wav files from unique female speakers. Clip durations range from about 5 to 30 seconds, and a typical file is approximately 194 KB. The following ASCII diagram shows the directory structure.

<root directory>
    |
    .- README.md
    |
    .- fileconvert.py
    |
    .- females/
    |
    .- males/
          |
          .- 0.wav
          |
          .- 1.wav
          |
          .- 2.wav
          | ...
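To check the durations and folder sizes described above on your own copy, Python's standard `wave` module is enough. This sketch assumes the files are uncompressed PCM WAV (ffmpeg's default when writing .wav), and the helper `clip_stats` is hypothetical:

```python
import wave
from pathlib import Path


def clip_stats(data_dir):
    """Return {folder: (file_count, min_seconds, max_seconds, total_bytes)}."""
    stats = {}
    for folder in ("males", "females"):
        durations, total_bytes = [], 0
        for path in sorted((Path(data_dir) / folder).glob("*.wav")):
            with wave.open(str(path), "rb") as w:
                # Duration in seconds = number of frames / frames per second.
                durations.append(w.getnframes() / w.getframerate())
            total_bytes += path.stat().st_size
        if durations:
            stats[folder] = (len(durations), min(durations), max(durations), total_bytes)
        else:
            stats[folder] = (0, 0.0, 0.0, 0)
    return stats
```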

5. Use Case & Results

The dataset was used to train machine learning models that distinguish male from female voices in audio files; the best model reached 90.7% (+/- 1.3%) accuracy. You can find more about the code and results here.

Classifier            Accuracy   (+/-)
Decision tree         0.7399     0.0073
Gaussian NB           0.8683     0.0167
SKlearn classifier    0.5157     0.0008
Adaboost classifier   0.8893     0.0139
Gradient boosting     0.8670     0.0195
Logistic regression   0.8945     0.0127
Hard voting           0.9076     0.0132
K Nearest Neighbors   0.7314     0.0172
Random forest         0.8080     0.0226
SVM                   0.8781     0.0228
The most accurate classifier is Hard voting, trained on MFCC audio features.
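Hard voting takes the majority label across the individual classifiers' predictions, and the (+/-) column reports a spread around each accuracy. The stdlib sketch below illustrates both ideas; it is not the training code behind the table, and it assumes the (+/-) value is a population standard deviation of per-fold accuracies (the NumPy default), which the source does not state. In practice, scikit-learn's `VotingClassifier(voting="hard")` provides a majority vote over fitted estimators.

```python
from collections import Counter
from statistics import mean, pstdev


def hard_vote(predictions_per_model):
    """Majority vote over class labels.

    predictions_per_model: list of equal-length label lists, one per classifier.
    Ties are broken in favour of the tied label that appears first in vote order.
    """
    votes = zip(*predictions_per_model)  # one tuple of votes per sample
    return [Counter(sample_votes).most_common(1)[0][0] for sample_votes in votes]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(fold_scores):
    """Mean and population standard deviation of per-fold accuracies."""
    return mean(fold_scores), pstdev(fold_scores)
```

With an odd number of binary classifiers a tie cannot occur, which is one reason ensembles of this kind often use an odd number of voters.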

Acknowledgments

First, I would like to thank Jim Schwoebel for publishing the dataset on GitHub and explaining in depth how to use it. Second, I would like to thank the VoxCeleb team for providing an excellent open-source dataset.

VoxCeleb is supported by the EPSRC programme grant Seebibyte EP/M013774/1: Visual Search for the Era of Big Data.

License


Jim Schwoebel, Aug 8, 2020

Original Dataset: Voice Gender Detection

Photo by Tim Mossholder on Unsplash


About

Detect a person's gender from a voice file
