Free Spoken Digit Dataset (FSDD)

A simple audio/speech dataset consisting of recordings of spoken digits in wav files sampled at 8 kHz. The recordings are trimmed so that they have near-minimal silence at the beginning and end.

FSDD is an open dataset, which means it will grow over time as data is contributed. To enable reproducibility and accurate citation, the dataset is versioned using a Zenodo DOI as well as git tags.

Current status

6 speakers
3,000 recordings (50 of each digit per speaker)
English pronunciations

Organization

Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav

Example: 7_jackson_32.wav
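The naming scheme above can be unpacked with a few lines of Python. This is a minimal sketch, not part of the repository: the `Recording` tuple and the `parse_filename` helper are hypothetical names introduced here for illustration, and the code assumes the digit label and index contain no underscores.

```python
from pathlib import Path
from typing import NamedTuple


class Recording(NamedTuple):
    """Fields encoded in an FSDD file name (hypothetical helper type)."""
    digit: int
    speaker: str
    index: int


def parse_filename(path: str) -> Recording:
    """Split '{digitLabel}_{speakerName}_{index}.wav' into its three fields."""
    stem = Path(path).stem  # drop any directory and the '.wav' extension
    # Split the digit off the front and the index off the back, so a speaker
    # name that itself contains an underscore would still be kept whole.
    digit, rest = stem.split("_", 1)
    speaker, index = rest.rsplit("_", 1)
    return Recording(int(digit), speaker, int(index))


print(parse_filename("7_jackson_32.wav"))
```

For the example file name, the fields come out as digit 7, speaker "jackson", and index 32.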

Metadata

metadata.py contains metadata about the speakers' genders and accents.

Included utilities

trimmer.py: Trims silence at the beginning and end of an audio file. Splits an audio file into multiple audio files at periods of silence.

fsdd.py: A simple class that provides an easy-to-use API for accessing the data.

spectogramer.py: Creates spectrograms of the audio data. Spectrograms are often a useful pre-processing step.

Usage

The official test set consists of the first 10% of the recordings: for each digit and speaker, recordings numbered 0-4 (inclusive) are in the test set and recordings numbered 5-49 are in the training set.
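The split rule can be applied directly to file names. The following is a minimal sketch under the assumption that files follow the {digitLabel}_{speakerName}_{index}.wav format; `is_test_recording` and `split_recordings` are hypothetical helpers written for this example, not functions shipped with the dataset.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def is_test_recording(filename: str) -> bool:
    """Return True when the recording index is in the test range 0-4."""
    # The index is the last underscore-separated field of the file stem.
    index = int(Path(filename).stem.rsplit("_", 1)[1])
    return index <= 4


def split_recordings(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition file names into (train, test) lists using the index rule."""
    train: List[str] = []
    test: List[str] = []
    for name in filenames:
        (test if is_test_recording(name) else train).append(name)
    return train, test


# Example with synthetic file names for one digit and one speaker.
names = [f"3_jackson_{i}.wav" for i in range(50)]
train, test = split_recordings(names)
print(len(train), len(test))
```

Because the rule depends only on the index, every digit and every speaker contributes the same share of recordings to each set.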

Made with FSDD

Did you use FSDD in a paper, project or app? Add it here!

https://github.com/Jakobovski/decoupled-multimodal-learning
https://adhishthite.github.io/sound-mnist/ by Adhish Thite (https://adhishthite.github.io/)
https://github.com/eonu/torch-fsdd - A simple PyTorch data loader for the dataset (by Edwin Onuonga).

External tools

Tensorflow https://www.tensorflow.org/datasets/catalog/spoken_digit
C#/.NET. The FSDD dataset can be used in .NET applications through the FreeSpokenDigitsDataset class included within the Accord.NET Framework. A basic example on how to perform spoken digits classification using audio MFCC features can be found here.

License

Creative Commons Attribution-ShareAlike 4.0 International
