
This dataset is uploaded to a DAGsHub repository

DAGShub README

Mozilla Common Voice

If you use the data in a published academic work, we would appreciate it if you cited the following article:

Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F. M. and Weber, G. (2020) "Common Voice: A Massively-Multilingual Speech Corpus". Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). pp. 4211–4215

This dataset is released under the MPL (Mozilla Public License) version 2.0.

Brief description of datasets

Here is a brief description of what is included in the Common Voice audio data:

Common Voice is an open-source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

Each entry in the dataset consists of a unique MP3 and a corresponding text file. Many of the 13,905 recorded hours in the dataset also include demographic metadata such as age, sex, and accent, which can help improve the accuracy of speech recognition engines.

The dataset currently consists of 11,192 validated hours in 76 languages, but we’re always adding more voices and languages.
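As a rough illustration of the entry structure described above (an audio clip, its text, and optional demographic metadata), the following sketch parses a small tab-separated metadata sample and keeps only the entries that carry demographic labels. The column names (`path`, `sentence`, `age`, `gender`, `accent`), the sample rows, and the helper functions are assumptions made for this example; check the metadata files shipped with the release you download for the actual layout.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: column names and rows are invented for illustration
# and are not copied from the dataset.
SAMPLE_TSV = (
    "path\tsentence\tage\tgender\taccent\n"
    "clip_0001.mp3\tHello world.\ttwenties\tfemale\tus\n"
    "clip_0002.mp3\tGood morning.\t\t\t\n"
    "clip_0003.mp3\tSee you soon.\tthirties\tmale\tengland\n"
)


def load_entries(tsv_text):
    """Parse tab-separated metadata into a list of dicts, one per clip."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def with_demographics(entries, fields=("age", "gender", "accent")):
    """Keep only entries where every listed demographic field is non-empty."""
    return [e for e in entries if all(e.get(f) for f in fields)]


entries = load_entries(SAMPLE_TSV)
labelled = with_demographics(entries)
age_counts = Counter(e["age"] for e in labelled)

print(len(entries), "entries,", len(labelled), "with demographic metadata")
```

Filtering on non-empty fields reflects the point that only some of the recordings include demographic metadata; for real files, pass a file object opened with `newline=""` to `csv.DictReader` instead of the in-memory string.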

Data downloads

The official website for Mozilla Common Voice (you can download the uncompressed dataset, as well as earlier and newer releases, there)

The DAGsHub repository (this repository holds Common Voice Corpus 7.0, version en_2637h_2021-07-21)


This open source contribution is part of DagsHub x Hacktoberfest
