It is a database built from the voices of 6 actors, each of whom spoke 14 sentences simulating 6 emotional states (disgust, fear, anger, joy, surprise, sadness) plus the neutral state. These emotions are the ones most commonly found in the literature on emotional speech. The recordings were made with professional equipment in the Fondazione Ugo Bordoni laboratories. The paper also describes a subjective validation test of the corpus, based on emotion discrimination of two sentences, carried out by 24 listeners split into two groups. The test was successful, yielding an overall recognition accuracy of 80%. The emotions hardest to recognize are joy and disgust, whereas the easiest to detect are anger, sadness, and the neutral state.
A similar version of the dataset is uploaded to DagsHub as EMOVO, enabling you to preview the dataset before downloading it.
A corpus of emotional speech is essential for building emotion-recognition systems and text-to-speech systems with emotion rendering. The introduction of emotive voice into human-machine communication finds its most important application in user interfaces for many services, in particular call centers.
The dataset is small (335 MB) and simple to navigate, as it has only six folders named by speaker ID. There is also a Documents folder, which contains the research paper, a complete description of the dataset labels, and a promotional poster for the dataset. There are 6 speakers; each speaker recorded 14 sentences simulating the 6 emotional states plus the neutral state, and each audio track is approximately 3 seconds long. The dataset labels encode the emotion, the speaker ID, and the sentence ID.
<root directory>
|
.- README.md
|
.- Documents/
|
.- Dataset/
   |
   .- f1/
   |
   .- f2/
   |
   .- f3/
   |
   .- m1/
   |
   .- m2/
   |
   .- m3/
      |
      .- dis-m3-b1.wav
      |
      .- dis-m3-b2.wav
      |
      ...
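The layout above can be traversed with the standard library alone; here is a minimal sketch that gathers the audio files per speaker, assuming the six speaker folders sit directly under `Dataset/` as shown (paths are illustrative):

```python
from pathlib import Path

def collect_wavs(dataset_root):
    """Map each speaker folder (f1..f3, m1..m3) to its sorted .wav files."""
    root = Path(dataset_root)
    return {
        d.name: sorted(d.glob("*.wav"))
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```

For example, `collect_wavs("Dataset")["m3"]` would list `dis-m3-b1.wav`, `dis-m3-b2.wav`, and so on for speaker m3.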
Audio Label Key:
[emotion]-[speaker ID]-[sentence ID] = dis-m3-b1.wav
For more information about the audio labels, read
LA STRUTTURA DI EMOVO.pdf
in the Documents folder.
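The label key lends itself to a one-line parser. A minimal sketch follows; note that the emotion abbreviations mapped here are an assumption based on the `dis` example (the authoritative key is in LA STRUTTURA DI EMOVO.pdf):

```python
from pathlib import Path

# Assumed mapping from filename prefixes to emotions; verify against
# "LA STRUTTURA DI EMOVO.pdf" before relying on it.
EMOTIONS = {
    "dis": "disgust", "pau": "fear", "rab": "anger", "gio": "joy",
    "sor": "surprise", "tri": "sadness", "neu": "neutral",
}

def parse_label(filename):
    """Split an EMOVO filename like 'dis-m3-b1.wav' into its label parts."""
    emotion_code, speaker_id, sentence_id = Path(filename).stem.split("-")
    return EMOTIONS.get(emotion_code, emotion_code), speaker_id, sentence_id

print(parse_label("dis-m3-b1.wav"))  # ('disgust', 'm3', 'b1')
```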
The listeners were organized in two groups of 12 persons each, hosted in two different labs. Subjects were asked to identify the emotion being simulated. From each actor's voice, 2 nonsense sentences were proposed; this prevented the semantic content from biasing the guess among the 7 possible outcomes. Each subject listened to the signals from a male and a female actor, for a total of 84 tests. The results of the two labs were statistically compatible, so only one table of results is presented (see Table I).
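To illustrate how an overall recognition accuracy like the 80% figure falls out of such a test, here is a toy computation; the per-emotion counts below are invented for illustration and are NOT the paper's Table I:

```python
# Toy per-emotion results: (correct guesses, total trials).
# These numbers are made up for illustration only.
results = {
    "anger":    (22, 24),
    "sadness":  (21, 24),
    "neutral":  (21, 24),
    "fear":     (19, 24),
    "surprise": (19, 24),
    "joy":      (16, 24),
    "disgust":  (16, 24),
}

correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())
accuracy = correct / total
print(f"overall accuracy: {accuracy:.0%}")  # prints "overall accuracy: 80%"
```

Per-emotion ratios computed the same way make it easy to see which emotions (here, joy and disgust) drag the overall figure down.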
In the study of speech signals, the emotional aspect plays an increasing role: only through emotional connotations do artificial voices become more natural, and emotion recognition is an important parameter in many applications. To characterize emotional voices, it is essential to have a reference corpus. The realization of EMOVO has thus made available to the scientific community the first corpus of emotional speech for Italian.
I would like to thank Giovanni Costantini, Iacopo Iadarola, Andrea Paoloni, and Massimiliano Todisco for creating the dataset and making it available to the public.
The dataset is under an open-source license and can be used for research purposes. For more information, contact the author: GIOVANNI COSTANTINI -> costantini@uniroma2.it
Original Dataset: EMOVO-ITA
DAGsHub Dataset: kingabzpro/EMOVO
Photo by Tom Ramalho on Unsplash
This open source contribution is part of DagsHub x Hacktoberfest