Image credit: Allef Vinicius on Unsplash
The URDU dataset contains emotional utterances of Urdu speech gathered from Urdu talk shows. It contains 400 utterances of four basic emotions: Angry, Happy, Neutral, and Sad. There are 38 speakers (27 male and 11 female). The data was collected from YouTube, and the speakers were selected randomly.
A similar version of the dataset is uploaded to DagsHub, enabling you to preview the dataset before downloading it.
The file names in the dataset encode the speaker, the speaker's gender, the number of the file for that speaker, and the overall number of the file within its emotion. Files are named as follows:
General Name: SGX_FXX_EYY
For more details about the dataset, please refer to the complete paper "Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages". https://arxiv.org/pdf/1812.10411.pdf
All audio files are in .wav format (pushed to DagsHub).
The dataset is small (88MB) and simple to navigate, as it has 4 folders based on emotions. Each folder contains 100 .wav audio files of emotional speech by Urdu speakers. The audio files range from about 2 to 3 seconds each and are taken from various videos uploaded to YouTube. The following ASCII diagram depicts the directory structure.
<root directory>
    |
    .- README.md
    |
    .- Angry/
    |
    .- Happy/
    |
    .- Neutral/
    |
    .- Sad/
        |
        .- SF10_F1_S01.wav
        |
        .- SF10_F2_S02.wav
        | ...
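The layout above can be walked with a few lines of Python. The following is a minimal sketch, assuming the dataset has been downloaded locally and that the four emotion folders sit directly under the root as shown; the function name `list_urdu_files` and the constant `EMOTION_FOLDERS` are hypothetical names introduced here for illustration, not part of the dataset.

```python
from pathlib import Path

# Folder names taken from the directory diagram above.
EMOTION_FOLDERS = ("Angry", "Happy", "Neutral", "Sad")


def list_urdu_files(root):
    """Return a list of (wav_path, emotion_label) pairs found under root.

    The label is taken from the name of the folder that holds the file.
    Folders that are missing are skipped rather than treated as an error.
    """
    root = Path(root)
    samples = []
    for emotion in EMOTION_FOLDERS:
        folder = root / emotion
        if not folder.is_dir():
            continue
        # Sort so the order is stable across runs and platforms.
        for wav_path in sorted(folder.glob("*.wav")):
            samples.append((wav_path, emotion))
    return samples
```

If the local copy matches the description above, `len(list_urdu_files("path/to/dataset"))` should come to 400, with 100 entries per emotion; the path here is a placeholder for wherever the dataset was downloaded.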
The name of each audio file can be divided into three segments separated by an underscore (_).
Where,
In SGX, G indicates the gender of the speaker (M for a male speaker, F for a female speaker), while X is the speaker ID, which remains the same for a particular speaker across all emotions.
In FXX, F is a keyword standing for file and XX is the number of the file for that particular speaker.
In EYY, E indicates the emotion, i.e., A, H, N, and S for Angry, Happy, Neutral, and Sad, respectively, and YY is the number of the file within that emotion.
For example, the file name SM1_F01_A12 indicates that this is the 1st file recorded by male speaker No. 1, and A12 indicates that it is the 12th file of the Angry emotion.
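The naming scheme described above can be decoded programmatically. The following is a minimal sketch, assuming every file name follows the SGX_FXX_EYY pattern with an optional .wav extension; the names `parse_urdu_filename` and `UrduFileInfo` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import NamedTuple

# Letter codes as described in the naming scheme above.
GENDERS = {"M": "male", "F": "female"}
EMOTIONS = {"A": "Angry", "H": "Happy", "N": "Neutral", "S": "Sad"}

# S<gender><speaker id>_F<file number>_<emotion><emotion index>[.wav]
_NAME_PATTERN = re.compile(r"^S([MF])(\d+)_F(\d+)_([AHNS])(\d+)(?:\.wav)?$")


class UrduFileInfo(NamedTuple):
    gender: str          # "male" or "female"
    speaker_id: int      # X in SGX
    file_number: int     # XX in FXX (per-speaker file number)
    emotion: str         # full emotion name
    emotion_index: int   # YY in EYY (file number within the emotion)


def parse_urdu_filename(name: str) -> UrduFileInfo:
    """Split a file name such as 'SM1_F01_A12' into its labelled parts."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"File name does not follow SGX_FXX_EYY: {name!r}")
    gender, speaker, file_number, emotion, index = match.groups()
    return UrduFileInfo(
        gender=GENDERS[gender],
        speaker_id=int(speaker),
        file_number=int(file_number),
        emotion=EMOTIONS[emotion],
        emotion_index=int(index),
    )
```

For the example name above, `parse_urdu_filename("SM1_F01_A12")` gives a male speaker with ID 1, file number 1, emotion Angry, and emotion index 12. Names that do not match the pattern raise a `ValueError` instead of being guessed at.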
Cross-lingual speech emotion recognition is an important task for practical applications. The performance of automatic speech emotion recognition systems degrades in cross corpus scenarios, particularly in scenarios involving multiple languages or a previously unseen language such as Urdu for which limited or no data is available.
When data from multiple languages are used for training, emotion detection results improve even for the URDU dataset, which is highly dissimilar from other databases. Accuracy is also boosted when a small fraction of the testing data is included in the training of a model with a single corpus. These findings would be very helpful for designing robust emotion recognition systems, even for languages with limited or no data.
Source: Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages
First and foremost, I would like to thank Siddique Latif and his team from Information Technology University (ITU)-Punjab and COMSATS University Islamabad (CUI), Islamabad for pushing the audio dataset to GitHub.
We would like to thank Farwa Anees, Muhammad Usman, Muhammad Atif, and Farid Ullah Khan for assisting us in the preparation of the URDU dataset.
Siddique Latif, Jun 29, 2018
Original Dataset: Urdu-dataset