If you use this data, please cite
as well as the original Flickr 8k text caption corpus:
You can download the original Flickr 8k corpus of text captions here: https://forms.illinois.edu/sec/1713398
This data is distributed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.
Here is a brief description of what is included in the Flickr 8k audio data:
The wavs/ directory contains 40,000 spoken audio captions in .wav audio format, one for each caption included in the train, dev, and test splits of the original Flickr 8k corpus (as defined by the files Flickr_8k.trainImages.txt, Flickr_8k.devImages.txt, and Flickr_8k.testImages.txt).
The audio is sampled at 16000 Hz with 16-bit depth, and stored in Microsoft WAVE audio format.
The file wav2capt.txt contains a mapping from the .wav file names to the corresponding .jpg images and caption numbers. The .jpg file names and caption numbers can then be mapped to the caption text via the Flickr8k.token.txt file from the original Flickr 8k corpus.
The file wav2spk.txt contains a mapping from each .wav file name to its speaker. Speakers are numbered consecutively from 1 to 183 (the total number of unique speakers).
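The mappings and audio format described above could be read and checked along the following lines. This is a minimal sketch, not part of the corpus distribution. It assumes that wav2capt.txt has three whitespace-separated columns per line (.wav name, .jpg name, caption number) and that wav2spk.txt has two (.wav name, speaker number); the column layout and the function names are assumptions, so inspect the files before relying on them.

```python
import wave
from typing import Dict, Iterable, Tuple


def parse_wav2capt(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each .wav name to (image name, caption number).

    Assumption: each line holds three whitespace-separated fields,
    <wav name> <jpg name> <caption number>.
    """
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines or lines that do not fit the assumed layout
        wav_name, jpg_name, caption_no = parts
        mapping[wav_name] = (jpg_name, caption_no)
    return mapping


def parse_wav2spk(lines: Iterable[str]) -> Dict[str, int]:
    """Map each .wav name to its speaker number.

    Assumption: each line holds two whitespace-separated fields,
    <wav name> <speaker number>.
    """
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines or lines that do not fit the assumed layout
        wav_name, speaker = parts
        mapping[wav_name] = int(speaker)
    return mapping


def has_expected_format(path: str) -> bool:
    """Return True if the file is 16000 Hz audio with 16-bit (2-byte) samples."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == 16000 and wav_file.getsampwidth() == 2
```

Both parsers accept any iterable of lines, so an open file handle can be passed directly, for example `parse_wav2spk(open("wav2spk.txt"))`. Joining the two dictionaries on the .wav name gives the image, caption number, and speaker for each recording.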
Flickr Audio Corpus (4.2 GB): Download gzip'd tar file
MD5 checksum: 9d078f1f15
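A downloaded archive can be checked against the listed checksum with a short script such as the one below. This is a sketch under stated assumptions: the value shown above is shorter than a full 32-character MD5 digest, so the comparison only tests whether the computed digest starts with it, and the function names are illustrative.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str = "9d078f1f15") -> bool:
    """Check the file's MD5 against the listed value.

    Assumption: the listed value may be a prefix of the full digest,
    so this is a prefix comparison rather than an exact match.
    """
    return md5_of_file(path).startswith(listed.lower())
```

A prefix match is weaker evidence than a full digest match, so treat a positive result as a sanity check on the download rather than a guarantee of integrity.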
This open source contribution is part of DagsHub x Hacktoberfest
The Flickr 8k Audio Caption Corpus contains 40,000 spoken captions of 8,000 natural images. It was collected in 2015 to investigate multimodal learning schemes for unsupervised speech pattern discovery.