
Install DagsHub:
pip install dagshub
To stream this data directly from DagsHub:
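from dagshub.streaming import DagsHubFilesystem
# Expose the DagsHub repository as a filesystem rooted at the current directory
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/mmid-dataset")
# List the top-level contents of the dataset bucket
fs.listdir("s3://mmid-pds")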
Description
MMID is a large-scale, massively multilingual dataset of images paired with the words they represent, collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, each word is stored parallel to the images that represent it, _and_ parallel to the word’s English translation (and that translation's corresponding images).
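The doubly parallel structure can be pictured with a small in-memory sketch. It is only an illustration: the class names `WordEntry` and `ParallelEntry`, the helper `image_pairs`, and the example word and paths are all hypothetical and do not reflect the dataset's actual file layout or API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WordEntry:
    """A word and the images that represent it (hypothetical structure)."""
    word: str
    image_paths: List[str] = field(default_factory=list)


@dataclass
class ParallelEntry:
    """One doubly parallel record: a word in some language with its images,
    alongside its English translation with that translation's images."""
    language: str
    foreign: WordEntry
    english: WordEntry


def image_pairs(entry: ParallelEntry) -> List[Tuple[str, str]]:
    """Pair every foreign-word image with every English-translation image."""
    return [
        (f, e)
        for f in entry.foreign.image_paths
        for e in entry.english.image_paths
    ]


# Made-up example values, not taken from the dataset.
entry = ParallelEntry(
    language="fr",
    foreign=WordEntry("chat", ["fr/chat/01.jpg", "fr/chat/02.jpg"]),
    english=WordEntry("cat", ["en/cat/01.jpg"]),
)
pairs = image_pairs(entry)
```

Each record links a word to its images and, through the translation, to a second set of images, which is what makes cross-lingual image comparison possible.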
Additional information
Documentation
Update frequency
Language data is added as it is ready for distribution.
Managed by
License
See citation instructions at http://multilingual-images.org