The Massively Multilingual Image Dataset (MMID) for Machine Learning

Install the DagsHub client:

pip install dagshub

Stream the data directly from DagsHub:

from dagshub.streaming import DagsHubFilesystem

fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/mmid-dataset")

fs.listdir("s3://mmid-pds")

Description

MMID is a large-scale, massively multilingual dataset of images paired with the words they represent, collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, words are stored parallel to images that represent the word, _and_ parallel to the word’s translation into English (and the corresponding images).
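
The doubly parallel layout can be pictured with a small in-memory sketch. This is only an illustration of the idea, not the dataset's actual file format: the `WordEntry` class, the `parallel_images` helper, the sample words, and the image paths are all hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """One foreign-language word, its English translations, and its image paths.

    Hypothetical structure for illustration; not the MMID on-disk format.
    """
    word: str
    translations: list[str]
    images: list[str] = field(default_factory=list)


def parallel_images(entry: WordEntry, english_images: dict[str, list[str]]) -> dict:
    """Return both parallel views for a word: its own images, and the
    images attached to each of its English translations."""
    return {
        "word_images": list(entry.images),
        "translation_images": {
            t: list(english_images.get(t, [])) for t in entry.translations
        },
    }


# Made-up sample data; the paths do not refer to real files in the dataset.
english_images = {"dog": ["en/dog/01.jpg", "en/dog/02.jpg"]}
entry = WordEntry(word="perro", translations=["dog"], images=["es/perro/01.jpg"])

result = parallel_images(entry, english_images)
print(result["word_images"])
print(result["translation_images"]["dog"])
```

Keeping the English images in a separate lookup keyed by translation means several foreign words that share a translation can point at the same English image set.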

Additional information

Update frequency

Language data is added as it is ready for distribution.

License

See citation instructions at http://multilingual-images.org
