The Massively Multilingual Image Dataset (MMID) for Machine Learning

Install DagsHub:

pip install dagshub

To stream this data directly from DagsHub:

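from dagshub.streaming import DagsHubFilesystem

# Create a filesystem view of the DagsHub repository, rooted at the current directory
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/mmid-dataset")

# List the contents of the MMID bucket
fs.listdir("s3://mmid-pds")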
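The `listdir` call above returns the entries under the bucket root. As a minimal sketch, the helper below previews a listing from any object that exposes a `listdir` method. The names `preview_listing`, `SupportsListdir`, and `FakeFilesystem` are hypothetical and not part of the DagsHub client, and the entry names in the stand-in are placeholders, not real MMID paths.

```python
from typing import Dict, Iterable, List, Protocol


class SupportsListdir(Protocol):
    """Anything with a listdir(path) method, such as the fs object above."""

    def listdir(self, path: str) -> Iterable[str]: ...


def preview_listing(fs: SupportsListdir, root: str, limit: int = 10) -> List[str]:
    """Return up to `limit` entry names under `root`, sorted for stable output."""
    entries = sorted(str(name) for name in fs.listdir(root))
    return entries[:limit]


class FakeFilesystem:
    """Offline stand-in for trying the helper without network access.

    The directory contents passed in are illustrative placeholders.
    """

    def __init__(self, tree: Dict[str, List[str]]) -> None:
        self._tree = tree

    def listdir(self, path: str) -> List[str]:
        return list(self._tree[path])
```

With the streaming client installed, passing the `fs` object created above as the first argument, for example `preview_listing(fs, "s3://mmid-pds")`, should behave the same way, assuming its `listdir` returns string-like names.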

Description

MMID is a large-scale, massively multilingual dataset of images paired with the words they represent, collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, words are stored parallel to images that represent the word, _and_ parallel to the word’s translation into English (and its corresponding images).
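To make the "doubly parallel" idea concrete, here is a small in-memory sketch. It is not the dataset's actual on-disk layout, which is described in the documentation linked below. The class names, the sample word pair, and the image file names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class WordEntry:
    """A word together with the images that represent it."""

    word: str
    images: List[str] = field(default_factory=list)


@dataclass
class ParallelEntry:
    """A foreign-language word paired with its English translation.

    Each side carries its own images, which gives the two parallel axes:
    word <-> images, and foreign word <-> English word.
    """

    foreign: WordEntry
    english: WordEntry


def build_lexicon(entries: Iterable[ParallelEntry]) -> Dict[str, ParallelEntry]:
    """Index entries by the foreign word for quick lookup."""
    return {entry.foreign.word: entry for entry in entries}


def image_pairs(entry: ParallelEntry) -> List[Tuple[str, str]]:
    """Pair every foreign-side image with every English-side image."""
    return [(f, e) for f in entry.foreign.images for e in entry.english.images]
```

For example, an entry built from `WordEntry("gato", [...])` and `WordEntry("cat", [...])` can be looked up by `"gato"` in the lexicon, and `image_pairs` yields cross-lingual image pairs that depict the same concept.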

Additional information

Documentation

https://multilingual-images.org/doc.html

Update frequency

Language data is added as it is ready for distribution.

Managed by

https://github.com/penn-nlp

License

See citation instructions at http://multilingual-images.org

Related datasets

BodyM Dataset

Cloud to Street – Microsoft Flood and Clouds Dataset

A2D2: Audi Autonomous Driving Dataset

Galaxy Evolution Explorer Satellite (GALEX)
