The Massively Multilingual Image Dataset (MMID)

Stream data with DagsHub Direct Data Access (DDA):

from dagshub.streaming import DagsHubFilesystem

fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/mmid-dataset")

fs.listdir("s3://mmid-pds")
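The listing above returns every name in the bucket. As a minimal sketch, assuming the per-language archives sit at the top level of the bucket and end in `.tar`, the names can be filtered as below. `list_tar_archives` and `SupportsListdir` are hypothetical helpers written for illustration, not part of the `dagshub` package; the function accepts any object with an `os.listdir`-style `listdir` method, such as the `fs` object created above.

```python
from typing import List, Protocol


class SupportsListdir(Protocol):
    """Any object with an os.listdir-style method, e.g. the `fs` object above."""

    def listdir(self, path: str) -> List[str]: ...


def list_tar_archives(fs: SupportsListdir, path: str = "s3://mmid-pds") -> List[str]:
    """Return the sorted names under `path` that end in .tar (case-insensitive).

    Assumption: the archives are direct children of `path`.
    """
    return sorted(name for name in fs.listdir(path) if name.lower().endswith(".tar"))
```

With the filesystem from the snippet above, `list_tar_archives(fs)` would be the call; the actual names returned depend on the bucket contents.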

Description:

MMID is a large-scale, massively multilingual dataset of images paired with the words they represent, collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, every word is stored alongside images that represent it and alongside its English translation (with that translation's own images).
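To make the "doubly parallel" structure concrete, here is one way a single entry could be modeled in memory. This is an illustrative sketch only: the `WordEntry` class, its field names, the language code, and the file paths are all hypothetical placeholders and do not reflect the actual layout of the archives.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordEntry:
    """Hypothetical in-memory view of one MMID word (not the on-disk format)."""

    language: str             # language of the word; "fr" below is a placeholder
    word: str                 # the word in that language
    english_translation: str  # first parallel: the word's English translation
    # Second parallel: images for the word and for its English translation.
    image_paths: List[str] = field(default_factory=list)
    english_image_paths: List[str] = field(default_factory=list)


entry = WordEntry(
    language="fr",
    word="chat",
    english_translation="cat",
    image_paths=["fr/chat/01.jpg"],         # placeholder path, not a real file
    english_image_paths=["en/cat/01.jpg"],  # placeholder path, not a real file
)
```

Keeping the two image lists separate mirrors the description: one set of images belongs to the foreign word, the other to its English translation.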

Update Frequency:

Language data is added as it is ready for distribution.

Managed By:

https://github.com/penn-nlp

Resources:

  1. resource:
    • Description: Images for words in various languages, packaged in .tar archives by language.

    • ARN: arn:aws:s3:::mmid-pds

    • Region: us-east-1

    • Type: S3 Bucket
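
Since the images are distributed as `.tar` archives, a downloaded archive can be inspected with Python's standard `tarfile` module before extracting anything. The sketch below is a hedged example: `list_image_members` and `IMAGE_SUFFIXES` are hypothetical names, and the set of image extensions is an assumption, because the internal layout of the archives is not described here.

```python
import tarfile
from typing import BinaryIO, List

# Assumed image extensions; adjust to whatever the archives actually contain.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


def list_image_members(archive: BinaryIO) -> List[str]:
    """Return the names of regular files in a tar archive that look like images.

    `archive` must be a seekable binary file object (a local file opened
    with "rb", or an io.BytesIO), because mode "r:*" needs random access.
    """
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(IMAGE_SUFFIXES)
        ]
```

For a local copy of an archive, open it in binary mode (`open(path, "rb")`) and pass the file object to `list_image_members`; only the archive index is read, and no files are extracted.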

Tags:

aws-pds, computer vision, machine learning, machine translation, natural language processing

About

The mmid-dataset repository originates from the Registry of Open Data on AWS.
