 Open Source Data Science Datasets

DagsHub / WIkiText-103

Updated 1 year ago

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

dataset nlp language modelling dvc git

Path: .

The purpose of the project is to make available a standard training and test setup for language modeling experiments.

dataset nlp language modelling dvc git

DagsHub / LAMBADA

Updated 1 year ago

This archive contains the LAMBADA dataset (LAnguage Modeling Broadened to Account for Discourse Aspects).

dataset nlp language modelling dvc git