Datasets

Welcome to the DagsHub Datasets database, which lists hundreds of datasets across categories such as computer vision, audio, and NLP. All datasets are free and ready to use on the DagsHub platform. Browse the categories to find a dataset that fits your project.

COCO 1K

LAION-Aesthetics V2 (6.5+)

NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17 & 18

Sentinel-2 Cloud-Optimized GeoTIFFs

Radiant MLHub

Image classification – fast.ai datasets

OpenCell on AWS

ESA WorldCover

SiPeCaM (Sitios Permanentes de la Calibración y Monitoreo de la Biodiversidad)

Allen Brain Observatory – Visual Coding AWS Public Data Set

Prefeitura Municipal de São Paulo (PMSP) LiDAR Point Cloud

Automatic Speech Recognition (ASR) Error Robustness

Helpful Sentences from Reviews

Learning to Rank and Filter – community question answering

AI2 TabMCQ: Multiple Choice Questions aligned with the Aristo Tablestore

The Klarna Product-Page Dataset

MultiCoNER Dataset

Low Context Name Entity Recognition (NER) Datasets with Gazetteer

Common Screens

WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation

Sudachi Language Resources

Japanese Tokenizer Dictionaries

Covid Job Impacts – US Hiring Data Since March 1 2020

NASA Prediction of Worldwide Energy Resources (POWER)

U.S. Census ACS PUMS

Japanese Tokenizer Dictionaries

recount3

Mars Spectrometry: Detect Evidence for Past Habitability

Legal Entity Identifier (LEI) and Legal Entity Reference Data (LE-RD)

TSBench

Speedtest by Ookla Global Fixed and Mobile Network Performance Maps

CAM6 Data Assimilation Research Testbed (DART) Reanalysis: Cloud-Optimized Dataset

Mars Spectrometry 2: Gas Chromatography for the Sample Analysis at Mars Data (SAM) Instrument

NOAA National Water Model Short-Range Forecast

Prefeitura Municipal de São Paulo (PMSP) LiDAR Point Cloud

NREL National Solar Radiation Database

The Klarna Product-Page Dataset

DigitalCorpora

Common Screens

NASA Prediction of Worldwide Energy Resources (POWER)

Geosnap Data, Center for Geospatial Sciences

Multiview Extended Video with Activities (MEVA)

Normalized Difference Urban Index (NDUI)

ComStock

Binding DB – Data Lakehouse Ready

IBL Behavioral Data on AWS

1000 Genomes

OpenCell on AWS

1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 – Data Lakehouse Ready

Encyclopedia of DNA Elements (ENCODE)

Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin

Allen Ivy Glioblastoma Atlas

SiPeCaM (Sitios Permanentes de la Calibración y Monitoreo de la Biodiversidad)

Tabula Sapiens

COVID-19 Genome Sequence Dataset

NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17 & 18

Radiant MLHub

CMIP6 GCMs downscaled using WRF

Sentinel-2 Cloud-Optimized GeoTIFFs

Community Earth System Model Large Ensemble (CESM LENS)

NOAA Real-Time Mesoscale Analysis (RTMA)

ESA WorldCover

NOAA National Water Model Short-Range Forecast

Virginia Coastal Resilience Master Plan, Phase 1 – December 2021

HIRLAM Weather Model

Defense Meteorology Satellite Program (DMSP) Auroral Particle Flux

voice gender detection

lego-spoken-dialogue-corpus

free-spoken-digit-dataset

daps-dataset

children-song-dataset

emo-db

basic-arabic-vocal-emotions-dataset

esc50-dataset

URDU-Dataset

Public Domain Sounds

VIVAE
