Open Source Data Science Datasets

Urban Sound 8K is an audio dataset that contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes.

dataset audio dvc git

The FSDnoisy18k dataset is an open dataset containing 42.5 hours of audio across 20 sound event classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

dataset audio dvc git

The FSL4 dataset contains ~4000 user-contributed loops uploaded to Freesound.

dataset audio dvc git

WARBLRB10k is a collection of 10,000 smartphone audio recordings from around the UK, crowdsourced by users of Warblr, the bird recognition app.

dataset audio dvc git

VehicleX is a large-scale synthetic dataset created in Unity that contains 1,362 vehicles of various 3D models with fully editable attributes.

dataset dvc git 3d model

BuildingNet is a large-scale dataset of 3D building models whose exteriors are consistently labeled.

dataset dvc git 3d model

The PanoContext dataset contains 500 annotated cuboid layouts of indoor environments such as bedrooms and living rooms.

dataset dvc git 3d model

Thingi10K is a dataset of 3D-printing models. It contains 10,000 models from featured “things”, suitable for testing 3D-printing techniques.

dataset dvc git 3d model

The Sydney Urban Objects dataset contains a variety of common urban road objects scanned with a Velodyne HDL-64E LIDAR, collected in the CBD of Sydney, Australia. There are 631 individual scans of objects across classes of vehicles, pedestrians, signs, and trees.

dataset 3d model

The LEGOv2 database is a parameterized and annotated version of the CMU Let’s Go database, a spoken dialogue corpus of interactions captured from the CMU Let’s Go (LG) System by Carnegie Mellon University in 2006 and 2007. It is based on raw log files from the LG system and has been parameterized and annotated by the Dialogue Systems Group at Ulm University, Germany.

dataset audio dvc git

The ModelNet40 dataset contains synthetic object point clouds. As the most widely used benchmark for point cloud analysis, ModelNet40 is popular for its variety of categories, clean shapes, and careful construction.

dataset dvc git 3d model

ShapeNetSem is a smaller, more densely annotated subset of ShapeNetCore consisting of 12,000 models spread over a broader set of 270 categories.

dataset dvc git 3d model

FreiHAND is a 3D hand pose dataset that records different hand actions performed by 32 people.

dataset dvc git 3d model

The 3D Poses in the Wild dataset is the first dataset in the wild with accurate 3D poses for evaluation.

dataset 3d human pose estimation dvc git 3d model

nirbarazida / HUMAN4D

Updated 1 year ago

dataset 3d human pose estimation dvc git 3d model

The CHiME-Home dataset is a collection of annotated domestic environment audio recordings.

dataset audio dvc git

kingabzpro / EmoSynth

Updated 2 years ago

EmoSynth is a dataset of 144 audio files that have been labelled by 40 listeners for perceived emotion along the dimensions of valence and arousal.

dataset audio

mert.bozkirr / CREMA-D

Updated 2 years ago

Crowd-sourced Emotional Multimodal Actors Dataset

dataset audio dvc

Detecting bird sounds in audio is an important task for automatic wildlife monitoring, as well as in citizen science and audio library management.

dataset audio