 Open Source Data Science Datasets

Path: .

Dataset registry for a project that trains a model to segment Baby Yoda from the series "The Mandalorian". Showcases the use of DVC imports.

dataset computer vision semantic segmentation dvc label studio git aws s3

Path: .

A subset of the LAION Aesthetics V2 dataset that contains only images with an aesthetics score of 6.5 or higher.

dataset nlp computer vision text-to-image generation dvc git

Path: .

databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

dataset nlp dvc git

Path: Mua tank nhua 1000 lit Ha Noi o dau chat luong? Dinh Hai Plastic la dia chi so 1

The 1000-liter plastic tank in Hanoi is a large-capacity storage tank with many outstanding advantages, and it currently attracts interest from many customers.

dataset dvc git

Path: .

Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D)

dataset audio dvc

Path: .

The CHiME-Home dataset is a collection of annotated domestic environment audio recordings.

dataset audio dvc git

Path: .

WARBLRB10k is a collection of 10,000 smartphone audio recordings from around the UK, crowdsourced by users of Warblr, the bird recognition app.

dataset audio dvc git

Path: .

The FSL4 dataset contains ~4000 user-contributed loops uploaded to Freesound.

dataset audio dvc git

Path: .

The FSDnoisy18k dataset is an open dataset containing 42.5 hours of audio across 20 sound event classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data.

dataset audio dvc git

Path: .

Urban Sound 8K is an audio dataset that contains 8,732 labeled sound excerpts (<= 4 s each) of urban sounds from 10 classes.

dataset audio dvc git

Path: data/raw/card_images

A Python ML boilerplate based on Cookiecutter Data Science, with support for data versioning (DVC), experiment tracking, model and dataset cards, and more.

dataset dvc label studio git mlflow

DagsHub / triviaqa

Updated 1 year ago

Path: .

Code for the TriviaQA reading comprehension dataset.

dataset nlp dvc git github