
Streaming 40+ Open Source Audio Datasets for free with 3 lines of code


    In this short article, I'll share with you an awesome resource for finding open source datasets, and an easy way to stream them for any use.

    If you didn't know, streaming means you don't have to wait for the whole dataset to download before you start: files are fetched on demand while your model trains. That way you get going sooner, and you can save money on cloud computing :)
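To make the difference concrete, here is a minimal, purely conceptual sketch. It is not how DagsHub implements streaming; `fetch`, `train_step`, and `trace` are hypothetical stand-ins that only record the order in which things happen.

```python
from typing import Callable, Iterable, List


def download_then_train(
    names: Iterable[str],
    fetch: Callable[[str], bytes],
    train_step: Callable[[bytes], None],
) -> None:
    """Fetch every file first; training only starts once all of them have arrived."""
    files = [fetch(name) for name in names]
    for data in files:
        train_step(data)


def stream_and_train(
    names: Iterable[str],
    fetch: Callable[[str], bytes],
    train_step: Callable[[bytes], None],
) -> None:
    """Fetch each file right before it is needed, so training starts immediately."""
    for name in names:
        train_step(fetch(name))


def trace(strategy: Callable[..., None]) -> List[str]:
    """Run a strategy against fake fetch/train functions and log the event order."""
    events: List[str] = []

    def fake_fetch(name: str) -> bytes:
        events.append(f"fetch {name}")
        return name.encode()

    def fake_train(data: bytes) -> None:
        events.append(f"train {data.decode()}")

    strategy(["a.wav", "b.wav"], fake_fetch, fake_train)
    return events


if __name__ == "__main__":
    print(trace(download_then_train))
    print(trace(stream_and_train))
```

In the streaming version the first training step runs after a single fetch, instead of after all of them.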

    Step 1: Pick a dataset you like from the DagsHub Explore Datasets page.

    Step 2: Simple setup

    pip3 install dagshub
    dagshub login # you'll be prompted with instructions to authenticate

    Step 3: Stream the dataset using the following snippet:

    from dagshub.streaming import DagsHubFilesystem
    
    fs = DagsHubFilesystem(repo_url="<url-of-your-chosen-dataset>",
                           project_root=".")

    for f in fs.listdir("audio-train"): # <- a folder inside the repository
        print(f) # <- f is a bare file name; use fs.open("audio-train/" + f) to access the content of the file
        # Do data science idk
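If you want the file contents and not just the names, the loop above can be wrapped in a small helper. This is a rough sketch, assuming only that the filesystem object offers `listdir(folder)` and `open(path, "rb")` like the snippet above; `iter_files` and `InMemoryFS` are hypothetical names made up for this example, and `InMemoryFS` exists only so the helper can be tried offline.

```python
import io
import os
from typing import Dict, Iterator, List, Tuple


def iter_files(fs, folder: str, suffix: str = ".wav") -> Iterator[Tuple[str, bytes]]:
    """Lazily yield (path, raw bytes) for each matching file in a folder.

    `fs` is any object with listdir(folder) and open(path, mode) methods.
    """
    for name in sorted(fs.listdir(folder)):
        if not name.endswith(suffix):
            continue
        # listdir gives bare names, so rebuild the path relative to the repo root
        path = os.path.join(folder, name)
        with fs.open(path, "rb") as handle:
            yield path, handle.read()


class InMemoryFS:
    """Hypothetical offline stand-in with the same listdir/open shape."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files  # maps path -> content

    def listdir(self, folder: str) -> List[str]:
        return [
            os.path.basename(path)
            for path in self._files
            if os.path.dirname(path) == folder
        ]

    def open(self, path: str, mode: str = "rb") -> io.BytesIO:
        return io.BytesIO(self._files[path])


if __name__ == "__main__":
    fake = InMemoryFS({os.path.join("audio-train", "clip.wav"): b"RIFF"})
    for path, data in iter_files(fake, "audio-train"):
        print(path, len(data))
```

Because `iter_files` is a generator, each file is only opened when the loop reaches it, so with a streaming filesystem passed in as `fs` nothing should be fetched ahead of time.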

    Congrats! You just unlocked a whole new world of datasets and data science-ing.

    For additional information, read the docs.