Streaming 40+ Open Source Audio Datasets for free with 3 lines of code

Direct Data Access Nov 14, 2022

In this short article, I'll share with you an awesome resource for finding open source datasets, and an easy way to stream them for any use.

If you didn't know, streaming means you don't need to wait for the whole download to finish: files are fetched on demand while you train your model. That way, you can start training sooner and save money on cloud computing :)

Step 1: Pick a dataset you like from the DagsHub Explore Datasets page.

Step 2: Simple setup

pip3 install dagshub
dagshub login # you'll be prompted with instructions to authenticate

Step 3: Stream the dataset using the following snippet:

from dagshub.streaming import DagsHubFilesystem

# Point the filesystem at the dataset repository; files are fetched on demand
fs = DagsHubFilesystem(repo_url="<url-of-your-chosen-dataset>",
                       project_root=".")

for f in fs.listdir("audio-train"):  # <- a folder inside the repository
    print(f)  # <- listdir returns file names; use fs.open("audio-train/" + f) to access the content of the file
    # Do data science idk
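
If you want the file contents rather than just the names, here is a minimal sketch of one way to do it. The helper name iter_audio_files and the extension list are my own assumptions, not part of the dagshub package. It only relies on the fs object having listdir and open, and it assumes fs.open accepts a binary mode the same way the built-in open does.

```python
import posixpath


def iter_audio_files(fs, folder, extensions=(".wav", ".mp3", ".flac")):
    """Yield (path, raw_bytes) for every audio file directly inside `folder`.

    `fs` is any object exposing listdir(path) and open(path, mode), such as
    the DagsHubFilesystem instance from the snippet above (assumption: its
    open() mirrors the built-in open() and supports mode="rb").
    """
    for name in sorted(fs.listdir(folder)):
        # Skip anything that does not look like an audio file
        if not name.lower().endswith(extensions):
            continue
        # listdir returns bare names, so rebuild the path inside the repo
        path = posixpath.join(folder, name)
        with fs.open(path, "rb") as fh:
            yield path, fh.read()
```

With the fs object from Step 3, you would loop over iter_audio_files(fs, "audio-train") and hand each chunk of bytes to whatever audio decoder you like. Because it is a generator, each file is only read when the loop reaches it.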

Congrats! You just unlocked a whole new world of datasets and data science-ing.

For additional information, read the docs.
