Streaming 40+ Open Source Audio Datasets for free with 3 lines of code
In this short article, I'll share with you an awesome resource for finding open source datasets, and an easy way to stream them for any use.
If you didn't know, streaming means you don't need to wait for the download to finish: files are fetched on demand while you train your model. That way, training can start right away and you save money on cloud compute that would otherwise sit idle during the download :)
Step 1: Pick the dataset of your liking from the DagsHub Explore Datasets page.
Step 2: Simple setup
pip3 install dagshub
dagshub login # you'll be prompted with instructions to authenticate
Step 3: Stream the dataset using the following snippet:
from dagshub.streaming import DagsHubFilesystem

fs = DagsHubFilesystem(repo_url="<url-of-your-chosen-dataset>",
                       project_root=".")
for f in fs.listdir("audio-train"):  # <- a folder inside the repository
    print(f)  # <- listdir gives file names, so use fs.open("audio-train/" + f) to access the content of the file
    # Do data science idk
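If you want the actual bytes rather than just the file names, here is a minimal sketch of a helper that reads one file at a time. It assumes the filesystem object exposes `listdir` and `open` like in the snippet above, that `listdir` returns bare file names (as `os.listdir` does), and that `open` accepts a mode argument like the built-in `open`. The names `iter_audio_bytes` and `LocalFolderFS` are made up for this example; `LocalFolderFS` is only a local stand-in so the sketch runs without a DagsHub account.

```python
import os
import tempfile


def iter_audio_bytes(fs, folder, extensions=(".wav", ".mp3", ".flac")):
    """Yield (file_name, raw_bytes) for each audio file in `folder`, one at a time.

    `fs` is any object with `listdir(path)` and `open(path, mode)` methods.
    Assumption: `listdir` returns names relative to `folder`, so the folder
    is prepended before opening each file.
    """
    for name in sorted(fs.listdir(folder)):
        if not name.lower().endswith(extensions):
            continue  # skip anything that does not look like audio
        with fs.open(f"{folder}/{name}", "rb") as handle:
            yield name, handle.read()


class LocalFolderFS:
    """Hypothetical stand-in that serves files from a local directory."""

    def __init__(self, root):
        self.root = root

    def listdir(self, path):
        return os.listdir(os.path.join(self.root, path))

    def open(self, path, mode="r"):
        return open(os.path.join(self.root, path), mode)


if __name__ == "__main__":
    # Build a tiny fake dataset so the example runs offline.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "audio-train"))
        with open(os.path.join(root, "audio-train", "clip.wav"), "wb") as out:
            out.write(b"RIFF")
        for name, data in iter_audio_bytes(LocalFolderFS(root), "audio-train"):
            print(name, len(data), "bytes")
```

To try it on a real dataset, pass the `fs` object from Step 3 instead of `LocalFolderFS(root)`. Because the helper is a generator, each file is only read when the loop reaches it, which fits the streaming idea.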
Congrats! You just unlocked a whole new world of datasets and data science-ing.
For additional information, read the docs.