L-theorist/zerospeech2021_dataset

ZeroSpeech Challenge 2021 Dataset

General Information

Traditional speech and language technologies are trained with massive amounts of text and/or expert knowledge. This is not sustainable: the majority of the world’s languages do not have reliable textual or expert resources. Even in high-resource languages, there is a large domain mismatch between oral and written uses of language.

But infants learn to speak their native language, spontaneously, from raw sensory input, without supervision from text or linguists. It should be possible to do the same in machines!

The ultimate goal of the “Zero Resource Speech Challenge” is to construct a system that learns an end-to-end Spoken Dialog (SD) system, in an unknown language, from scratch, using only raw sensory information available to an early language learner.

For a more detailed account and the competition timeline see here. For an overview of previous competitions see the challenge website.

Structure


├── lexical
│   ├── dev
│   │   ├── aAAfmkmQpVz.wav
│   │   └── ...
│   └── test
│       ├── aaaDSGZhrtbq.wav
│       └── ...
├── phonetic
│   ├── dev-clean
│   │   ├── 84-121123-0000.wav
│   │   └── ...
│   ├── dev-other
│   │   ├── 116-288045-0000.wav
│   │   └── ...
│   ├── test-clean
│   │   ├── 61-70968-0000.wav
│   │   └── ...
│   └── test-other
│       ├── 367-130732-0000.wav
│       └── ...
├── semantic
│   ├── dev
│   │   ├── librispeech
│   │   │   ├── aaRpeKDRbj.wav
│   │   │   └── ...
│   │   ├── synthetic
│   │   │   ├── aAbcsWWKCz.wav
│   │   │   └── ...
│   │   ├── gold.csv
│   │   └── pairs.csv
│   └── test
│       ├── librispeech
│       │   ├── AabWUdQiJx.wav
│       │   └── ...
│       └── synthetic
│           ├── aaEGIphSpE.wav
│           └── ...
└── syntactic
    ├── dev
    │   ├── aaacEBDmoCU.wav
    │   └── ...
    └── test
        ├── aAAAZvtMsGyf.wav
        └── ...
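
As a quick sanity check after downloading the data, the layout above can be walked programmatically. The following is a minimal sketch, not part of the original dataset tooling: the function name `count_wavs` is hypothetical, and it assumes only that the audio files end in `.wav` and sit under the subset directories shown in the tree.

```python
from collections import Counter
from pathlib import Path


def count_wavs(root):
    """Count .wav files per subset directory under the dataset root.

    Keys are the parent directory of each file, relative to root,
    e.g. "phonetic/dev-clean" or "semantic/dev/librispeech".
    Non-audio files such as gold.csv and pairs.csv are ignored.
    """
    root = Path(root)
    counts = Counter()
    # rglob descends recursively, so nested subsets are picked up as well.
    for wav in root.rglob("*.wav"):
        counts[wav.parent.relative_to(root).as_posix()] += 1
    return dict(counts)
```

Calling `count_wavs` with the path of the local checkout returns a dictionary mapping each subset directory to its number of audio files, which makes it easy to spot a subset that was not fully pulled.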

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Authors and Credits

Challenge Organizing Committee https://zerospeech.com/2021/#challenge-organizing-committee

phonetic/*.wav semantic/{dev,test}/librispeech
