Type:  dataset model Task:  transfer learning Data Domain:  audio Framework:  pytorch

README.md

title: Urdu ASR SOTA
emoji: 👨‍🎤
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 2.8.11
app_file: Gradio/app.py
pinned: false
license: apache-2.0

Urdu Automatic Speech Recognition State of the Art Solution

Automatic Speech Recognition using Facebook's wav2vec2-xls-r-300m model and the mozilla-foundation common_voice_8_0 Urdu dataset.

Model Fine-tuning

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset.

It achieves the following results on the evaluation set:

  • Loss: 0.9889
  • Wer: 0.5607
  • Cer: 0.2370
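For readers unfamiliar with the two metrics, here is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly defined: edit distance divided by reference length. The helper names are hypothetical, and the repository's own evaluation may normalize text differently, so treat this as an illustration rather than the exact scoring code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character-level edit distance divided by the reference length (spaces included)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("the cat sat down", "the cat sit"))  # 2 errors over 4 words
print(cer("abcd", "abed"))                     # 1 error over 4 characters
```

A WER of 0.5607 therefore means roughly 56 word-level errors per 100 reference words.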

Quick Prediction

Install all dependencies from the requirements file, then run the code below to predict the text:

from datasets import load_dataset, Audio
from transformers import pipeline

model = "Model"  # local directory holding the fine-tuned checkpoint

# Load the Urdu test split from the local "Data" directory (tab-separated files).
data = load_dataset("Data", "ur", split="test", delimiter="\t")

def path_adjust(batch):
    # Prefix each clip's file name with the clips directory.
    batch["path"] = "Data/ur/clips/" + str(batch["path"])
    return batch

data = data.map(path_adjust)

# Decode audio on access, resampled to 16 kHz.
sample_iter = iter(data.cast_column("path", Audio(sampling_rate=16_000)))
sample = next(sample_iter)

asr = pipeline("automatic-speech-recognition", model=model)
# Transcribe in 5-second chunks with a 1-second stride on each side.
prediction = asr(sample["path"]["array"], chunk_length_s=5, stride_length_s=1)
prediction
# => {'text': 'اب یہ ونگین لمحاتانکھار دلمیں میںفوث کریلیا اجائ'}
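To make the chunk_length_s and stride_length_s arguments more concrete, the sketch below computes overlapping windows for a clip. It is a simplified illustration of the chunking idea, not the pipeline's exact implementation; the function name and the returned tuple layout are assumptions made for this example.

```python
def chunk_windows(num_samples, sampling_rate=16_000,
                  chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end, left_stride, right_stride) tuples in samples.

    Each chunk overlaps its neighbours by the stride; the overlapped edges
    are later discarded so that the kept regions tile the whole clip.
    """
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("chunk length must exceed twice the stride length")

    windows = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        is_last = end >= num_samples
        left = 0 if start == 0 else stride   # first chunk has no left context
        right = 0 if is_last else stride     # last chunk has no right context
        windows.append((start, end, left, right))
        if is_last:
            break
    return windows


# A 10-second clip at 16 kHz yields three overlapping 5-second windows.
for window in chunk_windows(160_000):
    print(window)
```

With these settings each window advances by 3 seconds, so longer strides give the model more context at chunk borders at the cost of more computation.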

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with the test split, copy and paste the following command into the terminal:

python3 eval.py --model_id Model --dataset Data --config ur --split test --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs

Or run the simple shell script:

bash run_eval.sh
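The flags in the command above can be read as a small command-line interface. The sketch below shows how a script could parse them with argparse; the actual eval.py is not reproduced here, so the types, defaults, and help strings are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the flags used in the evaluation command (assumed types)."""
    parser = argparse.ArgumentParser(description="Evaluate an ASR checkpoint")
    parser.add_argument("--model_id", type=str, required=True,
                        help="Model directory or hub identifier")
    parser.add_argument("--dataset", type=str, required=True,
                        help="Dataset directory or hub identifier")
    parser.add_argument("--config", type=str, required=True,
                        help="Dataset configuration, e.g. the language code")
    parser.add_argument("--split", type=str, default="test")
    parser.add_argument("--chunk_length_s", type=float, default=None)
    parser.add_argument("--stride_length_s", type=float, default=None)
    parser.add_argument("--log_outputs", action="store_true",
                        help="Write predictions and targets to disk")
    return parser


args = build_parser().parse_args(
    "--model_id Model --dataset Data --config ur --split test "
    "--chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs".split()
)
print(args)
```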

Language Model

Boosting Wav2Vec2 with n-grams in 🤗 Transformers

  • Get suitable Urdu text data for a language model
  • Build an n-gram with KenLM
  • Combine the n-gram with a fine-tuned Wav2Vec2 checkpoint
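As a rough illustration of why an n-gram model helps, the toy sketch below trains an add-one-smoothed bigram model in plain Python and uses it to rescore two candidate transcriptions. It stands in for KenLM and pyctcdecode, whose APIs are not shown here; the corpus, candidate scores, and lm_weight value are invented for the example, and real decoders apply the language model during beam search rather than on two finished candidates.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over a small text corpus."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def rescore(candidates, model, lm_weight=0.5):
    """Pick the candidate with the best acoustic + weighted LM score."""
    return max(candidates,
               key=lambda c: c[1] + lm_weight * log_prob(c[0], model))[0]


model = train_bigram(["the weather is nice",
                      "the weather is cold",
                      "it is nice today"])
# (text, acoustic log-score): the acoustic model slightly prefers the misspelling.
candidates = [("the whether is nice", -1.0), ("the weather is nice", -1.2)]

print(rescore(candidates, model, lm_weight=0.0))  # acoustic score only
print(rescore(candidates, model, lm_weight=0.5))  # language model tips the balance
```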

Install kenlm and pyctcdecode before running the notebook.

pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode

Eval Results

Without LM    With LM
56.21         46.37
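Assuming both figures are word error rates in percent (the table does not label the metric), the gain from the language model works out as follows:

```python
without_lm = 56.21  # score without the language model (assumed to be WER in %)
with_lm = 46.37     # score with the language model (assumed to be WER in %)

absolute_gain = without_lm - with_lm
relative_gain = absolute_gain / without_lm * 100

print(f"absolute: {absolute_gain:.2f} points")  # 9.84
print(f"relative: {relative_gain:.1f} %")       # 17.5
```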

Directory Structure

<root directory>
    |
    .- README.md
    |
    .- Data/
    |
    .- Model/
    |
    .- Images/
    |
    .- Sample/
    |
    .- Gradio/
    |
    .- Eval Results/
          |
          .- With LM/
          |
          .- Without LM/
          | ...
    .- notebook.ipynb
    |
    .- run_eval.sh
    |
    .- eval.py

Gradio App

SOTA

  • [x] Add Language Model
  • [x] Webapp/API
  • [ ] Denoise Audio
  • [ ] Text Processing
  • [ ] Spelling Mistakes
  • [x] Hyperparameter optimization
  • [ ] Training on 300 Epochs & 64 Batch Size
  • [ ] Improved Language Model
  • [ ] Contribute to Urdu ASR Audio Dataset

Robust Speech Recognition Challenge 2022

This project was the result of the HuggingFace Robust Speech Recognition Challenge. I was one of the winners, with four state-of-the-art ASR models. Check out my SOTA checkpoints.




https://huggingface.co/kingabzpro/wav2vec2-large-xls-r-300m-Urdu

Comments

Abid Ali Awan

commented in commit e2f5996ee7 on branch master

I am open to contributions and suggestions, so keep them coming.

I got 17% WER and I am still far from SOTA.
