Wav2Vec2-Large-960h-lv60-self + 4-gram
This model is identical to Facebook's Wav2Vec2-Large-960h-lv60-self, but is
augmented with an English 4-gram language model. The language model is the 4-gram.arpa.gz
file from LibriSpeech's official n-gram language models.
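To give a rough intuition for what the 4-gram adds, here is a toy sketch of language-model-weighted rescoring: each candidate transcription gets a score that combines the acoustic log-probability with a weighted language-model log-probability and a per-word bonus. The function names, the weights `alpha` and `beta`, and all scores below are illustrative assumptions, not values from this model or its decoder.

```python
# Toy illustration of LM-weighted rescoring (often called shallow fusion).
# All scores are made-up numbers, not outputs of the real model or 4-gram.

def fused_score(acoustic_logp, lm_logp, n_words, alpha=0.5, beta=1.5):
    """Combine acoustic and LM log-probabilities with a word-count bonus."""
    return acoustic_logp + alpha * lm_logp + beta * n_words


def best_hypothesis(candidates, alpha=0.5, beta=1.5):
    """Return the candidate text with the highest fused score."""
    return max(
        candidates,
        key=lambda text: fused_score(
            candidates[text]["acoustic"],
            candidates[text]["lm"],
            len(text.split()),
            alpha,
            beta,
        ),
    )


# Hypothetical candidates: the second is slightly better acoustically,
# but the (made-up) language model finds it much less plausible.
candidates = {
    "i scream for ice cream": {"acoustic": -4.2, "lm": -9.0},
    "ice cream for ice cream": {"acoustic": -4.0, "lm": -14.0},
}

print(best_hypothesis(candidates))                      # LM-weighted choice
print(best_hypothesis(candidates, alpha=0.0, beta=0.0))  # acoustic-only choice
```

With the language-model weight set to zero, the acoustically preferred candidate wins; with a non-zero weight, the more plausible word sequence can overtake it.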
Evaluation
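This code snippet shows how to evaluate patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram on LibriSpeech's "clean" and "other" test data. As written, it loads the "other" config; pass "clean" to load_dataset instead to evaluate on the clean test set.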
from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor
import torch
from jiwer import wer

model_id = "patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram"

librispeech_eval = load_dataset("librispeech_asr", "other", split="test")

model = AutoModelForCTC.from_pretrained(model_id).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

def map_to_pred(batch):
    # Convert the raw waveform into model inputs and move them to the GPU.
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    inputs = {k: v.to("cuda") for k, v in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits

    # Decode the CTC logits with the 4-gram language model.
    transcription = processor.batch_decode(logits.cpu().numpy()).text[0]
    batch["transcription"] = transcription
    return batch

result = librispeech_eval.map(map_to_pred, remove_columns=["audio"])

print(wer(result["text"], result["transcription"]))
Result (WER, in %):
"clean" | "other" |
---|---|
1.84 | 3.71 |
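For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, accumulated over all utterances. The following is a minimal pure-Python sketch of that definition; `edit_distance` and `word_error_rate` are hypothetical helpers written for illustration and are not jiwer's implementation.

```python
def edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (row-by-row DP)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        total_edits += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    return total_edits / total_words


# Made-up example sentences: one substituted word out of five reference words.
references = ["THE CAT SAT", "HELLO WORLD"]
hypotheses = ["THE CAT SAT", "HELLO WORD"]
print(word_error_rate(references, hypotheses))
```

The function returns a fraction, so multiply by 100 to compare with percentage figures such as those in the table.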