
This is `facebook/wav2vec2-large-960h-lv60-self` enhanced with a language model built from Wikipedia.

The dataset used is `wikipedia/20200501.en`; all articles were included. The text was cleaned of references, external links, and all text inside parentheses. The resulting corpus contains 8,092,546 words.
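The exact cleaning script is not included with the model. A minimal sketch of this kind of cleanup (the function name and regex patterns are illustrative assumptions, not the author's actual script) might look like:

```python
import re

def clean_wiki_text(text: str) -> str:
    # Hypothetical cleanup helper: drop parenthesized text, repeating
    # the substitution so simple nesting is handled as well.
    prev = None
    while prev != text:
        prev = text
        text = re.sub(r"\([^()]*\)", "", text)
    # Drop bracketed reference markers such as [1] or [citation needed].
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s{2,}", " ", text).strip()
```

For example, `clean_wiki_text("KenLM (a toolkit) [1] is fast.")` yields `"KenLM is fast."`.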

The language model was built using KenLM. It is a 5-gram model in which all singleton n-grams of order 3 and higher were pruned (`--prune 0 0 1`). It was built as:

```shell
kenlm/build/bin/lmplz -o 5 -S 120G --vocab_estimate 8092546 --text text.txt --arpa text.arpa --prune 0 0 1
```
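To illustrate what that pruning setting means, here is a toy sketch (not KenLM itself) of counting n-grams and applying per-order thresholds the way `--prune 0 0 1` does, where KenLM extends the last threshold to all higher orders:

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def prune_counts(tokens, max_order=5, thresholds=(0, 0, 1)):
    # thresholds (0, 0, 1): keep every unigram and bigram (threshold 0),
    # drop n-grams of order >= 3 whose count is <= 1 (singletons).
    kept = {}
    for order in range(1, max_order + 1):
        t = thresholds[min(order - 1, len(thresholds) - 1)]
        counts = Counter(ngrams(tokens, order))
        kept[order] = {g: c for g, c in counts.items() if c > t}
    return kept
```

On the sequence `"a b c a b c a b d"`, the trigram `(a, b, d)` occurs once and is pruned, while `(a, b, c)` occurs twice and survives; all bigrams are kept regardless of count.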

Suggested usage:

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm")
output = pipe("/path/to/audio.wav", chunk_length_s=30, stride_length_s=(6, 3))
output
```

Note that in the current release of transformers (as of the release of this model), using striding in the pipeline chops off the last portion of the audio, in this case the 3-second right stride. As a workaround, append 3 seconds of silence to the end of the audio. This problem has been fixed in the development version of transformers on GitHub.
