Whisper-large-v3-no-numbers

Model info

This is a version of the openai/whisper-large-v3 model without number tokens (the token ids corresponding to numbers are excluded). No fine-tuning was used.
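The card does not describe how the number tokens were identified. The following is only a minimal sketch of one way to do it, assuming a token counts as a "number token" when its text contains an ASCII digit. The function name `find_number_token_ids` and the toy vocabulary are hypothetical and are not taken from this model's code.

```python
def find_number_token_ids(vocab):
    """Return the sorted ids of tokens whose text contains an ASCII digit.

    `vocab` maps token text to token id. Only the characters 0-9 are checked,
    because in a byte-level BPE vocabulary other characters may stand for raw
    bytes rather than for the symbols they look like.
    """
    return sorted(
        token_id
        for token, token_id in vocab.items()
        if any(ch in "0123456789" for ch in token)
    )


# Toy vocabulary for illustration only; the real vocabulary is far larger.
toy_vocab = {"Ġtwenty": 0, "Ġ25": 1, "5": 2, "Ġfive": 3, "19": 4, "th": 5}

number_ids = find_number_token_ids(toy_vocab)
print(number_ids)  # [1, 2, 4]
```

With a real tokenizer, the mapping returned by its `get_vocab()` method could be passed in instead of the toy dictionary. Digits written in other scripts would need a separate check.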

Phrases containing spoken numbers are transcribed with the numbers written out as words, which can be useful for TTS data preparation.

Example: instead of "25", this model transcribes the phrase as "twenty five".
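When preparing TTS data, it may be worth confirming that a batch of transcripts really contains no digit characters. A minimal sketch of such a check follows; the helper name `has_digits` and the sample transcripts are made up for illustration.

```python
def has_digits(text):
    """Return True if the text contains at least one digit character."""
    return any(ch.isdigit() for ch in text)


# Hypothetical transcripts: the first is the desired form, the second is not.
transcripts = ["Twenty five years.", "It costs 25 dollars."]

# Collect the transcripts that still contain digits and need attention.
flagged = [t for t in transcripts if has_digits(t)]
print(flagged)  # ['It costs 25 dollars.']
```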

Usage

transformers version 4.45.2

The model can be used in the same way as the original Whisper:

>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> import torchaudio

>>> # load audio
>>> wav, sr = torchaudio.load("audio.wav")
>>> # resample if necessary
>>> wav = torchaudio.functional.resample(wav, sr, 16000)

>>> # load model and processor
>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")

>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features 

>>> # generate token ids
>>> predicted_ids = model.generate(input_features)
>>> # decode token ids to text
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
>>> transcription
['<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Twenty seven years. <|endoftext|>']

The context tokens can be removed from the start of the transcription by setting skip_special_tokens=True.
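For transcripts that were already decoded with skip_special_tokens=False (for example, ones saved to disk), the `<|...|>` markers can also be removed afterwards. The sketch below uses a plain regular expression; it is not how the library implements skip_special_tokens, and the helper name `strip_special_tokens` is hypothetical.

```python
import re

# Matches Whisper-style special tokens such as <|en|> or <|endoftext|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def strip_special_tokens(text):
    """Remove <|...|> markers and trim the surrounding whitespace."""
    return SPECIAL_TOKEN.sub("", text).strip()


raw = "<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Twenty seven years. <|endoftext|>"
print(strip_special_tokens(raw))  # Twenty seven years.
```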

Model size: 1.54B params (Safetensors, tensor type F32)