This model was trained on two A100 40 GB GPUs, with 128 GB RAM and 2 x Xeon 48-core 2.4 GHz CPUs.

  • Training time: ~7 hours
  • Training set size: 118k audio samples from Mozilla Common Voice 17

Usage example:

from transformers import pipeline
import gradio as gr
import time

pipe = pipeline(
    model="dvislobokov/whisper-large-v3-turbo-russian",
    tokenizer="dvislobokov/whisper-large-v3-turbo-russian",
    task='automatic-speech-recognition',
    device='cpu'
)

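def transcribe(audio):
    # `audio` is the file path passed in by the Gradio Audio component.
    start = time.time()
    # return_timestamps=True lets the pipeline handle recordings longer than 30 seconds.
    text = pipe(audio, return_timestamps=True)['text']
    print(f'Transcription took {time.time() - start:.2f} s')
    return text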

iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'),
    outputs='text'
)

iface.launch(share=True)

Model size: 809M params (Safetensors, tensor type F32)