ASR (one language)
#54 · by Mylunaire · opened
I have used the model to transcribe audio in English. It makes a lot of mistakes, so I'm wondering whether my usage is correct. I suspect that I am doing an English-to-English translation, which leads to rephrasing rather than a simple transcription.
Below is the relevant part of my code.
from scipy.io.wavfile import read  # read() returns (sample_rate, samples)
import numpy

a = read(audio_file_path)
arr = numpy.array(a[1], dtype=float)  # a[1] is the raw sample array
audio_inputs = processor(audios=arr, return_tensors="pt").to(device)
output_tokens = model.generate(**audio_inputs, tgt_lang="eng", generate_speech=False)
transcription_chunk = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
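
For reference, here is a fuller, self-contained sketch of what I am running. The facebook/hf-seamless-m4t-medium checkpoint, reading the WAV with scipy, and the explicit sampling_rate argument are my own assumptions, not something taken from the docs verbatim.

import torch
import numpy
from scipy.io.wavfile import read
from transformers import AutoProcessor, SeamlessM4TModel

audio_file_path = "sample.wav"  # placeholder path, assumed mono WAV
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium").to(device)

# read() returns (sample_rate, samples)
sample_rate, samples = read(audio_file_path)
arr = numpy.array(samples, dtype=float)

# Passing sampling_rate lets the processor catch a mismatch with the rate the
# model expects instead of silently assuming it.
audio_inputs = processor(audios=arr, sampling_rate=sample_rate, return_tensors="pt").to(device)

# generate_speech=False restricts generation to text, so with tgt_lang="eng" on
# English audio this should be speech-to-text rather than translation.
output_tokens = model.generate(**audio_inputs, tgt_lang="eng", generate_speech=False)
transcription = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(transcription)

The only intentional differences from the snippet above are the explicit sampling_rate and the tuple unpacking of read().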
Thanks a lot!
Mylunaire changed the discussion title from "ASR in one language" to "ASR (one language)".