---
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- Pomak
- Slavic
---

# wav2vec2-xls-r-slavic-pomak

Pomak is an endangered South East Slavic language variety spoken in Northern Greece.
This is the first automatic speech recognition (ASR) model for Pomak.
To train the model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)) on 11h of recorded Pomak speech.
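
As a rough illustration of how a fine-tuned wav2vec2 checkpoint like this one can be used for transcription, the sketch below wraps the generic `transformers` ASR pipeline in a small helper. The repository id is a placeholder: the Hub namespace is not stated in this card, so `<namespace>` must be replaced with the real one. The helper name `transcribe` is hypothetical.

```python
# Placeholder repository id (assumption): replace <namespace> with the actual Hub namespace.
MODEL_ID = "<namespace>/wav2vec2-xls-r-slavic-pomak"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe a single audio file with the Transformers ASR pipeline."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio (ffmpeg is required)
    # at the sampling rate expected by the model's feature extractor.
    return asr(audio_path)["text"]
```

Calling `transcribe("recording.wav")`, where `recording.wav` is a hypothetical local file, would download the checkpoint on first use and return the decoded text.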

## Recordings

Four native Pomak speakers (2 female and 2 male) agreed to read Pomak texts at the ILSP audio-visual studio in Xanthi, Greece, resulting in a corpus of 14h.

| Speaker | Gender | Total recorded hours |
|---|---|---|
| NK9dIF | F | 4h 44m 45s |
| xoVY9q | M | 4h 36m 12s |
| 9G75fk | F | 1h 44m 03s |
| n5WzHj | M | 3h 44m 04s |
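
The per-speaker totals can be checked against the overall corpus size with a few lines of arithmetic. The sketch below is only a sanity check on the table values; the helper names `to_seconds` and `format_hms` are illustrative.

```python
import re

# Per-speaker totals copied from the table above.
RECORDED = {
    "NK9dIF": "4h 44m 45s",
    "xoVY9q": "4h 36m 12s",
    "9G75fk": "1h 44m 03s",
    "n5WzHj": "3h 44m 04s",
}


def to_seconds(duration: str) -> int:
    """Convert a string such as '4h 44m 45s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)h (\d+)m (\d+)s", duration)
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds: int) -> str:
    """Format a number of seconds as 'Hh MMm SSs'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


total = sum(to_seconds(d) for d in RECORDED.values())
print(format_hms(total))  # 14h 49m 04s
```

The sum comes to 14h 49m 04s, which is consistent with the 14h figure quoted above if that figure is rounded down.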

To fine-tune the model, we split the long recordings into shorter segments of at most 25 seconds each.
This removed the majority of pauses and resulted in a total dataset duration of 11h 8m.
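
One way such a segmentation step could work is sketched below. It is not the procedure used for this model, which is described in the paper. The sketch assumes that speech intervals have already been detected (for example by a voice activity detector, not shown) and greedily merges neighbouring intervals into segments of at most 25 seconds, dropping any pause longer than `max_gap`. The function name, the input format, and the 0.5 s gap threshold are all assumptions.

```python
MAX_SEGMENT_S = 25.0  # upper bound stated in the text
MAX_GAP_S = 0.5       # assumed threshold: longer pauses are dropped


def group_intervals(speech, max_len=MAX_SEGMENT_S, max_gap=MAX_GAP_S):
    """Merge time-ordered (start, end) speech intervals into segments <= max_len seconds."""
    segments = []
    current = None  # (start, end) of the segment being built
    for start, end in speech:
        # Cut any single interval longer than max_len into max_len-sized pieces.
        pieces = []
        while end - start > max_len:
            pieces.append((start, start + max_len))
            start += max_len
        pieces.append((start, end))

        for p_start, p_end in pieces:
            fits = (
                current is not None
                and p_start - current[1] <= max_gap   # pause is short enough to keep
                and p_end - current[0] <= max_len     # merged segment stays within the limit
            )
            if fits:
                current = (current[0], p_end)
            else:
                if current is not None:
                    segments.append(current)
                current = (p_start, p_end)
    if current is not None:
        segments.append(current)
    return segments


# Toy input in seconds: three nearly contiguous intervals, then a 5 s pause.
print(group_intervals([(0.0, 10.0), (10.3, 20.0), (20.4, 30.0), (35.0, 40.0)]))
```

In the toy input, the first two intervals merge into one 20-second segment, the third starts a new segment because merging it would exceed 25 seconds, and the 5-second pause before the last interval is discarded.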

## Metrics

We evaluated the model on the test split, which consists of 10% of the dataset recordings.

| Model | CER | WER |
|---|---|---|
| pre-trained | 87.31% | 31.47% |
| fine-tuned | 9.06% | 3.12% |
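
For readers unfamiliar with the two metrics, both are edit-distance ratios: word error rate (WER) counts word-level substitutions, insertions, and deletions relative to the number of reference words, and character error rate (CER) does the same over characters. The following is a minimal pure-Python sketch on a toy English sentence pair. It is not the evaluation script behind the numbers above, and it scores a single utterance, whereas corpus-level scores aggregate errors over all utterances.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by the reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "the cat sat"
hypothesis = "the cat sit"
print(f"WER: {wer(reference, hypothesis):.2%}")  # 1 of 3 words differs
print(f"CER: {cer(reference, hypothesis):.2%}")  # 1 of 11 characters differs
```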

## Training hyperparameters

We fine-tuned the baseline model (`wav2vec2-large-slavic-parlaspeech-hr`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:

| arg                           | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8     |
| `gradient_accumulation_steps` | 2     |
| `num_train_epochs`            | 35    |
| `learning_rate`               | 3e-4  |
| `warmup_steps`                | 500   |
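
The argument names in the table match keyword arguments of `transformers.TrainingArguments`. The sketch below collects them in a dictionary and derives the effective batch size per optimizer step. It assumes training ran on a single GPU, which the text suggests but does not state outright.

```python
# Values copied from the table above; the keys are TrainingArguments keyword names.
HPARAMS = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}

NUM_GPUS = 1  # assumption: one RTX 3090

# Number of samples that contribute to each optimizer update.
effective_batch_size = (
    HPARAMS["per_device_train_batch_size"]
    * HPARAMS["gradient_accumulation_steps"]
    * NUM_GPUS
)
print(effective_batch_size)  # 16
```

With `transformers` installed, the same dictionary could be unpacked as `TrainingArguments(output_dir="out", **HPARAMS)`, where `"out"` is a hypothetical output directory. Any arguments not listed in the table would keep their library defaults, which may differ from the original training setup.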

## Citation

To cite this work or read more about the training pipeline, see [this paper](https://aclanthology.org/2023.fieldmatters-1.5/):

```
@inproceedings{tsoukala-etal-2023-asr,
    title = "{ASR} pipeline for low-resourced languages: A case study on Pomak",
    author = "Tsoukala, Chara and
      Kritsis, Kosmas and
      Douros, Ioannis and
      Katsamanis, Athanasios and
      Kokkas, Nikolaos and
      Arampatzakis, Vasileios and
      Sevetlidis, Vasileios and
      Markantonatou, Stella and
      Pavlidis, George",
    booktitle = "Proceedings of the Second Workshop on NLP Applications to Field Linguistics",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.fieldmatters-1.5",
    doi = "10.18653/v1/2023.fieldmatters-1.5",
    pages = "40--45",
    abstract = "Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages can be a challenging task not only due to lack of resources but also due to the work required for the preparation of a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.",
}
```