ilsp/wav2vec2-xls-r-slavic-pomak

@@ -11,11 +11,13 @@ tags:
 # wav2vec2-xls-r-slavic-pomak
-To train a Pomak ASR model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)) on 11h of recorded Pomak speech.
 ## Recordings
-Fours native Pomak speakers (2 female and 2 male) agreed to read Pomak texts at the ILSP audio-visual studio in Xanthi, Greece, resulting in a total of 14h.
 |Speaker|Gender|Total recorded hours|
 |---|---|---|
@@ -29,7 +31,7 @@ This removed the majority of pauses and resulted in a total dataset duration of
 ## Metrics
-The test set consists of 10% of the dataset recordings.
 |Model|CER|WER|
 |---|---|---|
@@ -38,7 +40,7 @@ The test set consists of 10% of the dataset recordings.
 ## Training hyperparameters
-To fine-tune the wav2vec2-large-slavic-parlaspeech-hr model, we used the following hyperparameters:
 | arg                           | value |
 |-------------------------------|-------|
@@ -46,4 +48,32 @@ To fine-tune the wav2vec2-large-slavic-parlaspeech-hr model, we used the followi
 | `gradient_accumulation_steps` | 2     |
 | `num_train_epochs`            | 35    |
 | `learning_rate`               | 3e-4  |
-| `warmup_steps`                | 500   |

 # wav2vec2-xls-r-slavic-pomak
+Pomak is an endangered South East Slavic language variety spoken in Northern Greece.
+This is the first automatic speech recognition (ASR) model for Pomak.
+To train the model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)) on 11h of recorded Pomak speech.
 ## Recordings
+Four native Pomak speakers (2 female and 2 male) agreed to read Pomak texts at the ILSP audio-visual studio in Xanthi, Greece, resulting in a corpus of 14h.
 |Speaker|Gender|Total recorded hours|
 |---|---|---|
 ## Metrics
+We evaluated the model on the test split, which consists of 10% of the dataset recordings.
 |Model|CER|WER|
 |---|---|---|
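
CER and WER are conventionally computed as the Levenshtein (edit) distance between the reference transcript and the model output, divided by the number of reference characters or words. The sketch below is a minimal, dependency-free illustration of that convention. It is not necessarily the scoring script behind the table. It makes three assumptions: transcripts are already normalized (for example, Unicode NFC and consistent casing), spaces count as characters for CER, and edits are pooled across the whole test set rather than averaged per utterance.

```python
"""Minimal CER/WER scoring sketch (assumed convention, not the authors' evaluation script)."""
from __future__ import annotations


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance where substitutions, deletions and insertions each cost 1."""
    # `prev` is the previous DP row: prev[j] = distance(ref[:i-1], hyp[:j]).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (no cost if r == h)
            )
        prev = curr
    return prev[-1]


def _units(text: str, unit: str) -> list[str]:
    """Split a transcript into characters (spaces included) or whitespace-separated words."""
    if unit == "char":
        return list(text)
    if unit == "word":
        return text.split()
    raise ValueError(f"unknown unit: {unit!r}")


def error_rate(refs: list[str], hyps: list[str], unit: str) -> float:
    """Corpus-level rate: total edits over all pairs / total reference units."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    edits = sum(edit_distance(_units(r, unit), _units(h, unit)) for r, h in zip(refs, hyps))
    total = sum(len(_units(r, unit)) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return edits / total


def cer(refs: list[str], hyps: list[str]) -> float:
    """Character error rate."""
    return error_rate(refs, hyps, "char")


def wer(refs: list[str], hyps: list[str]) -> float:
    """Word error rate."""
    return error_rate(refs, hyps, "word")
```

For instance, `wer(["the cat sat"], ["the cat sit"])` is 1/3 (one substituted word out of three), while `cer` on the same pair is 1/11 (one substituted character out of eleven, spaces included).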
 ## Training hyperparameters
+We fine-tuned the baseline model (`wav2vec2-large-slavic-parlaspeech-hr`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:
 | arg                           | value |
 |-------------------------------|-------|
 | `gradient_accumulation_steps` | 2     |
 | `num_train_epochs`            | 35    |
 | `learning_rate`               | 3e-4  |
+| `warmup_steps`                | 500   |
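
The hyperparameters listed here do not name a learning-rate scheduler. If the Hugging Face `Trainer` default (`lr_scheduler_type="linear"`) was used, which is an assumption, then `warmup_steps = 500` means the learning rate rises linearly from 0 to `3e-4` over the first 500 optimizer steps and then decays linearly to 0 by the end of training. With `gradient_accumulation_steps = 2`, each optimizer step accumulates gradients over two batches, so the effective batch size is twice the per-device batch size. The pure-Python sketch below only illustrates that assumed schedule. `BATCHES_PER_EPOCH` is a hypothetical value, and the exact step rounding in a real `Trainer` run may differ.

```python
"""Sketch of a linear warmup/decay learning-rate schedule (assumed, not confirmed by the card)."""

# Values taken from the hyperparameter table.
LEARNING_RATE = 3e-4
WARMUP_STEPS = 500
GRADIENT_ACCUMULATION_STEPS = 2
NUM_TRAIN_EPOCHS = 35


def optimizer_steps(batches_per_epoch: int, grad_accum: int, epochs: int) -> int:
    """Approximate number of optimizer updates: one update per `grad_accum` batches."""
    return epochs * max(1, batches_per_epoch // grad_accum)


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at optimizer step `step` (0-based)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 towards base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


# Hypothetical: batches per epoch depend on the per-device batch size and the
# dataset size, neither of which is given here.
BATCHES_PER_EPOCH = 400
TOTAL_STEPS = optimizer_steps(BATCHES_PER_EPOCH, GRADIENT_ACCUMULATION_STEPS, NUM_TRAIN_EPOCHS)

if __name__ == "__main__":
    for step in (0, 250, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, linear_warmup_lr(step, LEARNING_RATE, WARMUP_STEPS, TOTAL_STEPS))
```

Warmup is counted in optimizer steps, not batches. Under the hypothetical 400 batches per epoch, the 500 warmup steps would span 1,000 batches, or about 2.5 of the 35 epochs.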
+## Citation
+To cite this work or read more about the training pipeline, see [this paper](https://aclanthology.org/2023.fieldmatters-1.5/).
+```
+@inproceedings{tsoukala-etal-2023-asr,
+    title = "{ASR} pipeline for low-resourced languages: A case study on Pomak",
+    author = "Tsoukala, Chara  and
+      Kritsis, Kosmas  and
+      Douros, Ioannis  and
+      Katsamanis, Athanasios  and
+      Kokkas, Nikolaos  and
+      Arampatzakis, Vasileios  and
+      Sevetlidis, Vasileios  and
+      Markantonatou, Stella  and
+      Pavlidis, George",
+    booktitle = "Proceedings of the Second Workshop on NLP Applications to Field Linguistics",
+    month = may,
+    year = "2023",
+    address = "Dubrovnik, Croatia",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2023.fieldmatters-1.5",
+    doi = "10.18653/v1/2023.fieldmatters-1.5",
+    pages = "40--45",
+    abstract = "Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages can be a challenging task not only due to lack of resources but also due to the work required for the preparation of a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.",
+}
+```