Update README.md
README.md
CHANGED
@@ -11,11 +11,13 @@ tags:
|
|
11 |
|
12 |
# wav2vec2-xls-r-slavic-pomak
|
13 |
|
14 |
-
|
|
|
|
|
15 |
|
16 |
## Recordings
|
17 |
|
18 |
-
|
19 |
|
20 |
|Speaker|Gender|Total recorded hours|
|
21 |
|---|---|---|
|
@@ -29,7 +31,7 @@ This removed the majority of pauses and resulted in a total dataset duration of
|
|
29 |
|
30 |
## Metrics
|
31 |
|
32 |
-
|
33 |
|
34 |
|Model|CER|WER|
|
35 |
|---|---|---|
|
@@ -38,7 +40,7 @@ The test set consists of 10% of the dataset recordings.
|
|
38 |
|
39 |
## Training hyperparameters
|
40 |
|
41 |
-
|
42 |
|
43 |
| arg | value |
|
44 |
|-------------------------------|-------|
|
@@ -46,4 +48,32 @@ To fine-tune the wav2vec2-large-slavic-parlaspeech-hr model, we used the followi
|
|
46 |
| `gradient_accumulation_steps` | 2 |
|
47 |
| `num_train_epochs` | 35 |
|
48 |
| `learning_rate` | 3e-4 |
|
49 |
-
| `warmup_steps` | 500 |
|
|
|
11 |
|
12 |
# wav2vec2-xls-r-slavic-pomak
|
13 |
|
14 |
+
Pomak is an endangered South East Slavic language variety spoken in Northern Greece.
|
15 |
+
This is the first automatic speech recognition (ASR) model for Pomak.
|
16 |
+
To train the model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)) on 11h of recorded Pomak speech.
|
17 |
|
18 |
## Recordings
|
19 |
|
20 |
+
Four native Pomak speakers (2 female and 2 male) agreed to read Pomak texts at the ILSP audio-visual studio in Xanthi, Greece, resulting in a corpus of 14h.
|
21 |
|
22 |
|Speaker|Gender|Total recorded hours|
|
23 |
|---|---|---|
|
|
|
31 |
|
32 |
## Metrics
|
33 |
|
34 |
+
We evaluated the model on the test set split, which consists of 10% of the dataset recordings.
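
For readers unfamiliar with the two metrics, the following is a minimal sketch of how character error rate (CER) and word error rate (WER) are conventionally defined: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. The function names are ours and the example strings are English placeholders, not Pomak data; the README does not state which tool was used to compute the reported scores.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings for illustration only.
print(wer("the cat sat", "the cat sit"))
print(cer("abcd", "abed"))
```

Corpus-level scores are usually obtained by summing edit distances and reference lengths over all test utterances before dividing, rather than averaging per-utterance rates.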
|
35 |
|
36 |
|Model|CER|WER|
|
37 |
|---|---|---|
|
|
|
40 |
|
41 |
## Training hyperparameters
|
42 |
|
43 |
+
We fine-tuned the baseline model (`wav2vec2-large-slavic-parlaspeech-hr`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:
|
44 |
|
45 |
| arg | value |
|
46 |
|-------------------------------|-------|
|
|
|
48 |
| `gradient_accumulation_steps` | 2 |
|
49 |
| `num_train_epochs` | 35 |
|
50 |
| `learning_rate` | 3e-4 |
|
51 |
+
| `warmup_steps` | 500 |
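
To illustrate how `learning_rate` and `warmup_steps` interact, here is a small sketch of a linear warmup followed by linear decay. This assumes the default linear scheduler of the Hugging Face `Trainer`; the README does not say which scheduler was used, and `total_steps` below is a hypothetical value, since the actual number of optimizer steps depends on the dataset and batch size.

```python
def linear_schedule_lr(step: int,
                       peak_lr: float = 3e-4,      # `learning_rate` from the table
                       warmup_steps: int = 500,    # `warmup_steps` from the table
                       total_steps: int = 10_000   # hypothetical, not from the README
                       ) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from the peak down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 250, 500, 5_000, 10_000):
    print(s, linear_schedule_lr(s))
```

Under these assumptions the learning rate reaches its peak of 3e-4 at step 500 and falls back to zero at the final step.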
|
52 |
+
|
53 |
+
## Citation
|
54 |
+
|
55 |
+
To cite this work or read more about the training pipeline, see [this paper](https://aclanthology.org/2023.fieldmatters-1.5/).
|
56 |
+
|
57 |
+
```
|
58 |
+
@inproceedings{tsoukala-etal-2023-asr,
|
59 |
+
title = "{ASR} pipeline for low-resourced languages: A case study on Pomak",
|
60 |
+
author = "Tsoukala, Chara and
|
61 |
+
Kritsis, Kosmas and
|
62 |
+
Douros, Ioannis and
|
63 |
+
Katsamanis, Athanasios and
|
64 |
+
Kokkas, Nikolaos and
|
65 |
+
Arampatzakis, Vasileios and
|
66 |
+
Sevetlidis, Vasileios and
|
67 |
+
Markantonatou, Stella and
|
68 |
+
Pavlidis, George",
|
69 |
+
booktitle = "Proceedings of the Second Workshop on NLP Applications to Field Linguistics",
|
70 |
+
month = may,
|
71 |
+
year = "2023",
|
72 |
+
address = "Dubrovnik, Croatia",
|
73 |
+
publisher = "Association for Computational Linguistics",
|
74 |
+
url = "https://aclanthology.org/2023.fieldmatters-1.5",
|
75 |
+
doi = "10.18653/v1/2023.fieldmatters-1.5",
|
76 |
+
pages = "40--45",
|
77 |
+
abstract = "Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages can be a challenging task not only due to lack of resources but also due to the work required for the preparation of a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.",
|
78 |
+
}
|
79 |
+
```
|