To train a Pomak ASR model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)).

## Recordings

Four native Pomak speakers (2 female and 2 male) agreed to read Pomak texts at the ILSP audio-visual studio in Xanthi, Greece, resulting in a total of about 14 hours of recordings.

|Speaker|Gender|Total recorded hours|
|---|---|---|
|NK9dIF | F | 4h 44m 45s|
|xoVY9q | M | 4h 36m 12s|
|9G75fk | F | 1h 44m 03s|
|n5WzHj | M | 3h 44m 04s|

To fine-tune the model, we split the long recordings into smaller segments of a maximum of 25 seconds each. This removed the majority of pauses and resulted in a total dataset duration of 11h 8m.
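The segmentation step can be sketched as a greedy merge of silence-delimited utterances into chunks of at most 25 seconds. This is an illustrative sketch, not the exact pipeline used: the function name and the assumption that utterance boundaries (start/end times in seconds) have already been detected by a silence detector are ours.

```python
MAX_CHUNK = 25.0  # maximum segment length in seconds, as stated above

def merge_utterances(utterances, max_chunk=MAX_CHUNK):
    """Greedily merge (start, end) utterance spans into chunks <= max_chunk.

    Pauses between utterances inside a chunk are absorbed; an utterance
    that cannot extend the current chunk starts a new one.
    """
    chunks = []
    cur_start, cur_end = None, None
    for start, end in utterances:
        if cur_start is None:
            cur_start, cur_end = start, end
        elif end - cur_start <= max_chunk:
            cur_end = end  # extend the current chunk, dropping the pause
        else:
            chunks.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks

# Example: three utterances separated by short pauses.
print(merge_utterances([(0.0, 10.0), (12.0, 24.0), (26.0, 40.0)]))
# → [(0.0, 24.0), (26.0, 40.0)]
```

The first two utterances fit within one 25-second window and are merged; the third would overflow it, so it starts a new segment.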

## Metrics

The test set consists of 10% of the dataset recordings.

|Model|WER|CER|
|---|---|---|
|pre-trained|87.31%|31.47%|
|fine-tuned|9.06%|3.12%|
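Both metrics are normalized edit distances: WER over words, CER over characters. A minimal self-contained sketch using a plain Levenshtein distance (in practice a library such as `jiwer` is commonly used; the helper names here are ours):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits / reference word count."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate: character-level edits / reference length."""
    return levenshtein(ref, hyp) / len(ref)

print(wer("kak si", "kak li"))  # → 0.5 (1 of 2 words wrong)
```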

## Training hyperparameters

To fine-tune the wav2vec2-large-slavic-parlaspeech-hr model, we used the following hyperparameters:

| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 2 |
| `num_train_epochs` | 35 |
| `learning_rate` | 3e-4 |
| `warmup_steps` | 500 |
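With gradient accumulation, the optimizer steps once every `gradient_accumulation_steps` forward passes, so the effective batch size is the per-device batch size times the accumulation steps times the device count. A quick check of the values above, assuming single-GPU training (the device count is our assumption, not stated in the source):

```python
# Effective batch size implied by the settings above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 1  # assumption: one GPU; multiply accordingly if distributed

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)
print(effective_batch_size)  # → 16
```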