To train a Pomak ASR model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)).

## Recordings

Four native Pomak speakers (2 female and 2 male) agreed to read Pomak texts at the ILSP audio-visual studio in Xanthi, Greece, resulting in a total of about 14 hours of recordings.

|Speaker|Gender|Total recorded hours|
|---|---|---|
|NK9dIF | F | 4h 44m 45s|
|xoVY9q | M | 4h 36m 12s|
|9G75fk | F | 1h 44m 03s|
|n5WzHj | M | 3h 44m 04s|

To fine-tune the model, we split the long recordings into smaller segments of a maximum of 25 seconds each. This removed the majority of pauses and resulted in a total dataset duration of 11h 8m.
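The segmentation step can be sketched as a greedy merge of silence-delimited utterances into chunks of at most 25 seconds. This is an illustrative sketch, not the exact pipeline used: the function name and the assumption that utterance boundaries (start/end times in seconds) have already been detected by a silence detector are ours.

```python
MAX_CHUNK = 25.0  # maximum segment length in seconds, as stated above

def merge_utterances(utterances, max_chunk=MAX_CHUNK):
    """Greedily merge (start, end) utterance spans into chunks <= max_chunk.

    Pauses between utterances inside a chunk are absorbed; an utterance
    that cannot extend the current chunk starts a new one.
    """
    chunks = []
    cur_start, cur_end = None, None
    for start, end in utterances:
        if cur_start is None:
            cur_start, cur_end = start, end
        elif end - cur_start <= max_chunk:
            cur_end = end  # extend the current chunk, dropping the pause
        else:
            chunks.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks

# Example: three utterances separated by short pauses.
print(merge_utterances([(0.0, 10.0), (12.0, 24.0), (26.0, 40.0)]))
# → [(0.0, 24.0), (26.0, 40.0)]
```

The first two utterances fit within one 25-second window and are merged; the third would overflow it, so it starts a new segment.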

## Metrics

The test set consists of 10% of the dataset recordings.

|Model|WER|CER|
|---|---|---|
|pre-trained|87.31%|31.47%|
|fine-tuned|9.06%|3.12%|
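Both metrics are normalized edit distances: WER over words, CER over characters. A minimal self-contained sketch using a plain Levenshtein distance (in practice a library such as `jiwer` is commonly used; the helper names here are ours):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits / reference word count."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate: character-level edits / reference length."""
    return levenshtein(ref, hyp) / len(ref)

print(wer("kak si", "kak li"))  # → 0.5 (1 of 2 words wrong)
```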

## Training hyperparameters

To fine-tune the wav2vec2-large-slavic-parlaspeech-hr model, we used the following hyperparameters:

| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 2 |
| `num_train_epochs` | 35 |
| `learning_rate` | 3e-4 |
| `warmup_steps` | 500 |
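With gradient accumulation, the optimizer steps once every `gradient_accumulation_steps` forward passes, so the effective batch size is the per-device batch size times the accumulation steps times the device count. A quick check of the values above, assuming single-GPU training (the device count is our assumption, not stated in the source):

```python
# Effective batch size implied by the settings above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 1  # assumption: one GPU; multiply accordingly if distributed

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)
print(effective_batch_size)  # → 16
```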