# wav2vec2-xls-r-slavic-pomak
To train a Pomak ASR model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)) on 11h of recorded Pomak speech.
Specifically, 4 speakers of Pomak (2 male and 2 female) agreed to record 14h of read Pomak speech at the ILSP audio-visual studio in Xanthi, Greece.
|Speaker|Gender|Total recorded hours|
|---|---|---|
|NK9dIF | F | 4h 44m 45s|
|xoVY9q | M | 4h 36m 12s|
|v9G75fk | F | 1h 44m 03s|
|n5WzHj | M | 3h 44m 04s|
To train the model, we segmented the audio files into segments of at most 25 seconds. This segmentation reduced the total duration of the recordings, yielding a dataset of 11h 8m in total.
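The segmentation step can be sketched as a fixed-window split over the sample indices. This is only an illustration under assumed parameters (16 kHz sampling, the rate wav2vec 2.0 models expect); the actual segmentation may have followed utterance or silence boundaries rather than fixed windows:

```python
def split_into_segments(num_samples, sample_rate=16000, max_seconds=25):
    """Return (start, end) sample-index pairs covering the audio with
    segments of at most max_seconds each (naive fixed-window sketch)."""
    max_len = int(max_seconds * sample_rate)
    return [(start, min(start + max_len, num_samples))
            for start in range(0, num_samples, max_len)]

# A 60 s file at 16 kHz splits into 25 s + 25 s + 10 s windows:
# [(0, 400000), (400000, 800000), (800000, 960000)]
print(split_into_segments(60 * 16000))
```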
## Metrics
The evaluation was performed on 10% of the total recordings.
|model|WER|CER|
|---|---|---|
|pre-trained|87.31 % |31.47 %|
|fine-tuned|9.06 % | 3.12 %|
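Both metrics are edit-distance rates: WER over word tokens, CER over characters. A minimal pure-Python sketch (real evaluations typically use a library such as `jiwer`; this version assumes non-empty references) is:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, O(len(hyp)) memory."""
    n = len(hyp)
    dp = list(range(n + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            # deletion, insertion, or substitution/match
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1,
                        prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("to be or not", "to be not")` is 0.25: one deletion against four reference words.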
## Training hyperparameters
To fine-tune the [classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr) model, we used the following arguments:
| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 2 |
| `num_train_epochs` | 35 |
| `learning_rate` | 3e-4 |
| `warmup_steps` | 500 |
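Collected as a config fragment, the table above maps directly onto keyword arguments of `transformers.TrainingArguments` (the `output_dir` below is an assumption, and data loading is omitted). Note the effective batch size is 8 × 2 = 16:

```python
# Hyperparameters from the table above.
hyperparams = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,   # effective batch size 8 * 2 = 16
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}

# Passing them to the Trainer would look roughly like this
# (output_dir is a placeholder, not the actual training setup):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="wav2vec2-xls-r-slavic-pomak", **hyperparams)
```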