# wav2vec2-xls-r-slavic-pomak
To train a Pomak ASR model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)) on 11h of recorded Pomak speech.
Specifically, 4 speakers of Pomak (2 male and 2 female) agreed to record 14h of read Pomak speech at the ILSP audio-visual studio in Xanthi, Greece.
|Speaker|Gender|Total recorded hours|
|---|---|---|
|NK9dIF | F | 4h 44m 45s|
|xoVY9q | M | 4h 36m 12s|
|v9G75fk | F | 1h 44m 03s|
|n5WzHj | M | 3h 44m 04s|
To train the model, we segmented the audio files into segments of at most 25 seconds. This segmentation reduced the total duration of the recordings, yielding a dataset of 11h 8m in total.
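The segmentation step can be sketched as a fixed-window split over the sample indices. This is only an illustration under assumed parameters (16 kHz sampling, the rate wav2vec 2.0 models expect); the actual segmentation may have followed utterance or silence boundaries rather than fixed windows:

```python
def split_into_segments(num_samples, sample_rate=16000, max_seconds=25):
    """Return (start, end) sample-index pairs covering the audio with
    segments of at most max_seconds each (naive fixed-window sketch)."""
    max_len = int(max_seconds * sample_rate)
    return [(start, min(start + max_len, num_samples))
            for start in range(0, num_samples, max_len)]

# A 60 s file at 16 kHz splits into 25 s + 25 s + 10 s windows:
# [(0, 400000), (400000, 800000), (800000, 960000)]
print(split_into_segments(60 * 16000))
```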
## Metrics
The evaluation was performed on 10% of the total recordings.
|model|WER|CER|
|---|---|---|
|pre-trained|87.31 % |31.47 %|
|fine-tuned|9.06 % | 3.12 %|
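Both metrics are edit-distance rates: WER over word tokens, CER over characters. A minimal pure-Python sketch (real evaluations typically use a library such as `jiwer`; this version assumes non-empty references) is:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, O(len(hyp)) memory."""
    n = len(hyp)
    dp = list(range(n + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            # deletion, insertion, or substitution/match
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1,
                        prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("to be or not", "to be not")` is 0.25: one deletion against four reference words.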
## Training hyperparameters
To fine-tune the [classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr) model, we used the following arguments:
| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 2 |
| `num_train_epochs` | 35 |
| `learning_rate` | 3e-4 |
| `warmup_steps` | 500 |
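Collected as a config fragment, the table above maps directly onto keyword arguments of `transformers.TrainingArguments` (the `output_dir` below is an assumption, and data loading is omitted). Note the effective batch size is 8 × 2 = 16:

```python
# Hyperparameters from the table above.
hyperparams = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,   # effective batch size 8 * 2 = 16
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}

# Passing them to the Trainer would look roughly like this
# (output_dir is a placeholder, not the actual training setup):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="wav2vec2-xls-r-slavic-pomak", **hyperparams)
```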