metadata
license: cc-by-4.0
language:
- es
base_model:
- pyannote/segmentation-3.0
library_name: pyannote-audio
pyannote-segmentation-3.0-RTVE
Model Details
This model is a fine-tuned version of pyannote/segmentation-3.0 on the RTVE database used for Albayzin Evaluations of IberSPEECH 2024.
On the RTVE2024 test set, it achieves the following results (rounded to two decimals):
- Diarization Error Rate (DER): 15.19%
- False Alarm: 2.74%
- Missed Detection: 4.55%
- Speaker Confusion: 7.90%
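The DER is the sum of the three error components listed above. As a quick sanity check, the arithmetic can be reproduced in a few lines of Python using only the reported figures (the variable names are illustrative):

```python
# Error components reported for the RTVE2024 test set, in percent.
components = {
    "false_alarm": 2.74,
    "missed_detection": 4.55,
    "speaker_confusion": 7.90,
}

# DER is the sum of the three components; round to two decimals
# to match the precision of the reported figures.
der = round(sum(components.values()), 2)
print(f"DER = {der:.2f}%")
```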
Uses
This fine-tuned segmentation model is intended for speaker diarization of TV shows.
Training Details
Training Data
The train.lst file includes the URIs of the training data.
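A minimal sketch of reading such a list file, assuming it follows the common convention of one URI per line. The helper name `load_uris` and the demo URIs are made up for illustration and are not taken from the actual train.lst:

```python
from pathlib import Path
import tempfile


def load_uris(list_path: str) -> list[str]:
    """Read a .lst file, assuming one URI per line; blank lines are skipped."""
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demonstration with a throwaway file and made-up URIs.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.lst"
    demo.write_text("example_show_01\nexample_show_02\n\n", encoding="utf-8")
    uris = load_uris(str(demo))

print(uris)
```

The same helper would apply to development.lst and test.lst if they share the format.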
Training Hyperparameters
Model:
- duration: 10.0
- max_speakers_per_chunk: 3
- max_speakers_per_frame: 2
- train_batch_size: 32
- powerset_max_classes: 2
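With powerset encoding, each output class stands for a set of simultaneously active speakers. Assuming the class set consists of every subset of the max_speakers_per_chunk speakers with at most powerset_max_classes members (the empty set being non-speech), the values above yield 7 classes. The sketch below enumerates them; `powerset_classes` is an illustrative helper, not a pyannote-audio function:

```python
from itertools import combinations


def powerset_classes(num_speakers: int, max_set_size: int) -> list[tuple[int, ...]]:
    """Enumerate speaker subsets with at most `max_set_size` members."""
    classes = []
    for size in range(max_set_size + 1):
        # size 0 -> non-speech, size 1 -> single speakers, size 2 -> overlapping pairs
        classes.extend(combinations(range(num_speakers), size))
    return classes


classes = powerset_classes(num_speakers=3, max_set_size=2)
print(len(classes))
print(classes)
```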
Adam Optimizer:
- lr: 0.0001
Early Stopping:
- monitor: 'DiarizationErrorRate'
- direction: 'min'
- max_epochs: 20
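The early-stopping settings above amount to keeping the epoch with the lowest validation DER within at most 20 epochs. A rough sketch of that selection rule follows. The `patience` parameter and the validation values are hypothetical, since the card does not state them, and this is not the training code used for the model:

```python
def select_best_epoch(val_der, max_epochs=20, patience=5):
    """Return (best_epoch, best_der) under a 'min' early-stopping rule.

    `patience` is a hypothetical value; the model card does not state one.
    """
    best_epoch, best_der = None, float("inf")
    for epoch, der in enumerate(val_der[:max_epochs]):
        if der < best_der:
            best_epoch, best_der = epoch, der
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_der


# Made-up validation DER values, for illustration only.
history = [0.30, 0.24, 0.21, 0.22, 0.23, 0.22, 0.25, 0.24, 0.20]
print(select_best_epoch(history, patience=3))
```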
Development Data
The development.lst file includes the URIs of the development data.
Evaluation
- Forgiveness collar: 250ms
- Skip overlap: False
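A forgiveness collar excludes a short window around each reference speaker-turn boundary from scoring, and "Skip overlap: False" means overlapped speech is still scored. The sketch below computes the excluded windows for a toy reference. It assumes the 250 ms applies on each side of a boundary. The card does not say whether the value is per side or the total width, and toolkits differ on this, so check the convention of the scoring tool in use. The function name and reference turns are made up:

```python
COLLAR = 0.25  # seconds; assumed to apply on EACH side of a reference boundary


def collar_regions(reference, collar=COLLAR):
    """Return merged (start, end) intervals excluded from scoring.

    `reference` is a list of (start, end) speaker turns in seconds.
    """
    boundaries = sorted({t for seg in reference for t in seg})
    merged = []
    for b in boundaries:
        start, end = max(0.0, b - collar), b + collar
        if merged and start <= merged[-1][1]:
            # Overlapping windows are merged into one excluded interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up reference turns, for illustration only.
reference = [(0.0, 4.0), (4.2, 9.0)]
excluded = collar_regions(reference)
print(excluded)
```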
Testing Data & Metrics
Testing Data
The test.lst file includes the URIs of the testing data.
Metrics
Diarization Error Rate, False Alarm, Missed Detection, Speaker Confusion.
Citation
If you use this model, please cite:
BibTeX:
@inproceedings{souganidis24_iberspeech,
title = {HiTZ-Aholab Speaker Diarization System for Albayzin Evaluations of IberSPEECH 2024},
author = {Christoforos Souganidis and Gemma Meseguer and Asier Herranz and Inma {Hernáez Rioja} and Eva Navas and Ibon Saratxaga},
year = {2024},
booktitle = {IberSPEECH 2024},
pages = {327--330},
doi = {10.21437/IberSPEECH.2024-68},
}