HiTZ/pyannote-segmentation-3.0-RTVE

Automatic Speech Recognition

speaker-diarization

Update README.md

chsougan committed on Dec 12, 2024 · Commit 414b994 (verified) · 1 parent: dcd0721

Files changed (1)

README.md +98 -3

@@ -1,3 +1,98 @@
----
-license: cc-by-4.0
----

+---
+license: cc-by-4.0
+language:
+- es
+base_model:
+- pyannote/segmentation-3.0
+library_name: pyannote-audio
+---
+# pyannote-segmentation-3.0-RTVE
+## Model Details
+This model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0), trained on [the RTVE database](https://catedrartve.unizar.es/rtvedatabase.html) used for the Albayzin Evaluations of IberSPEECH 2024.
+On the RTVE2024 test set it achieves the following results (rounded to two decimals):
+- Diarization Error Rate (DER): 15.19%
+- False Alarm: 2.74%
+- Missed Detection: 4.55%
+- Speaker Confusion: 7.90%
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+This fine-tuned segmentation model is intended for speaker diarization of TV shows.
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+The [train.lst](https://huggingface.co/chsougan/pyannote-segmentation-3.0-RTVE/blob/main/train.lst) file includes the URIs of the training data.
+#### Training Hyperparameters
+**Model:**  <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+  - duration: 10.0
+  - max_speakers_per_chunk: 3
+  - max_speakers_per_frame: 2
+  - train_batch_size: 32
+  - powerset_max_classes: 2
+**Adam Optimizer:**
+  - lr: 0.0001
+**Early Stopping:**
+  - monitor: 'DiarizationErrorRate'
+  - direction: 'min'
+  - max_epochs: 20
+### Development Data
+The [development.lst](https://huggingface.co/chsougan/pyannote-segmentation-3.0-RTVE/blob/main/development.lst) file includes the URIs of the development data.
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+- Forgiveness collar: 250ms
+- Skip overlap: False
+### Testing Data & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+The [test.lst](https://huggingface.co/chsougan/pyannote-segmentation-3.0-RTVE/blob/main/test.lst) file includes the URIs of the testing data.
+#### Metrics
+Diarization Error Rate, False Alarm, Missed Detection, Speaker Confusion.
+## Citation
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+If you use this model, please cite:
+**BibTeX:**
+```bibtex
+@inproceedings{souganidis24_iberspeech,
+  title     = {HiTZ-Aholab Speaker Diarization System for Albayzin Evaluations of IberSPEECH 2024},
+  author    = {Christoforos Souganidis and Gemma Meseguer and Asier Herranz and Inma {Hernáez Rioja} and Eva Navas and Ibon Saratxaga},
+  year      = {2024},
+  booktitle = {IberSPEECH 2024},
+  pages     = {327--330},
+  doi       = {10.21437/IberSPEECH.2024-68},
+}
+```
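
The results listed in the model card can be cross-checked, since the Diarization Error Rate is conventionally the sum of false alarm, missed detection, and speaker confusion, each expressed as a fraction of total reference speech. The sketch below is not part of the commit; it only re-adds the three percentages reported above, and the names `components` and `diarization_error_rate` are illustrative assumptions.

```python
# Error components reported in the model card for the RTVE2024 test set (percent).
components = {
    "false_alarm": 2.74,
    "missed_detection": 4.55,
    "speaker_confusion": 7.90,
}


def diarization_error_rate(parts):
    """Sum the error components and round to two decimals, as in the model card."""
    return round(sum(parts.values()), 2)


der = diarization_error_rate(components)
print(f"DER = {der:.2f}%")  # prints "DER = 15.19%"
```

The sum matches the reported DER of 15.19%, so the four figures in the card are internally consistent.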