chsougan committed on
Commit 414b994 · verified · 1 Parent(s): dcd0721

Update README.md

  1. README.md +98 -3
README.md CHANGED
@@ -1,3 +1,98 @@
1
- ---
2
- license: cc-by-4.0
3
- ---
1
+ ---
2
+ license: cc-by-4.0
3
+ language:
4
+ - es
5
+ base_model:
6
+ - pyannote/segmentation-3.0
7
+ library_name: pyannote-audio
8
+ ---
9
+ # pyannote-segmentation-3.0-RTVE
10
+
11
+ ## Model Details
12
+
13
+ This model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0), trained on [the RTVE database](https://catedrartve.unizar.es/rtvedatabase.html) used for the Albayzin Evaluations of IberSPEECH 2024.
14
+
15
+ On the RTVE2024 test set it achieves the following results (rounded to two decimal places):
16
+
17
+ - Diarization Error Rate (DER): 15.19%
18
+ - False Alarm: 2.74%
19
+ - Missed Detection: 4.55%
20
+ - Speaker Confusion: 7.90%
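
As a quick consistency check on the figures above: by the standard definition, the diarization error rate is the sum of false alarm, missed detection, and speaker confusion, each expressed relative to total reference speech time. The minimal sketch below only re-adds the reported percentages; the variable names are illustrative and not part of the model card.

```python
from decimal import Decimal

# Component rates as reported in this model card
# (percent of total reference speech time).
components = {
    "false_alarm": Decimal("2.74"),
    "missed_detection": Decimal("4.55"),
    "speaker_confusion": Decimal("7.90"),
}

# DER is the sum of the three error components.
der = sum(components.values(), Decimal("0"))
print(f"DER = {der}%")  # prints "DER = 15.19%"
```

The sum matches the reported DER of 15.19%, so the four figures are mutually consistent.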
21
+
22
+
23
+ ## Uses
24
+
25
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
26
+ This fine-tuned segmentation model is intended to be used for speaker diarization of TV shows.
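
The card does not include loading instructions, so the following is only a sketch of how the checkpoint might be loaded with `pyannote.audio`. It assumes that `pyannote.audio` is installed, that the repository hosts a checkpoint in the layout `Model.from_pretrained` expects, and that any access conditions on the Hub have been accepted. The helper name `load_segmentation_model` is hypothetical.

```python
# Repository id taken from the links in this model card.
MODEL_ID = "chsougan/pyannote-segmentation-3.0-RTVE"


def load_segmentation_model(model_id: str = MODEL_ID):
    """Load the fine-tuned segmentation model from the Hugging Face Hub.

    Assumption: the repository contains a checkpoint that pyannote.audio
    can load directly. The import is deferred so that this module can be
    imported even when pyannote.audio is not installed.
    """
    from pyannote.audio import Model

    return Model.from_pretrained(model_id)
```

Calling `load_segmentation_model()` would download the checkpoint on first use. The returned object could then be used where `pyannote.audio` expects a segmentation model, for example in place of the base model in a diarization pipeline.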
27
+
28
+ ## Training Details
29
+
30
+ ### Training Data
31
+
32
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
33
+
34
+ The [train.lst](https://huggingface.co/chsougan/pyannote-segmentation-3.0-RTVE/blob/main/train.lst) file includes the URIs of the training data.
35
+
36
+
37
+
38
+ #### Training Hyperparameters
39
+
40
+ **Model:** <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
41
+
42
+ - duration: 10.0
43
+ - max_speakers_per_chunk: 3
44
+ - max_speakers_per_frame: 2
45
+ - train_batch_size: 32
46
+ - powerset_max_classes: 2
47
+
48
+ **Adam Optimizer:**
49
+ - lr: 0.0001
50
+
51
+ **Early Stopping:**
52
+
53
+ - monitor: 'DiarizationErrorRate'
54
+ - direction: 'min'
55
+ - max_epochs: 20
56
+
57
+ ### Development Data
58
+
59
+ The [development.lst](https://huggingface.co/chsougan/pyannote-segmentation-3.0-RTVE/blob/main/development.lst) file includes the URIs of the development data.
60
+
61
+ ## Evaluation
62
+
63
+ <!-- This section describes the evaluation protocols and provides the results. -->
64
+
65
+ - Forgiveness collar: 250ms
66
+ - Skip overlap: False
67
+
68
+ ### Testing Data & Metrics
69
+
70
+ #### Testing Data
71
+
72
+ <!-- This should link to a Dataset Card if possible. -->
73
+
74
+ The [test.lst](https://huggingface.co/chsougan/pyannote-segmentation-3.0-RTVE/blob/main/test.lst) file includes the URIs of the testing data.
75
+
76
+
77
+ #### Metrics
78
+
79
+ Diarization Error Rate, False Alarm, Missed Detection, Speaker Confusion.
80
+
81
+
82
+ ## Citation
83
+
84
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
85
+ If you use this model, please cite:
86
+
87
+
88
+ **BibTeX:**
89
+ ```bibtex
90
+ @inproceedings{souganidis24_iberspeech,
91
+ title = {HiTZ-Aholab Speaker Diarization System for Albayzin Evaluations of IberSPEECH 2024},
92
+ author = {Christoforos Souganidis and Gemma Meseguer and Asier Herranz and Inma {Hernáez Rioja} and Eva Navas and Ibon Saratxaga},
93
+ year = {2024},
94
+ booktitle = {IberSPEECH 2024},
95
+ pages = {327--330},
96
+ doi = {10.21437/IberSPEECH.2024-68},
97
+ }
98
+ ```