hicustomer/pyannote-speaker-diarization

@@ -20,19 +20,23 @@ datasets:
 - repere
 - voxceleb
 license: mit
 ---
 # 🎹 Speaker diarization
-Relies on pyannote.audio 2.0: see [installation instructions](https://github.com/pyannote/pyannote-audio/tree/develop#installation).
 ## TL;DR
 ```python
 # load the pipeline from Hugging Face Hub
 from pyannote.audio import Pipeline
-pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2022.07")
 # apply the pipeline to an audio file
 diarization = pipeline("audio.wav")
@@ -89,15 +93,15 @@ Processing is fully automatic:
 * evaluation of overlapped speech
-| Benchmark                                                                                                                          | [DER%](. "Diarization error rate") | [FA%](. "False alarm rate") | [Miss%](. "Missed detection rate") | [Conf%](. "Speaker confusion rate") | Expected output                                                                            | File-level evaluation                                                                      |
-| ---------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- | ------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------ |
-| [AISHELL-4](http://www.openslr.org/111/)                                                                                           | 14.61                              | 3.31                        | 4.35                               | 6.95                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.rttm)                    | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.eval)                    |
-| [AMI *Mix-Headset*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 18.21                              | 3.28                        | 11.07                              | 3.87                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.rttm)          | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.eval)          |
-| [AMI *Array1-01*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup)   | 29.00                              | 2.71                        | 21.61                              | 4.68                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.rttm)      | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.eval)      |
-| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) [*Part2*](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)         | 30.24                              | 3.71                        | 16.86                              | 9.66                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.rttm)       | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.eval)       |
-| [DIHARD 3 *Full*](https://arxiv.org/abs/2012.01477)                                                                                | 20.99                              | 4.25                        | 10.74                              | 6.00                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.rttm)             | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.eval)             |
-| [REPERE *Phase 2*](https://islrn.org/resources/360-758-359-485-0/)                                                                 | 12.62                              | 1.55                        | 3.30                               | 7.76                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.rttm)             | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.eval)             |
-| [VoxConverse *v0.3*](https://github.com/joonson/voxconverse)                                                                     | 12.61                              | 3.45                        | 3.85                               | 5.31                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.eval) |
 ## Support

 - repere
 - voxceleb
 license: mit
+extra_gated_prompt: "The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening."
+extra_gated_fields:
+  Company/university: text
+  Website: text
+  I plan to use this model for (task, type of audio data, etc): text
 ---
 # 🎹 Speaker diarization
+Relies on pyannote.audio 2.0.1: see [installation instructions](https://github.com/pyannote/pyannote-audio#installation).
 ## TL;DR
 ```python
 # load the pipeline from Hugging Face Hub
 from pyannote.audio import Pipeline
+pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.0.1")
 # apply the pipeline to an audio file
 diarization = pipeline("audio.wav")
 * evaluation of overlapped speech
+| Benchmark (2.0.1)                                                                                                                  | [DER%](. "Diarization error rate") | [FA%](. "False alarm rate") | [Miss%](. "Missed detection rate") | [Conf%](. "Speaker confusion rate") | Expected output                                                                                                                                          | File-level evaluation                                                                                                                                    |
+| ---------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| [AISHELL-4](http://www.openslr.org/111/)                                                                                           | 14.61                              | 3.31                        | 4.35                               | 6.95                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.rttm)         | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.eval)         |
+| [AMI *Mix-Headset*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 18.21                              | 3.28                        | 11.07                              | 3.87                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.rttm)       | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.eval)       |
+| [AMI *Array1-01*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup)   | 29.00                              | 2.71                        | 21.61                              | 4.68                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.rttm)   | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.eval)   |
+| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) [*Part2*](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)         | 30.24                              | 3.71                        | 16.86                              | 9.66                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.rttm)    | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.eval)    |
+| [DIHARD 3 *Full*](https://arxiv.org/abs/2012.01477)                                                                                | 20.99                              | 4.25                        | 10.74                              | 6.00                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.rttm)          | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.eval)          |
+| [REPERE *Phase 2*](https://islrn.org/resources/360-758-359-485-0/)                                                                 | 12.62                              | 1.55                        | 3.30                               | 7.76                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.rttm)          | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.eval)          |
+| [VoxConverse *v0.3*](https://github.com/joonson/voxconverse)                                                                       | 12.61                              | 3.45                        | 3.85                               | 5.31                                | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.eval) |
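
The columns in the benchmark table are related: the diarization error rate is conventionally the sum of the false alarm, missed detection, and speaker confusion rates. The sketch below is a quick sanity check on the rows as they appear in this diff, not part of pyannote.audio. The names `ROWS`, `TOLERANCE`, `der_gap`, and `check_rows` are hypothetical, and the tolerance is an assumption that each published figure was rounded to two decimals independently.

```python
# Rows copied from the benchmark table: (name, DER%, FA%, Miss%, Conf%).
ROWS = [
    ("AISHELL-4", 14.61, 3.31, 4.35, 6.95),
    ("AMI Mix-Headset", 18.21, 3.28, 11.07, 3.87),
    ("AMI Array1-01", 29.00, 2.71, 21.61, 4.68),
    ("CALLHOME Part2", 30.24, 3.71, 16.86, 9.66),
    ("DIHARD 3 Full", 20.99, 4.25, 10.74, 6.00),
    ("REPERE Phase 2", 12.62, 1.55, 3.30, 7.76),
    ("VoxConverse v0.3", 12.61, 3.45, 3.85, 5.31),
]

# Assumption: four independently rounded values (three components plus the
# total) can disagree by up to 4 * 0.005 = 0.02 percentage points.
TOLERANCE = 0.02


def der_gap(der, fa, miss, conf):
    """Return DER minus (FA + Miss + Conf), rounded to two decimals."""
    return round(der - (fa + miss + conf), 2)


def check_rows(rows):
    """Map each benchmark name to its DER gap."""
    return {name: der_gap(der, fa, miss, conf) for name, der, fa, miss, conf in rows}


gaps = check_rows(ROWS)
for name, gap in gaps.items():
    status = "ok" if abs(gap) <= TOLERANCE else "mismatch"
    print(f"{name}: gap={gap:+.2f} ({status})")
```

A gap larger than the tolerance would point to a transcription error in one of the four columns rather than ordinary rounding.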
 ## Support