Hervé BREDIN
commited on
Commit
·
fa95f91
1
Parent(s):
89e7168
doc: update README
Browse files
README.md
CHANGED
@@ -20,19 +20,23 @@ datasets:
|
|
20 |
- repere
|
21 |
- voxceleb
|
22 |
license: mit
|
|
|
|
|
|
|
|
|
|
|
23 |
---
|
24 |
|
25 |
# 🎹 Speaker diarization
|
26 |
|
27 |
-
Relies on pyannote.audio 2.0: see [installation instructions](https://github.com/pyannote/pyannote-audio
|
28 |
-
|
29 |
|
30 |
## TL;DR
|
31 |
|
32 |
```python
|
33 |
# load the pipeline from Hugginface Hub
|
34 |
from pyannote.audio import Pipeline
|
35 |
-
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@
|
36 |
|
37 |
# apply the pipeline to an audio file
|
38 |
diarization = pipeline("audio.wav")
|
@@ -89,15 +93,15 @@ Processing is fully automatic:
|
|
89 |
* evaluation of overlapped speech
|
90 |
|
91 |
|
92 |
-
| Benchmark
|
93 |
-
| ---------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- |
|
94 |
-
| [AISHELL-4](http://www.openslr.org/111/) | 14.61 | 3.31 | 4.35 | 6.95 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.rttm)
|
95 |
-
| [AMI *Mix-Headset*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 18.21 | 3.28 | 11.07 | 3.87 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.rttm)
|
96 |
-
| [AMI *Array1-01*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 29.00 | 2.71 | 21.61 | 4.68 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.rttm)
|
97 |
-
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) [*Part2*](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1) | 30.24 | 3.71 | 16.86 | 9.66 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.rttm)
|
98 |
-
| [DIHARD 3 *Full*](https://arxiv.org/abs/2012.01477) | 20.99 | 4.25 | 10.74 | 6.00 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.rttm)
|
99 |
-
| [REPERE *Phase 2*](https://islrn.org/resources/360-758-359-485-0/) | 12.62 | 1.55 | 3.30 | 7.76 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.rttm)
|
100 |
-
| [VoxConverse *v0.3*](https://github.com/joonson/voxconverse)
|
101 |
|
102 |
## Support
|
103 |
|
|
|
20 |
- repere
|
21 |
- voxceleb
|
22 |
license: mit
|
23 |
+
extra_gated_prompt: "The collected information will help acquire a better knowledge of pyannote.audio userbase and help its maintainers apply for grants to improve it further. If you are an academic researcher, please cite the relevant papers in your own publications using the model. If you work for a company, please consider contributing back to pyannote.audio development (e.g. through unrestricted gifts). We also provide scientific consulting services around speaker diarization and machine listening."
|
24 |
+
extra_gated_fields:
|
25 |
+
Company/university: text
|
26 |
+
Website: text
|
27 |
+
I plan to use this model for (task, type of audio data, etc): text
|
28 |
---
|
29 |
|
30 |
# 🎹 Speaker diarization
|
31 |
|
32 |
+
Relies on pyannote.audio 2.0.1: see [installation instructions](https://github.com/pyannote/pyannote-audio#installation).
|
|
|
33 |
|
34 |
## TL;DR
|
35 |
|
36 |
```python
|
37 |
# load the pipeline from Hugginface Hub
|
38 |
from pyannote.audio import Pipeline
|
39 |
+
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.0.1")
|
40 |
|
41 |
# apply the pipeline to an audio file
|
42 |
diarization = pipeline("audio.wav")
|
|
|
93 |
* evaluation of overlapped speech
|
94 |
|
95 |
|
96 |
+
| Benchmark (2.0.1) | [DER%](. "Diarization error rate") | [FA%](. "False alarm rate") | [Miss%](. "Missed detection rate") | [Conf%](. "Speaker confusion rate") | Expected output | File-level evaluation |
|
97 |
+
| ---------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | --------------------------- | ---------------------------------- | ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
98 |
+
| [AISHELL-4](http://www.openslr.org/111/) | 14.61 | 3.31 | 4.35 | 6.95 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AISHELL.SpeakerDiarization.Full.test.eval) |
|
99 |
+
| [AMI *Mix-Headset*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 18.21 | 3.28 | 11.07 | 3.87 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI.SpeakerDiarization.only_words.test.eval) |
|
100 |
+
| [AMI *Array1-01*](https://groups.inf.ed.ac.uk/ami/corpus/) [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup) | 29.00 | 2.71 | 21.61 | 4.68 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/AMI-SDM.SpeakerDiarization.only_words.test.eval) |
|
101 |
+
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) [*Part2*](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1) | 30.24 | 3.71 | 16.86 | 9.66 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/CALLHOME.SpeakerDiarization.CALLHOME.test.eval) |
|
102 |
+
| [DIHARD 3 *Full*](https://arxiv.org/abs/2012.01477) | 20.99 | 4.25 | 10.74 | 6.00 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/DIHARD.SpeakerDiarization.Full.test.eval) |
|
103 |
+
| [REPERE *Phase 2*](https://islrn.org/resources/360-758-359-485-0/) | 12.62 | 1.55 | 3.30 | 7.76 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/2022.07/reproducible_research/2022.07/REPERE.SpeakerDiarization.Full.test.eval) |
|
104 |
+
| [VoxConverse *v0.3*](https://github.com/joonson/voxconverse) | 12.61 | 3.45 | 3.85 | 5.31 | [RTTM](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.rttm) | [eval](https://huggingface.co/pyannote/speaker-diarization/blob/main/reproducible_research/2022.07/VoxConverse.SpeakerDiarization.VoxConverse.test.eval) |
|
105 |
|
106 |
## Support
|
107 |
|