Update README.md
README.md (changed):

---
library_name: nemo
---

# CHiME8 DASR NeMo Baseline Models

## 1. Voice Activity Detection (VAD) Model
### MarbleNet_frame_VAD_chime7_Acrobat.nemo
- This model is based on the [NeMo MarbleNet VAD model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/models.html#marblenet-vad).
- For validation, we use a dataset comprising the CHiME-6 development subset as well as 50 hours of simulated audio data.
- The simulated data is generated using the [NeMo multi-speaker data simulator](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Multispeaker_Simulator.ipynb) on the [VoxCeleb1&2 datasets](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html).
- The multi-speaker data simulation results in a total of 2,000 hours of audio, of which approximately 30% is silence.
- Model training incorporates [SpecAugment](https://arxiv.org/abs/1904.08779) and noise augmentation with the [MUSAN noise dataset](https://arxiv.org/abs/1510.08484).

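A minimal loading sketch is shown below, assuming the checkpoint is compatible with NeMo's `EncDecFrameClassificationModel` frame-VAD class and its standard `restore_from` API; the class choice is an assumption, and the CHiME-8 DASR baseline recipe remains the reference for the full inference pipeline.

```python
# Minimal sketch (assumption): restore the frame-VAD checkpoint with NeMo.
# Requires `nemo_toolkit[asr]`; the class below is the frame-VAD class in
# recent NeMo releases and may differ from what the baseline recipe uses.
from nemo.collections.asr.models import EncDecFrameClassificationModel

vad_model = EncDecFrameClassificationModel.restore_from(
    restore_path="MarbleNet_frame_VAD_chime7_Acrobat.nemo"
)
vad_model.eval()  # inference produces frame-level speech/non-speech posteriors
```
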
## 2. Speaker Diarization Model: Multi-scale Diarization Decoder (MSDD-v2)
### MSDD_v2_PALO_100ms_intrpl_3scales.nemo

## 3. Automatic Speech Recognition (ASR) Model
### FastConformerXL-RNNT-chime7-GSS-finetuned.nemo
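A minimal usage sketch follows, assuming the checkpoint restores with NeMo's `EncDecRNNTBPEModel` class (commonly used for FastConformer-RNNT BPE models); the class choice and the audio file name are assumptions, and in the baseline the input would typically be a GSS-enhanced segment.

```python
# Minimal sketch (assumption): restore the fine-tuned FastConformer-XL RNNT
# checkpoint and transcribe one audio file. See the CHiME-8 DASR baseline
# for the full pipeline (GSS front-end, segmentation, LM-fused decoding).
from nemo.collections.asr.models import EncDecRNNTBPEModel

asr_model = EncDecRNNTBPEModel.restore_from(
    restore_path="FastConformerXL-RNNT-chime7-GSS-finetuned.nemo"
)
asr_model.eval()
hypotheses = asr_model.transcribe(["gss_enhanced_segment.wav"])  # hypothetical input path
print(hypotheses)
```
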
## 4. Language Model for ASR Decoding: KenLM Model
### ASR_LM_chime7_only.kenlm
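This n-gram model is used during ASR decoding. As a quick sanity check, it can be loaded with the `kenlm` Python package, assuming the file is a standard KenLM binary or ARPA model; the example sentence below is arbitrary, and LM-fused decoding itself is handled by the baseline's decoding tools rather than this snippet.

```python
# Minimal sketch (assumption): load the n-gram LM with the kenlm package
# and score a sentence. This only inspects the model file; it does not
# perform LM-fused beam-search decoding.
import kenlm

lm = kenlm.Model("ASR_LM_chime7_only.kenlm")
print("n-gram order:", lm.order)
print("log10 P('hello world'):", lm.score("hello world", bos=True, eos=True))
```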