NeMo
Taejin committed
Commit d0be7b6 · verified · 1 Parent(s): 889281b

Update README.md

Files changed (1)
  1. README.md +23 -2
README.md CHANGED
@@ -1,3 +1,24 @@
  ---
- {}
- ---
+ library_name: nemo
+ ---
+ # CHiME-8 DASR NeMo Baseline Models
+
+ ## 1. Voice Activity Detection (VAD) Model
+ ### MarbleNet_frame_VAD_chime7_Acrobat.nemo
+ - This model is based on the [NeMo MarbleNet VAD model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/models.html#marblenet-vad).
+ - For validation, we use a dataset comprising the CHiME-6 development subset and 50 hours of simulated audio data.
+ - The simulated data is generated using the [NeMo multi-speaker data simulator](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Multispeaker_Simulator.ipynb)
+ on the [VoxCeleb1&2 datasets](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html).
+ - The multi-speaker data simulation produces a total of 2,000 hours of audio, of which approximately 30% is silence.
+ - Model training incorporates [SpecAugment](https://arxiv.org/abs/1904.08779) and noise augmentation with the [MUSAN noise dataset](https://arxiv.org/abs/1510.08484).
+
+
+ ## 2. Speaker Diarization Model: Multi-scale Diarization Decoder (MSDD-v2)
+ ### MSDD_v2_PALO_100ms_intrpl_3scales.nemo
+
+ ## 3. Automatic Speech Recognition (ASR) Model
+ ### FastConformerXL-RNNT-chime7-GSS-finetuned.nemo
+
+
+ ## 4. Language Model for ASR Decoding: KenLM Model
+ ### ASR_LM_chime7_only.kenlm
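The VAD section of this README names two training augmentations, SpecAugment and MUSAN noise augmentation. As a rough illustration only, the plain-Python sketch below shows the two underlying ideas: frequency and time masking on a (time x frequency) feature matrix, and adding a noise clip at a chosen signal-to-noise ratio. The function names `spec_augment` and `mix_at_snr`, and every parameter value, are hypothetical; this is not NeMo's augmentation code and does not reflect the settings actually used to train `MarbleNet_frame_VAD_chime7_Acrobat.nemo`.

```python
import math
import random


def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=10, rng=None):
    """Hypothetical sketch of SpecAugment-style frequency and time masking (no time warping).

    `spec` is a rectangular list of frames (time axis), each a list of feature
    bins (frequency axis). Masked cells are set to 0.0, which assumes the
    features are mean-normalized. Returns a new matrix; the input is unchanged.
    """
    if rng is None:
        rng = random.Random()
    out = [list(frame) for frame in spec]
    n_time = len(out)
    n_freq = len(out[0]) if n_time else 0

    # Frequency masks: zero a band of consecutive bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_freq))
        start = rng.randint(0, n_freq - width)
        for frame in out:
            for f in range(start, start + width):
                frame[f] = 0.0

    # Time masks: zero a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_time))
        start = rng.randint(0, n_time - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_freq

    return out


def mix_at_snr(speech, noise, snr_db):
    """Hypothetical sketch: add `noise` to `speech` so the speech-to-noise power ratio is `snr_db` dB."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat the noise clip if it is shorter than the speech, then trim it.
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * reps)[:len(speech)]

    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")

    # Pick gain g so that 10 * log10(p_speech / (g**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[1.0] * 80 for _ in range(200)]  # dummy 200-frame, 80-bin features
    masked = spec_augment(feats, rng=rng)
    zeroed = sum(v == 0.0 for frame in masked for v in frame)
    print(f"masked {zeroed} of {200 * 80} cells")

    speech = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(16000)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]  # stand-in for a MUSAN clip
    noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Zero-filled masks and a single fixed SNR are simplifications made for readability; a training pipeline would typically draw mask widths and the SNR at random for each example.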