---
language:
- en
license: apache-2.0
tags:
- speech
- self-supervised learning
- model compression
- neural architecture search
- LightHuBERT
datasets:
- librispeech_asr
- superb
---

# LightHuBERT

[**LightHuBERT**](https://arxiv.org/abs/2203.15610): **Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT**

Authors: Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko and Haizhou Li

| [**Github**](https://github.com/mechanicalsea/lighthubert) | [**Huggingface**](https://huggingface.co/mechanicalsea/lighthubert) |

This repository provides the authors' PyTorch implementation and pre-trained models of LightHuBERT.

- March 2022: released the preprint on [arXiv](https://arxiv.org/abs/2203.15610) and the checkpoints on [huggingface](https://huggingface.co/mechanicalsea/lighthubert).

## Pre-Trained Models

| Model | Pre-Training Dataset | Download Link |
|---|---|---|
| LightHuBERT Base | [960 hrs LibriSpeech](http://www.openslr.org/12) | huggingface: [lighthubert/lighthubert_base.pt](https://huggingface.co/mechanicalsea/lighthubert/resolve/main/lighthubert_base.pt) |
| LightHuBERT Small | [960 hrs LibriSpeech](http://www.openslr.org/12) | huggingface: [lighthubert/lighthubert_small.pt](https://huggingface.co/mechanicalsea/lighthubert/resolve/main/lighthubert_small.pt) |
| LightHuBERT Stage 1 | [960 hrs LibriSpeech](http://www.openslr.org/12) | huggingface: [lighthubert/lighthubert_stage1.pt](https://huggingface.co/mechanicalsea/lighthubert/resolve/main/lighthubert_stage1.pt) |

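
The checkpoints can also be fetched programmatically. Below is a minimal sketch using the `huggingface_hub` client, assuming the package is installed (e.g. `pip install huggingface_hub`); any of the three filenames above works.

```python
# Minimal sketch: download a LightHuBERT checkpoint from the Hugging Face Hub.
# Assumes `huggingface_hub` is installed; swap the filename for another checkpoint if desired.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="mechanicalsea/lighthubert",
    filename="lighthubert_small.pt",
)
print(ckpt_path)  # local cache path, usable with torch.load(...)
```
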
## Load Pre-Trained Models for Inference

```python
import torch
from lighthubert import LightHuBERT, LightHuBERTConfig

wav_input_16khz = torch.randn(1, 10000).cuda()

# load the pre-trained checkpoint
checkpoint = torch.load('/path/to/lighthubert.pt')
cfg = LightHuBERTConfig(checkpoint['cfg']['model'])
cfg.supernet_type = 'base'
model = LightHuBERT(cfg)
model = model.cuda()
model = model.eval()
print(model.load_state_dict(checkpoint['model'], strict=False))

# (optional) sample and set a subnet
subnet = model.supernet.sample_subnet()
model.set_sample_config(subnet)
params = model.calc_sampled_param_num()
print(f"subnet (Params {params / 1e6:.0f}M) | {subnet}")

# extract the representation of the last layer
rep = model.extract_features(wav_input_16khz)[0]

# extract the representations of all layers
hs = model.extract_features(wav_input_16khz, ret_hs=True)[0]

print(f"Representation at the last hidden state: {torch.allclose(rep, hs[-1])}")
```

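
The snippet above feeds random noise; LightHuBERT is pre-trained on 16 kHz speech, so real audio should be mono at 16 kHz. A hypothetical sketch with `torchaudio` (assumed installed; `speech.wav` is a placeholder path), reusing the `model` loaded above:

```python
# Hypothetical sketch: run the loaded model on a real recording instead of random noise.
# Assumes torchaudio is installed and "speech.wav" is a local audio file.
import torchaudio

wav, sr = torchaudio.load("speech.wav")  # (channels, samples)
wav = wav.mean(dim=0, keepdim=True)      # downmix to mono: (1, samples)
if sr != 16000:
    # resample to the 16 kHz rate expected by the model
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)
with torch.no_grad():
    rep = model.extract_features(wav.cuda())[0]  # last-layer representation
print(rep.shape)
```
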
### Profiling LightHuBERT

As mentioned in the [Profiling Tool for SLT2022 SUPERB Challenge](https://github.com/B06901052/DeepSpeed/tree/superb-challenge), we profile `lighthubert` in s3prl. From a clone of that repository:

```sh
cd DeepSpeed
# lighthubert_small
python testing/s3prl_profiling_test.py -u lighthubert_small --libri_root "libri_root"
# lighthubert_base
python testing/s3prl_profiling_test.py -u lighthubert_base --libri_root "libri_root"
# lighthubert_stage1
python testing/s3prl_profiling_test.py -u lighthubert_stage1 --libri_root "libri_root"
```

### Reference

If you find our work useful in your research, please cite the following paper:

```bibtex
@article{wang2022lighthubert,
  title={{LightHuBERT}: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit {BERT}},
  author={Rui Wang and Qibing Bai and Junyi Ao and Long Zhou and Zhixiang Xiong and Zhihua Wei and Yu Zhang and Tom Ko and Haizhou Li},
  journal={arXiv preprint arXiv:2203.15610},
  year={2022}
}
```

### Contact Information

For help or issues using LightHuBERT models, please submit a GitHub issue.

For other communications related to LightHuBERT, please contact Rui Wang (`[email protected]`).