---
language:
- en
license: apache-2.0
tags:
- speech
- self-supervised learning
- model compression
- neural architecture search
- LightHuBERT
datasets:
- librispeech_asr
- superb
---

# LightHuBERT

[**LightHuBERT**](https://arxiv.org/abs/2203.15610): **Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT**

Authors: Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko and Haizhou Li

| [**Github**](https://github.com/mechanicalsea/lighthubert) | [**Huggingface**](https://huggingface.co/mechanicalsea/lighthubert) |

This repository provides the authors' PyTorch implementation and pre-trained models of LightHuBERT.

- March 2022: released the preprint on [arXiv](https://arxiv.org/abs/2203.15610) and the checkpoints on [huggingface](https://huggingface.co/mechanicalsea/lighthubert).

## Pre-Trained Models

| Model | Pre-Training Dataset | Download Link |
|---|---|---|
| LightHuBERT Base | [960 hrs LibriSpeech](http://www.openslr.org/12) | huggingface: [lighthubert/lighthubert_base.pt](https://huggingface.co/mechanicalsea/lighthubert/resolve/main/lighthubert_base.pt) |
| LightHuBERT Small | [960 hrs LibriSpeech](http://www.openslr.org/12) | huggingface: [lighthubert/lighthubert_small.pt](https://huggingface.co/mechanicalsea/lighthubert/resolve/main/lighthubert_small.pt) |
| LightHuBERT Stage 1 | [960 hrs LibriSpeech](http://www.openslr.org/12) | huggingface: [lighthubert/lighthubert_stage1.pt](https://huggingface.co/mechanicalsea/lighthubert/resolve/main/lighthubert_stage1.pt) |

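
The checkpoints can also be fetched programmatically. Below is a minimal sketch using the `huggingface_hub` client, assuming the package is installed (e.g. `pip install huggingface_hub`); any of the three filenames above works.

```python
# Minimal sketch: download a LightHuBERT checkpoint from the Hugging Face Hub.
# Assumes `huggingface_hub` is installed; swap the filename for another checkpoint if desired.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="mechanicalsea/lighthubert",
    filename="lighthubert_small.pt",
)
print(ckpt_path)  # local cache path, usable with torch.load(...)
```
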
## Load Pre-Trained Models for Inference

```python
import torch
from lighthubert import LightHuBERT, LightHuBERTConfig

wav_input_16khz = torch.randn(1, 10000).cuda()

# load the pre-trained checkpoint
checkpoint = torch.load('/path/to/lighthubert.pt')
cfg = LightHuBERTConfig(checkpoint['cfg']['model'])
cfg.supernet_type = 'base'
model = LightHuBERT(cfg)
model = model.cuda()
model = model.eval()
print(model.load_state_dict(checkpoint['model'], strict=False))

# (optional) sample and set a subnet
subnet = model.supernet.sample_subnet()
model.set_sample_config(subnet)
params = model.calc_sampled_param_num()
print(f"subnet (Params {params / 1e6:.0f}M) | {subnet}")

# extract the representation of the last layer
rep = model.extract_features(wav_input_16khz)[0]

# extract the representations of all layers
hs = model.extract_features(wav_input_16khz, ret_hs=True)[0]

print(f"Representation at the last hidden state: {torch.allclose(rep, hs[-1])}")
```

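
The snippet above feeds random noise; LightHuBERT is pre-trained on 16 kHz speech, so real audio should be mono at 16 kHz. A hypothetical sketch with `torchaudio` (assumed installed; `speech.wav` is a placeholder path), reusing the `model` loaded above:

```python
# Hypothetical sketch: run the loaded model on a real recording instead of random noise.
# Assumes torchaudio is installed and "speech.wav" is a local audio file.
import torchaudio

wav, sr = torchaudio.load("speech.wav")  # (channels, samples)
wav = wav.mean(dim=0, keepdim=True)      # downmix to mono: (1, samples)
if sr != 16000:
    # resample to the 16 kHz rate expected by the model
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)
with torch.no_grad():
    rep = model.extract_features(wav.cuda())[0]  # last-layer representation
print(rep.shape)
```
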
### Profiling LightHuBERT

As mentioned in the [Profiling Tool for SLT2022 SUPERB Challenge](https://github.com/B06901052/DeepSpeed/tree/superb-challenge), we profile `lighthubert` in s3prl. From a clone of that repository:

```sh
cd DeepSpeed
# lighthubert_small
python testing/s3prl_profiling_test.py -u lighthubert_small --libri_root "libri_root"
# lighthubert_base
python testing/s3prl_profiling_test.py -u lighthubert_base --libri_root "libri_root"
# lighthubert_stage1
python testing/s3prl_profiling_test.py -u lighthubert_stage1 --libri_root "libri_root"
```

### Reference

If you find our work useful in your research, please cite the following paper:

```bibtex
@article{wang2022lighthubert,
  title={{LightHuBERT}: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit {BERT}},
  author={Rui Wang and Qibing Bai and Junyi Ao and Long Zhou and Zhixiang Xiong and Zhihua Wei and Yu Zhang and Tom Ko and Haizhou Li},
  journal={arXiv preprint arXiv:2203.15610},
  year={2022}
}
```

### Contact Information

For help or issues using LightHuBERT models, please submit a GitHub issue.

For other communications related to LightHuBERT, please contact Rui Wang (`[email protected]`).