ZeroNLG
ZeroNLG is a unified framework that handles multiple natural language generation (NLG) tasks in a zero-shot manner, without any labeled downstream pairs for training. It covers image-to-text, video-to-text, and text-to-text generation across English, Chinese, German, and French.
Pre-training data: a machine-translated version of CC3M, comprising
- 1.1M English sentences
- 1.1M English-Chinese pairs
- 1.1M English-German pairs
- 1.1M English-French pairs
Authors: Bang Yang*, Fenglin Liu*, Yuexian Zou, Xian Wu, Yaowei Wang, David A. Clifton
Quick Start
Please follow our GitHub repo to prepare the environment first.
```python
from zeronlg import ZeroNLG

# Automatically downloads the model from the Hugging Face Hub
# Note: this checkpoint is pre-trained specifically for machine translation
model = ZeroNLG('zeronlg-4langs-mt')

# Translate English into Chinese
# Note: the multilingual encoder is language-agnostic, so `lang` specifies
# the language to be generated
output = model.forward_translate(texts='a girl and a boy are playing', lang='zh', num_beams=3)
# output = "一 个 女 孩 和 一 个 男 孩 一 起 玩"
```
Zero-Shot Performance
Machine translation
Model: zeronlg-4langs-mt.
| En->Zh | En<-Zh | En->De | En<-De | En->Fr | En<-Fr | Zh->De | Zh<-De | Zh->Fr | Zh<-Fr | De->Fr | De<-Fr |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6.0 | 9.2 | 21.6 | 23.2 | 27.2 | 26.8 | 7.8 | 4.6 | 6.1 | 9.7 | 20.9 | 19.6 |
Citation
```bibtex
@article{Yang2023ZeroNLG,
  title={ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation},
  author={Yang, Bang and Liu, Fenglin and Zou, Yuexian and Wu, Xian and Wang, Yaowei and Clifton, David A.},
  journal={arXiv preprint arXiv:2303.06458},
  year={2023}
}
```