Hybrid RetNet
This is a hybrid architecture combining a self-attention-based Transformer and RetNet: only the 2nd layer and the middle layer are multi-head attention layers, and all other layers are RetNet (retention) layers.
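As a rough illustration of the layer layout described above (the layer count and the 0-indexed convention are assumptions based on pythia-1B, not the actual implementation):

num_layers = 16  # pythia-1B uses 16 layers
attention_layers = {1, num_layers // 2}  # the 2nd layer and the middle layer (0-indexed)
layer_types = ["attention" if i in attention_layers else "retention" for i in range(num_layers)]
print(layer_types)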
These are the model weights accompanying the paper Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers, in which new Linear-Cost Inference models (e.g. RetNet) are not trained from scratch but instead transfer shared weight components from other pre-trained language models (PTLMs). The model's input/output embeddings, MLP weights, and LayerNorms have been transferred from pythia-1B. For more detail, please refer to the paper.
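A minimal sketch of what such a transfer could look like, assuming parameters are matched by name and shape (the substring filters and the name-matching rule are illustrative assumptions, not the actual XATL code; this published checkpoint already contains the transferred weights, so the target model here stands in for a freshly initialized hybrid):

from transformers import AutoModelForCausalLM

# Donor Transformer and target hybrid model (custom modeling code in the target repo).
donor = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b")
target = AutoModelForCausalLM.from_pretrained("NucleusAI/RetNet-1B-Hybrid-XATL", trust_remote_code=True)

donor_state = donor.state_dict()
target_state = target.state_dict()

# Copy only the components shared across architectures:
# input/output embeddings, MLP weights, and LayerNorms.
shared_keys = ("embed", "mlp", "layernorm", "layer_norm")
for name, param in target_state.items():
    if name in donor_state and donor_state[name].shape == param.shape:
        if any(k in name for k in shared_keys):
            target_state[name] = donor_state[name].clone()

target.load_state_dict(target_state)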
Model Details
Model Description
- Developed by: NucleusAI, Sehyun Choi
- Model type: RetNet & Transformer Hybrid
Model Sources
- Repository: RetNet-XATL
- Paper: Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers
How to Get Started with the Model
Use the code below to get started with the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Run on GPU by default; remove this line to run on CPU.
torch.set_default_device("cuda")

# trust_remote_code=True loads the custom RetNet hybrid modeling code from the model repository.
model = AutoModelForCausalLM.from_pretrained("NucleusAI/RetNet-1B-Hybrid-XATL", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("NucleusAI/RetNet-1B-Hybrid-XATL", trust_remote_code=True)  # same tokenizer as EleutherAI/pythia-1B

# Tokenize a prompt and generate up to 200 tokens.
inputs = tokenizer("Hi there!", return_tensors="pt", return_attention_mask=False)
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
Training Data
The model has been trained on the pile_dedup dataset, to allow direct comparison with the same-sized Pythia models.