
Hybrid RetNet

This is a hybrid architecture combining a self-attention-based Transformer with RetNet: only the 2nd layer and the middle layer use multi-head attention, while all remaining layers are RetNet (retention) layers.
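
As an illustration of the layer layout (the indexing below is an assumption for clarity, not taken from the released config; inspect model.config for the authoritative assignment):

def build_layer_types(num_layers: int) -> list:
    # Hypothetical sketch: mark the 2nd layer and the middle layer as
    # self-attention, everything else as RetNet retention layers.
    attention_layers = {1, num_layers // 2}  # 0-indexed
    return ["self_attention" if i in attention_layers else "retention"
            for i in range(num_layers)]

print(build_layer_types(16))  # pythia-1B-sized models have 16 layers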

This is the model weight accompanying the paper Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers, in which new Linear-Cost Inference models (e.g. RetNet) are not trained from scratch but instead transfer shared weight components from other pretrained language models (PTLMs). The model's input/output embeddings, MLP weights, and LayerNorms have been transferred from pythia-1B. For more details, please refer to the paper.
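
A minimal sketch of the transfer step, assuming GPT-NeoX-style parameter names for the donor; this is illustrative only, not the code used in the paper:

from transformers import AutoModelForCausalLM

# Hypothetical XATL-style transfer sketch: collect the shared components
# (embeddings, MLP weights, LayerNorms) from the donor Transformer so they
# can be loaded into a randomly initialized hybrid RetNet. The substring
# matching below is an assumption about parameter naming, not the paper's code.

donor = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b")

SHARED_KEYS = ("embed", "mlp", "layernorm", "layer_norm")

shared_weights = {
    name: tensor
    for name, tensor in donor.state_dict().items()
    if any(key in name.lower() for key in SHARED_KEYS)
}

# retnet_hybrid.load_state_dict(shared_weights, strict=False)  # remaining
# (attention/retention) parameters keep their random initialization
print(f"{len(shared_weights)} tensors selected for transfer from pythia-1b")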

Model Details

Model Description

  • Developed by: NucleusAI, Sehyun Choi
  • Model type: RetNet & Transformer Hybrid

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("NucleusAI/RetNet-1B-Hybrid-XATL", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("NucleusAI/RetNet-1B-Hybrid-XATL", trust_remote_code=True)  # same as EleutherAI/pythia-1B

inputs = tokenizer("Hi there!", return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

Training Data

The model has been trained on the pile_dedup dataset to allow direct comparison with the Pythia models of the same size.
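
For reference, a deduplicated Pile is available on the Hugging Face Hub; the snippet below is a loading sketch only, assuming EleutherAI/the_pile_deduplicated corresponds to pile_dedup, and the exact revision, filtering, and tokenization used for training are not specified here:

from datasets import load_dataset

# Sketch: stream the deduplicated Pile; training-specific preprocessing
# (shuffling, tokenization, sequence packing) is omitted and assumed.
pile_dedup = load_dataset("EleutherAI/the_pile_deduplicated", split="train", streaming=True)

for example in pile_dedup.take(1):
    print(example["text"][:200])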
