(!) Don't forget to preprocess unknown tokens and substitute them with <|endoftext|>. Otherwise, the <unk> tokens in the dataset will be split into the '<', 'unk', and '>' tokens.
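A minimal sketch of this substitution step (the function name and literals are illustrative, not the exact training script):

```python
UNK = "<unk>"
EOT = "<|endoftext|>"

def substitute_unk(text: str) -> str:
    """Replace WikiText-style <unk> markers with GPT-2's <|endoftext|>
    so the tokenizer emits one special token instead of '<', 'unk', '>'."""
    return text.replace(UNK, EOT)

print(substitute_unk("the <unk> ran"))  # the <|endoftext|> ran
```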

  • Full context (1024) perplexity on test set: 13.68
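For reference, perplexity is the exponential of the mean cross-entropy loss (in nats), so the two metrics reported here are directly related:

```python
import math

def perplexity(cross_entropy: float) -> float:
    # Perplexity is exp of the mean per-token cross-entropy in nats.
    return math.exp(cross_entropy)

# A perplexity of 13.68 corresponds to a cross-entropy of ~2.616 nats.
```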

Dependence of the cross-entropy loss on the context length used for prediction:

  • x-axis × 128 = context length
  • y-axis = cross-entropy loss
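One way such a curve could be produced is to bucket per-token losses by position and average within 128-token bins (the bin width is assumed from the x-axis scaling above; the function below is a sketch, not the author's evaluation code):

```python
import numpy as np

def loss_by_context_bin(token_losses: np.ndarray, bin_width: int = 128) -> np.ndarray:
    """token_losses: array of shape (num_sequences, seq_len) holding the
    per-position cross-entropy of each predicted token. Returns the mean
    loss within each consecutive bin of `bin_width` positions."""
    seq_len = token_losses.shape[1]
    n_bins = seq_len // bin_width
    # Drop any trailing positions that don't fill a whole bin, then
    # reshape to (num_sequences, n_bins, bin_width) and average.
    binned = token_losses[:, : n_bins * bin_width].reshape(-1, n_bins, bin_width)
    return binned.mean(axis=(0, 2))
```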


Model size: 124M params · Tensor type: F32 (Safetensors)

Dataset used to train irodkin/gpt2-wiki103