---
language:
- en
tags:
- chess
- gpt
- transformers
- text-generation
license: mit
datasets:
- lichess
pipeline_tag: text-generation
library_name: transformers
base_model: gpt2
---

# Chess GPT-4.5M

## Overview

Chess GPT-4.5M is a generative language model trained specifically to generate chess moves and analyze chess games. The model is based on the GPT architecture and was trained with a custom 32-token vocabulary covering the symbols and notation used in chess move text.

## Model Details

- **Architecture:** GPT-based language model (`GPT2LMHeadModel`)
- **Parameters:** Approximately 4.5M
- **Layers:** 8 transformer layers
- **Heads:** 4 attention heads per layer
- **Embedding Dimension:** 256
- **Training Sequence Length:** 1024 tokens per chess game
- **Vocabulary:** 32 tokens (custom vocabulary)

## Training Data

The model was trained on tokenized chess game data prepared from the [Lichess dataset](https://huggingface.co/datasets/lichess). The preparation process involved:

- Tokenizing chess games using the custom 32-token vocabulary (a hypothetical sketch of this step appears at the end of this card).
- Writing binary training files (`train.bin` and `val.bin`).
- Saving vocabulary information to `meta.pkl`.

## Training Configuration

The training configuration, found in `config/mac_chess_gpt.py`, includes:

- **Dataset:** lichess_hf_dataset
- **Batch Size:** 2 (kept small for Mac memory constraints)
- **Block Size:** 1023 (one less than the 1024-token training sequences, since next-token targets are shifted by one position)
- **Learning Rate:** 3e-4
- **Max Iterations:** 140,000
- **Device:** `mps` (Apple Silicon GPU backend)
- **Other Settings:** Dropout disabled and `compile=False` for Mac compatibility

## How to Use

### Generating Chess Moves

After training, use the generation script to sample chess moves. Example commands:

```bash
# Sample from the model without a provided prompt
python sample.py --out_dir=out-chess-mac

# Generate a chess game sequence starting from a custom prompt
python sample.py --out_dir=out-chess-mac --start=";1.e4"
```

### Loading the Model in Transformers

Once the model card and converted model files are pushed to the Hugging Face Hub, you can load the model with:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("your-hf-username/chess-gpt-4.5M")
tokenizer = GPT2Tokenizer.from_pretrained("your-hf-username/chess-gpt-4.5M")
```

_Note:_ The tokenizer uses the custom vocabulary provided in `vocab.json`, not GPT-2's standard BPE vocabulary. An end-to-end generation sketch that loads `vocab.json` directly appears at the end of this card.

## Intended Use

The model is intended for:

- Generating chess move sequences.
- Assisting in automated chess analysis.
- Educational use in understanding how language models are trained on specialized domains.

## Limitations

- The model is small (approximately 4.5M parameters) and may not capture complex chess strategy.
- It is specialized for chess move generation and does not generalize to standard language tasks.

## Training Process Summary

1. **Data Preparation:** Tokenized the Lichess chess game dataset using the 32-token vocabulary.
2. **Model Training:** Trained with the configuration specified in `config/mac_chess_gpt.py`.
3. **Model Conversion:** Converted the trained checkpoint `out-chess-mac/ckpt.pt` into a Hugging Face-compatible format with `convert_to_hf.py`.
4. **Repository Setup:** Pushed the converted model files (including the custom tokenizer vocabulary) to the Hugging Face Hub, with Git LFS handling large files.

## Acknowledgements

This model was inspired by [GPT-2](https://openai.com/blog/better-language-models/) and adapted for the chess domain.
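## Sketch: Data Preparation (Hypothetical)

The following is a minimal sketch of the data-preparation step described above, assuming character-level encoding of move text into the `train.bin`/`val.bin`/`meta.pkl` layout this card names. The character set, the `encode` helper, and the example game string are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch: encode move text to ids and write training artifacts.
# Only the file names (train.bin, val.bin, meta.pkl) come from this card.
import pickle
import numpy as np

# Assumed character set for PGN-style move text; the real 32-token
# vocabulary may differ.
chars = sorted(set(";0123456789.abcdefgh xKQRBNO-+#="))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text: str) -> list[int]:
    """Map a move-text string to token ids, one id per character."""
    return [stoi[c] for c in text]

move_text = ";1.e4 e5 2.Nf3 Nc6"  # concatenated game text (illustrative)
ids = np.array(encode(move_text), dtype=np.uint16)

# Split and dump to the binary files the training script reads.
n = int(0.9 * len(ids))
ids[:n].tofile("train.bin")
ids[n:].tofile("val.bin")

# Persist vocabulary metadata for decoding samples later.
with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "stoi": stoi, "itos": itos}, f)
```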
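## Sketch: End-to-End Generation (Hypothetical)

A minimal sketch of loading the converted model and sampling a game with Transformers, assuming `vocab.json` maps single characters to token ids. The repo id is the placeholder used above; fetching the vocabulary with `huggingface_hub.hf_hub_download` and encoding by hand sidesteps `GPT2Tokenizer`, which otherwise expects GPT-2's BPE files. The sampling parameters are illustrative.

```python
# Hypothetical end-to-end sketch, assuming a character-level vocab.json.
import json
import torch
from transformers import GPT2LMHeadModel
from huggingface_hub import hf_hub_download

repo_id = "your-hf-username/chess-gpt-4.5M"  # placeholder repo id
model = GPT2LMHeadModel.from_pretrained(repo_id)
model.eval()

# Load the custom vocabulary directly rather than via GPT2Tokenizer,
# since the model does not use GPT-2's BPE merges.
vocab_path = hf_hub_download(repo_id, "vocab.json")
with open(vocab_path) as f:
    stoi = json.load(f)          # char -> id
itos = {i: ch for ch, i in stoi.items()}

prompt = ";1.e4"
input_ids = torch.tensor([[stoi[c] for c in prompt]])

with torch.no_grad():
    out = model.generate(
        input_ids,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.8,
        top_k=10,
    )

# Decode the sampled ids back into move text.
print("".join(itos[i] for i in out[0].tolist()))
```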