# Chess GPT-4.5M

## Overview
Chess GPT-4.5M is a small GPT-based language model trained specifically to generate chess moves and analyze chess games. It was trained with a custom 32-token vocabulary covering the symbols of standard chess move notation.
## Model Details
- Architecture: GPT-based language model (GPT2LMHeadModel)
- Parameters: Approximately 4.5M parameters
- Layers: 8 transformer layers
- Heads: 4 attention heads per layer
- Embedding Dimension: 256
- Training Sequence Length: 1024 tokens per chess game
- Vocabulary: 32 tokens (custom vocabulary)
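
For reference, these hyperparameters map onto a Hugging Face `GPT2Config` roughly as follows. This is an illustrative sketch, not code from the training repo:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative configuration mirroring the numbers above.
config = GPT2Config(
    vocab_size=32,     # custom 32-token chess vocabulary
    n_positions=1024,  # training sequence length
    n_embd=256,        # embedding dimension
    n_layer=8,         # transformer layers
    n_head=4,          # attention heads per layer
)
model = GPT2LMHeadModel(config)
```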
## Training Data
The model was trained on tokenized chess game data prepared from the Lichess dataset. The preparation process involved:
- Tokenizing chess games using a custom 32-token vocabulary.
- Creating binary training files (`train.bin` and `val.bin`).
- Saving vocabulary information to `meta.pkl`.
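
The `train.bin`/`val.bin`/`meta.pkl` layout follows nanoGPT conventions; below is a minimal sketch of the preparation step under that assumption. The 32-symbol set is hypothetical, chosen only for illustration; the model's actual vocabulary ships in `meta.pkl`/`vocab.json`:

```python
import pickle
import numpy as np

# Hypothetical 32-symbol vocabulary (illustrative, not the actual one).
chars = " #+-.0123456789;=BKNOQRabcdefghx"
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text: str) -> np.ndarray:
    # With only 32 tokens, one byte per token is enough.
    return np.array([stoi[c] for c in text], dtype=np.uint8)

data = encode(";1.e4 e5 2.Nf3 Nc6 ")  # placeholder game text
split = int(0.9 * len(data))
data[:split].tofile("train.bin")  # binary training split
data[split:].tofile("val.bin")    # binary validation split

with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "stoi": stoi, "itos": itos}, f)
```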
## Training Configuration
The training configuration, found in `config/mac_chess_gpt.py`, includes:
- Dataset: lichess_hf_dataset
- Batch Size: 2 (optimized for Mac's memory constraints)
- Block Size: 1023 (one less than the 1024-token game sequences, so every position has a next-token target)
- Learning Rate: 3e-4
- Max Iterations: 140,000
- Device: 'mps' (Apple's Metal backend on macOS)
- Other Settings: Dropout disabled and `compile` set to `False` for Mac compatibility
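
These settings would typically live in a plain-Python config file; a sketch of what `config/mac_chess_gpt.py` plausibly contains, with variable names assumed from common nanoGPT-style conventions:

```python
# Assumed config values matching the list above.
dataset = 'lichess_hf_dataset'
batch_size = 2        # small batch for Mac memory constraints
block_size = 1023     # 1024-token games, offset by one for next-token targets
learning_rate = 3e-4
max_iters = 140000
device = 'mps'        # Apple Metal backend
dropout = 0.0         # no dropout
compile = False       # torch.compile disabled for Mac compatibility
```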
## How to Use

### Generating Chess Moves
After training, use the generation script to sample chess moves. Example commands:

```bash
# Sample from the model without a provided prompt:
python sample.py --out_dir=out-chess-mac

# Generate a chess game sequence starting with a custom prompt:
python sample.py --out_dir=out-chess-mac --start=";1.e4"
```
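The leading `;` in the custom prompt appears to be the game-start delimiter used in the training transcripts, so prompts should generally begin with it.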
### Loading the Model in Transformers
Once the model card and converted model files are pushed to the Hugging Face Hub, you can load the model using:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("your-hf-username/chess-gpt-4.5M")
tokenizer = GPT2Tokenizer.from_pretrained("your-hf-username/chess-gpt-4.5M")
```

Note: The tokenizer uses a custom vocabulary provided in `vocab.json`.
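
Once loaded, generation follows the standard Transformers API. A usage sketch with illustrative sampling settings:

```python
import torch

# Encode an opening prompt and sample a continuation.
inputs = tokenizer(";1.e4", return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=40,  # a handful of further moves
        do_sample=True,
        temperature=0.8,
    )
print(tokenizer.decode(output[0]))
```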
## Intended Use
The model is intended for:
- Generating chess move sequences.
- Assisting in automated chess analysis.
- Educational purposes in understanding language model training on specialized domains.
## Limitations
- The model uses a relatively small (4.5M-parameter) architecture and may not capture complex chess strategy.
- It is specialized for chess move generation and may not generalize to standard language tasks.
## Training Process Summary
- Data Preparation: Tokenized the Lichess chess game dataset using a 32-token vocabulary.
- Model Training: Used the custom training configuration specified in `config/mac_chess_gpt.py`.
- Model Conversion: Converted the trained checkpoint from `out-chess-mac/ckpt.pt` into a Hugging Face-compatible format with `convert_to_hf.py`.
- Repository Setup: Pushed the converted model files (including the custom tokenizer vocab) to the Hugging Face Hub, with Git LFS handling large files (see the upload sketch below).
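
The upload step can be done with plain Git plus Git LFS, or with the `huggingface_hub` client, which routes large files through LFS-backed storage automatically. A sketch of the latter; the folder name is an assumption:

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="out-chess-mac-hf",  # assumed output dir of convert_to_hf.py
    repo_id="your-hf-username/chess-gpt-4.5M",
    repo_type="model",
)
```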
## Acknowledgements
This model draws inspiration from GPT-2 and was adapted for the chess domain.