# Chess GPT-4.5M

## Overview
Chess GPT-4.5M is a small GPT-based language model trained specifically to generate chess moves and analyze chess games. It was trained with a custom 32-token vocabulary covering the symbols of standard chess move notation.
## Model Details
- Architecture: GPT-based language model (GPT2LMHeadModel)
- Parameters: Approximately 4.5M parameters
- Layers: 8 transformer layers
- Heads: 4 attention heads per layer
- Embedding Dimension: 256
- Training Sequence Length: 1024 tokens per chess game
- Vocabulary: 32 tokens (custom vocabulary)
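
For reference, these hyperparameters map onto a Hugging Face `GPT2Config` roughly as follows. This is an illustrative sketch, not code from the training repo:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative configuration mirroring the numbers above.
config = GPT2Config(
    vocab_size=32,     # custom 32-token chess vocabulary
    n_positions=1024,  # training sequence length
    n_embd=256,        # embedding dimension
    n_layer=8,         # transformer layers
    n_head=4,          # attention heads per layer
)
model = GPT2LMHeadModel(config)
```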
## Training Data
The model was trained on tokenized chess game data prepared from the Lichess dataset. The preparation process involved:
- Tokenizing chess games using a custom 32-token vocabulary.
- Creating binary training files (`train.bin` and `val.bin`).
- Saving vocabulary information to `meta.pkl`.
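
The `train.bin`/`val.bin`/`meta.pkl` layout follows nanoGPT conventions; below is a minimal sketch of the preparation step under that assumption. The 32-symbol set is hypothetical, chosen only for illustration; the model's actual vocabulary ships in `meta.pkl`/`vocab.json`:

```python
import pickle
import numpy as np

# Hypothetical 32-symbol vocabulary (illustrative, not the actual one).
chars = " #+-.0123456789;=BKNOQRabcdefghx"
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text: str) -> np.ndarray:
    # With only 32 tokens, one byte per token is enough.
    return np.array([stoi[c] for c in text], dtype=np.uint8)

data = encode(";1.e4 e5 2.Nf3 Nc6 ")  # placeholder game text
split = int(0.9 * len(data))
data[:split].tofile("train.bin")  # binary training split
data[split:].tofile("val.bin")    # binary validation split

with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "stoi": stoi, "itos": itos}, f)
```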
## Training Configuration
The training configuration, found in `config/mac_chess_gpt.py`, includes:
- Dataset: lichess_hf_dataset
- Batch Size: 2 (optimized for Mac's memory constraints)
- Block Size: 1023 (one less than the 1024-token game sequences, so every position has a next-token target)
- Learning Rate: 3e-4
- Max Iterations: 140,000
- Device: 'mps' (Apple's Metal backend on macOS)
- Other Settings: Dropout disabled and `compile` set to `False` for Mac compatibility
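
These settings would typically live in a plain-Python config file; a sketch of what `config/mac_chess_gpt.py` plausibly contains, with variable names assumed from common nanoGPT-style conventions:

```python
# Assumed config values matching the list above.
dataset = 'lichess_hf_dataset'
batch_size = 2        # small batch for Mac memory constraints
block_size = 1023     # 1024-token games, offset by one for next-token targets
learning_rate = 3e-4
max_iters = 140000
device = 'mps'        # Apple Metal backend
dropout = 0.0         # no dropout
compile = False       # torch.compile disabled for Mac compatibility
```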
## How to Use

### Generating Chess Moves
After training, use the generation script to sample chess moves. Example commands:

```bash
# Sample from the model without a provided prompt:
python sample.py --out_dir=out-chess-mac

# Generate a chess game sequence starting with a custom prompt:
python sample.py --out_dir=out-chess-mac --start=";1.e4"
```
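The leading `;` in the custom prompt appears to be the game-start delimiter used in the training transcripts, so prompts should generally begin with it.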
### Loading the Model in Transformers
Once the model card and converted model files are pushed to the Hugging Face Hub, you can load the model using:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("your-hf-username/chess-gpt-4.5M")
tokenizer = GPT2Tokenizer.from_pretrained("your-hf-username/chess-gpt-4.5M")
```

Note: The tokenizer uses a custom vocabulary provided in `vocab.json`.
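
Once loaded, generation follows the standard Transformers API. A usage sketch with illustrative sampling settings:

```python
import torch

# Encode an opening prompt and sample a continuation.
inputs = tokenizer(";1.e4", return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=40,  # a handful of further moves
        do_sample=True,
        temperature=0.8,
    )
print(tokenizer.decode(output[0]))
```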
## Intended Use
The model is intended for:
- Generating chess move sequences.
- Assisting in automated chess analysis.
- Educational purposes in understanding language model training on specialized domains.
## Limitations
- The model uses a relatively small (4.5M-parameter) architecture and may not capture complex chess strategy.
- It is specialized for chess move generation and may not generalize to standard language tasks.
## Training Process Summary
- Data Preparation: Tokenized the Lichess chess game dataset using a 32-token vocabulary.
- Model Training: Used the custom training configuration specified in `config/mac_chess_gpt.py`.
- Model Conversion: Converted the trained checkpoint from `out-chess-mac/ckpt.pt` into a Hugging Face-compatible format with `convert_to_hf.py`.
- Repository Setup: Pushed the converted model files (including the custom tokenizer vocab) to the Hugging Face Hub, with Git LFS handling large files (see the upload sketch below).
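
The upload step can be done with plain Git plus Git LFS, or with the `huggingface_hub` client, which routes large files through LFS-backed storage automatically. A sketch of the latter; the folder name is an assumption:

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="out-chess-mac-hf",  # assumed output dir of convert_to_hf.py
    repo_id="your-hf-username/chess-gpt-4.5M",
    repo_type="model",
)
```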
## Acknowledgements
This model draws inspiration from GPT-2 and was adapted for the chess domain.