---
language:
- en
tags:
- chess
- gpt
- transformers
- text-generation
license: mit
datasets:
- lichess
pipeline_tag: text-generation
library_name: transformers
base_model: gpt2
---

# Chess GPT-4.5M

## Overview

Chess GPT-4.5M is a generative language model trained specifically to generate chess moves and analyze chess games. The model is based on the GPT architecture and was trained with a custom 32-token vocabulary covering the symbols and notation used in chess move text.

## Model Details

- **Architecture:** GPT-based language model (`GPT2LMHeadModel`)
- **Parameters:** Approximately 4.5M
- **Layers:** 8 transformer layers
- **Heads:** 4 attention heads per layer
- **Embedding Dimension:** 256
- **Training Sequence Length:** 1024 tokens per chess game
- **Vocabulary:** 32 tokens (custom vocabulary)

## Training Data

The model was trained on tokenized chess game data prepared from the [Lichess dataset](https://huggingface.co/datasets/lichess). The preparation process involved:

- Tokenizing chess games using the custom 32-token vocabulary (a hypothetical sketch of this step appears at the end of this card).
- Writing binary training files (`train.bin` and `val.bin`).
- Saving vocabulary information to `meta.pkl`.

## Training Configuration

The training configuration, found in `config/mac_chess_gpt.py`, includes:

- **Dataset:** lichess_hf_dataset
- **Batch Size:** 2 (kept small for Mac memory constraints)
- **Block Size:** 1023 (one less than the 1024-token training sequences, since next-token targets are shifted by one position)
- **Learning Rate:** 3e-4
- **Max Iterations:** 140,000
- **Device:** `mps` (Apple Silicon GPU backend)
- **Other Settings:** Dropout disabled and `compile=False` for Mac compatibility

## How to Use

### Generating Chess Moves

After training, use the generation script to sample chess moves. Example commands:

```bash
# Sample from the model without a provided prompt
python sample.py --out_dir=out-chess-mac

# Generate a chess game sequence starting from a custom prompt
python sample.py --out_dir=out-chess-mac --start=";1.e4"
```

### Loading the Model in Transformers

Once the model card and converted model files are pushed to the Hugging Face Hub, you can load the model with:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("your-hf-username/chess-gpt-4.5M")
tokenizer = GPT2Tokenizer.from_pretrained("your-hf-username/chess-gpt-4.5M")
```

_Note:_ The tokenizer uses the custom vocabulary provided in `vocab.json`, not GPT-2's standard BPE vocabulary. An end-to-end generation sketch that loads `vocab.json` directly appears at the end of this card.

## Intended Use

The model is intended for:

- Generating chess move sequences.
- Assisting in automated chess analysis.
- Educational use in understanding how language models are trained on specialized domains.

## Limitations

- The model is small (approximately 4.5M parameters) and may not capture complex chess strategy.
- It is specialized for chess move generation and does not generalize to standard language tasks.

## Training Process Summary

1. **Data Preparation:** Tokenized the Lichess chess game dataset using the 32-token vocabulary.
2. **Model Training:** Trained with the configuration specified in `config/mac_chess_gpt.py`.
3. **Model Conversion:** Converted the trained checkpoint `out-chess-mac/ckpt.pt` into a Hugging Face-compatible format with `convert_to_hf.py`.
4. **Repository Setup:** Pushed the converted model files (including the custom tokenizer vocabulary) to the Hugging Face Hub, with Git LFS handling large files.

## Acknowledgements

This model was inspired by [GPT-2](https://openai.com/blog/better-language-models/) and adapted for the chess domain.
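## Sketch: Data Preparation (Hypothetical)

The following is a minimal sketch of the data-preparation step described above, assuming character-level encoding of move text into the `train.bin`/`val.bin`/`meta.pkl` layout this card names. The character set, the `encode` helper, and the example game string are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch: encode move text to ids and write training artifacts.
# Only the file names (train.bin, val.bin, meta.pkl) come from this card.
import pickle
import numpy as np

# Assumed character set for PGN-style move text; the real 32-token
# vocabulary may differ.
chars = sorted(set(";0123456789.abcdefgh xKQRBNO-+#="))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text: str) -> list[int]:
    """Map a move-text string to token ids, one id per character."""
    return [stoi[c] for c in text]

move_text = ";1.e4 e5 2.Nf3 Nc6"  # concatenated game text (illustrative)
ids = np.array(encode(move_text), dtype=np.uint16)

# Split and dump to the binary files the training script reads.
n = int(0.9 * len(ids))
ids[:n].tofile("train.bin")
ids[n:].tofile("val.bin")

# Persist vocabulary metadata for decoding samples later.
with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "stoi": stoi, "itos": itos}, f)
```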
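## Sketch: End-to-End Generation (Hypothetical)

A minimal sketch of loading the converted model and sampling a game with Transformers, assuming `vocab.json` maps single characters to token ids. The repo id is the placeholder used above; fetching the vocabulary with `huggingface_hub.hf_hub_download` and encoding by hand sidesteps `GPT2Tokenizer`, which otherwise expects GPT-2's BPE files. The sampling parameters are illustrative.

```python
# Hypothetical end-to-end sketch, assuming a character-level vocab.json.
import json
import torch
from transformers import GPT2LMHeadModel
from huggingface_hub import hf_hub_download

repo_id = "your-hf-username/chess-gpt-4.5M"  # placeholder repo id
model = GPT2LMHeadModel.from_pretrained(repo_id)
model.eval()

# Load the custom vocabulary directly rather than via GPT2Tokenizer,
# since the model does not use GPT-2's BPE merges.
vocab_path = hf_hub_download(repo_id, "vocab.json")
with open(vocab_path) as f:
    stoi = json.load(f)          # char -> id
itos = {i: ch for ch, i in stoi.items()}

prompt = ";1.e4"
input_ids = torch.tensor([[stoi[c] for c in prompt]])

with torch.no_grad():
    out = model.generate(
        input_ids,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.8,
        top_k=10,
    )

# Decode the sampled ids back into move text.
print("".join(itos[i] for i in out[0].tolist()))
```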