---
license: apache-2.0
base_model:
- openai-community/gpt2
---

# Shakespeare Fine-Tuned GPT-2 Model

## Model Description

This is a fine-tuned version of the GPT-2 language model trained on the [Tiny Shakespeare dataset](https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt). The model is fine-tuned to generate text in the style of William Shakespeare, capturing the syntax, vocabulary, and poetic structure characteristic of his works.

## Intended Use

The model is designed for educational purposes, creative writing, and experimentation with fine-tuned language models. Potential use cases include:

- Generating Shakespearean-style text for creative projects.
- Studying language modeling and fine-tuning techniques.
- Providing inspiration for poetry or prose in Shakespearean English.

### Usage

You can use this model via the Hugging Face Transformers library. Below is an example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "mstftmk/shakespeare-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text; do_sample=True is required for temperature to take effect
input_text = "O gentle fair maiden,"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

- **Base Model**: [GPT-2](https://huggingface.co/openai-community/gpt2)
- **Dataset**: Tiny Shakespeare dataset.
- **Fine-Tuning Framework**: Hugging Face's `Trainer` API.
- **Training Parameters**:
  - Learning rate: `2e-5`
  - Epochs: `3`
  - Batch size: `2`
  - Max sequence length: `128`

A minimal sketch of this training setup is included at the end of this card.

---

## Evaluation

- **Validation Split**: 10% of the dataset.
- **Evaluation Strategy**: Evaluation at the end of each epoch during training.
- **Metrics**: Loss and perplexity (the exponential of the cross-entropy loss) on validation data.

---

## Limitations

- **Style-Restricted**: The model generates text exclusively in a Shakespearean style. It is not intended for modern conversational or general-purpose language modeling.
- **Biases**: The model inherits any biases present in the training dataset.
- **Dataset Limitations**: The Tiny Shakespeare dataset is limited in size and scope, potentially restricting the richness and variability of the generated text.

---

## Ethical Considerations

- The model should not be used to generate harmful, offensive, or misleading content.
- Users should ensure proper attribution when using this model for creative projects.

---

## Citation

If you use this model, please cite:

```
@misc{shakespeare-gpt2,
  author = {Mustafa Tomak},
  title  = {Shakespeare Fine-Tuned GPT-2},
  year   = {2025},
  url    = {https://huggingface.co/mstftmk/shakespeare-gpt2},
}
```

---

## License

The model is released under the Apache 2.0 license. Users must comply with its terms of use.
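
---

## Appendix: Fine-Tuning Sketch

The exact training script is not published with this card. The following is a minimal sketch of how the setup described under Training Details could be reproduced with the `Trainer` API, assuming a local copy of the Tiny Shakespeare file at `input.txt` (a hypothetical path), a random seed of `42` (an assumption), and recent versions of `transformers` and `datasets`:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

DATA_PATH = "input.txt"  # hypothetical local path to the Tiny Shakespeare text

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Load the raw text and carve out the 10% validation split used in this card.
dataset = load_dataset("text", data_files={"train": DATA_PATH})["train"]
dataset = dataset.train_test_split(test_size=0.1, seed=42)  # seed is an assumption

def tokenize(batch):
    # Truncate each line to the 128-token max sequence length.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
# Drop empty lines, which tokenize to zero tokens.
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)

# mlm=False selects the standard causal-LM objective (labels = shifted inputs).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="shakespeare-gpt2",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    eval_strategy="epoch",  # `evaluation_strategy` on older transformers releases
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
)

trainer.train()
```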
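
Continuing the sketch, the perplexity metric reported under Evaluation can be obtained from the validation loss returned by `trainer.evaluate()`. The conversion (exponential of the cross-entropy loss) is standard, though the author's exact evaluation code is not published:

```python
import math

eval_metrics = trainer.evaluate()  # uses the eval_dataset passed to Trainer
print(f"Validation loss: {eval_metrics['eval_loss']:.4f}")
print(f"Validation perplexity: {math.exp(eval_metrics['eval_loss']):.2f}")
```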