---
license: mit
datasets:
- jatinmehra/Automated-Essay-Scoring-2.0
language:
- en
metrics:
- quadratic weighted kappa
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
pipeline_tag: text-classification
new_version: jatinmehra/Smollm2-360M-Essay-Scoring
library_name: transformers
tags:
- Essay-Scoring
---

# Model Card for Essay Scoring Model by Jatin Mehra

## Quick Summary

This model was trained by **Jatin Mehra** for a Kaggle competition focused on automated essay scoring, using the data provided by the competition.

- **Base Model:** HuggingFaceTB/SmolLM2-360M-Instruct
- **Hardware:** Kaggle’s 2×T4 GPUs
- **Test Score:** 0.79 quadratic weighted kappa (QWK)
- **Libraries Used:** transformers, torch, pandas (for preprocessing)
- Fine-tuned to score essays; the model predicts labels 0–5, which map to final scores of 1–6.
- Easily adaptable to other subjective text-scoring tasks.

----------

## Model Details

### Model Description

- **Developed by:** Jatin Mehra
- **Shared by:** Jatin Mehra
- **Model type:** Instruction-tuned language model fine-tuned for essay scoring (sequence classification)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** HuggingFaceTB/SmolLM2-360M-Instruct

### Model Sources

- **Repository:** [Training Notebook](https://github.com/Jatin-Mehra119/Essay-Scoring-Modeling/blob/main/Research%20Notebooks/essay-smollm2-360m.ipynb)
- **Demo:** [Essay Scorer Pro](https://huggingface.co/spaces/jatinmehra/Essay-Scorer-Pro)

----------

## Uses

### Direct Use

The model can be used directly to score essays similar to those in the competition dataset.

### Downstream Use

Potential uses include adapting the model to other text-evaluation tasks, such as grading subjective responses or assessing content for quality and relevance.

### Out-of-Scope Use

The model is not designed for creative-writing evaluation, plagiarism detection, or scoring in languages other than English.

----------

## Bias, Risks, and Limitations

- The model’s performance is optimized for the Kaggle dataset and may not generalize well to other datasets.
- Potential biases may arise from the training data, such as over- or under-representation of certain essay topics or writing styles.

### Recommendations

Users should validate the model’s performance on their specific datasets before deployment, and consider retraining or fine-tuning when applying it to a different domain.

----------

## How to Get Started with the Model

#### Helper Guide: Using the Pre-trained SmolLM2-360M Essay Scoring Model

##### Step 1: Install Required Libraries

To ensure compatibility with the model, install specific versions of the `transformers` and `torch` libraries:

```
pip install transformers==4.46.3 torch==2.4.0
```

##### Step 2: Load the Pre-trained Model and Tokenizer

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
model = AutoModelForSequenceClassification.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
```
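Optionally, you can sanity-check the loaded model with the high-level `pipeline` API before wiring up the full preprocessing path. This is a minimal sketch that is not part of the original guide: the sample text and the printed output are illustrative, and the raw predicted label is on the 0–5 training scale, so add 1 to recover the reported 1–6 score.

```
from transformers import pipeline

# Quick smoke test (skips the custom preprocessing shown in Step 3).
scorer = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = scorer("This is a short sample essay about technology in schools.", truncation=True)
print(result)  # e.g. [{'label': 'LABEL_3', 'score': 0.87}] -> essay score 3 + 1 = 4
```

For the scores reported in this card, use the preprocessing and prediction function described in Steps 3–5 below.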
##### Step 3: Preprocess the Essay Text

Before passing an essay to the model, preprocess it to resolve encoding issues, remove unnecessary spaces, and normalize the text. This ensures the input is clean and in a format suitable for scoring.

```
import codecs
import re

from text_unidecode import unidecode

# Register the custom codec error handlers used by the normalization step.
def replace_encoding_with_utf8(error):
    return error.object[error.start:error.end].encode("utf-8"), error.end

def replace_decoding_with_cp1252(error):
    return error.object[error.start:error.end].decode("cp1252"), error.end

codecs.register_error("replace_encoding_with_utf8", replace_encoding_with_utf8)
codecs.register_error("replace_decoding_with_cp1252", replace_decoding_with_cp1252)

# Preprocessing functions
def resolve_encodings_and_normalize(text: str) -> str:
    """Resolve encoding problems and normalize abnormal characters."""
    text = (
        text.encode("raw_unicode_escape")
        .decode("utf-8", errors="replace_decoding_with_cp1252")
        .encode("cp1252", errors="replace_encoding_with_utf8")
        .decode("utf-8", errors="replace_decoding_with_cp1252")
    )
    text = unidecode(text)  # Convert accented characters to ASCII
    return text

def preprocess_essay_text(text: str) -> str:
    """
    Prepares essay text for scoring by cleaning non-essential issues
    without altering quality indicators.
    """
    text = resolve_encodings_and_normalize(text)
    text = re.sub(r'\s+', ' ', text.strip())      # Normalize whitespace
    text = re.sub(r'\s+([?.!,"])', r'\1', text)   # Remove spaces before punctuation
    text = re.sub(r',([^\s])', r', \1', text)     # Add space after commas
    return text
```

##### Step 4: Prediction Function

Define a function that takes an essay as input and predicts its score. The model outputs a label from 0 to 5, which is shifted to the 1–6 scale.

```
import torch

# Prediction function
def predict_score(text: str) -> int:
    # Preprocess the text
    processed_text = preprocess_essay_text(text)

    # Tokenize the input text
    encoding = tokenizer(
        processed_text,
        padding='max_length',
        truncation=True,
        max_length=512,
        return_tensors='pt'
    )

    # Get input IDs and attention mask
    input_ids = encoding['input_ids'].squeeze(0).unsqueeze(0)            # Add batch dimension
    attention_mask = encoding['attention_mask'].squeeze(0).unsqueeze(0)  # Add batch dimension

    # Move tensors to device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    input_ids = input_ids.to(device)
    attention_mask = attention_mask.to(device)

    # Perform inference
    model.eval()
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        prediction = torch.argmax(logits, dim=-1).cpu().numpy()

    # Convert prediction to score: model predicts 0-5, reported scores are 1-6
    score = int(prediction[0]) + 1
    return score
```

##### Step 5: Example Usage with Manual Input

Now you can enter an essay manually and get the predicted score.

```
manual_input = input("Enter your text/Essay: ")
predicted_score = predict_score(manual_input)
print(f"Predicted Score: {predicted_score}")
```

### Training

The model was trained using the following steps:

1. **Data Preparation**:
   - The training data consists of essays scored on a 1–6 scale, offset to labels in the range 0–5.
   - Preprocessing involved:
     - Resolving encoding issues using custom encoding and decoding handlers.
     - Normalizing whitespace and punctuation formatting without altering grammar or spelling.
   - Texts were tokenized using `AutoTokenizer` from the `transformers` library.
2. **Class Imbalance Handling**:
   - Class weights were computed from the inverse frequency of each label in the training set to mitigate class imbalance during training.
3. **Training Configuration** (a minimal sketch of this setup follows the list):
   - The Hugging Face checkpoint `HuggingFaceTB/SmolLM2-360M-Instruct` was fine-tuned with an added classification head for essay scoring.
   - Loss function: weighted cross-entropy loss.
   - Optimizer: `AdamW` with a learning rate of `2e-5`.
   - Scheduler: linear schedule with warm-up (100 steps).
   - Gradient clipping: maximum gradient norm of `1.0` to prevent exploding gradients.
   - Mixed-precision (`fp16`) training was used for faster training and reduced memory consumption.
4. **Training Process**:
   - The dataset was split into training (85%) and validation (15%) sets, stratified by label.
   - Training ran for **3 epochs** with a batch size of 4 for training and 2 for validation.
   - The `Trainer` API was used with an early-stopping callback (patience of 2 epochs).
   - Evaluation metric: Quadratic Weighted Kappa (QWK), optimized on the validation set.
5. **Results**:
   - The model achieved its best validation QWK score of **0.7927** in the second epoch.
6. **Model Saving**:
   - The fine-tuned model and tokenizer were saved for inference and further use.
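The exact training code lives in the linked notebook; the sketch below is only an illustrative reconstruction of the configuration listed above (inverse-frequency class weights, weighted cross-entropy via a `Trainer` subclass, the hyperparameters from items 3 and 4, and QWK as the validation metric). The DataFrame `train_df`, the column names `full_text` and `score`, the split seed, and the extra `datasets`/`scikit-learn` dependencies are assumptions made for this example, not necessarily the notebook's identifiers.

```
# Illustrative reconstruction of the training setup described above -- not the
# exact notebook code. Assumes a pandas DataFrame `train_df` with the essay in
# a "full_text" column and the 1-6 score in a "score" column; requires the
# `datasets` and `scikit-learn` packages in addition to transformers/torch.
import numpy as np
import torch
import torch.nn.functional as F
from datasets import Dataset
from sklearn.metrics import cohen_kappa_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=6)
if tokenizer.pad_token is None:                  # decoder-style models often lack a pad token
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Shift 1-6 scores to 0-5 labels and compute inverse-frequency class weights.
labels = train_df["score"].to_numpy() - 1
counts = np.bincount(labels, minlength=6)
class_weights = torch.tensor(len(labels) / (6 * counts), dtype=torch.float)

# Tokenize and make a stratified 85/15 train/validation split.
def tokenize(batch):
    return tokenizer(batch["full_text"], truncation=True, padding="max_length", max_length=512)

ds = Dataset.from_pandas(train_df.assign(labels=labels)).map(tokenize, batched=True)
ds = ds.class_encode_column("labels")            # required for a stratified split
split = ds.train_test_split(test_size=0.15, stratify_by_column="labels", seed=42)

class WeightedTrainer(Trainer):
    """Trainer using weighted cross-entropy to counter class imbalance."""
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss = F.cross_entropy(
            outputs.logits, labels, weight=class_weights.to(outputs.logits.device)
        )
        return (loss, outputs) if return_outputs else loss

def compute_metrics(eval_pred):
    logits, gold = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"qwk": cohen_kappa_score(gold, preds, weights="quadratic")}

# Defaults give AdamW with a linear schedule; warmup, clipping, and fp16 match the list above.
args = TrainingArguments(
    output_dir="smollm2-essay-scoring",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    learning_rate=2e-5,
    warmup_steps=100,
    max_grad_norm=1.0,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="qwk",
)

trainer = WeightedTrainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```

With `load_best_model_at_end=True` and QWK as `metric_for_best_model`, the checkpoint from the best validation epoch (epoch 2 in the reported run) is the one kept for saving and inference.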
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The Kaggle dataset’s test split was used for evaluation.

#### Factors

Evaluation considers coherence, grammar, and content relevance in the essays.

#### Metrics

The primary evaluation metric was the quadratic weighted kappa (QWK); the model achieved a score of **0.79** on the test split.

### Results

The model performed well on the test set, but generalization to other datasets requires validation.

----------

## Technical Specifications

### Model Architecture and Objective

The model is based on the SmolLM2-360M-Instruct architecture, fine-tuned with an added classification head for essay scoring.

### Compute Infrastructure

#### Hardware

- **GPUs:** 2×T4
- **Hours used:** 3 hours 24 minutes

#### Software

- **Libraries:** Transformers, PyTorch, Pandas, NumPy

### Citation

```
@misc{jatin_mehra_2024,
  author       = { {Jatin Mehra} },
  title        = { Smollm2-360M-Essay-Scoring (Revision 467ceb5) },
  year         = 2024,
  url          = { https://huggingface.co/jatinmehra/Smollm2-360M-Essay-Scoring },
  doi          = { 10.57967/hf/3924 },
  publisher    = { Hugging Face }
}

@misc{learning-agency-lab-automated-essay-scoring-2,
  author       = {Scott Crossley and Perpetual Baffour and Jules King and Lauryn Burleigh and Walter Reade and Maggie Demkin},
  title        = {Learning Agency Lab - Automated Essay Scoring 2.0},
  year         = {2024},
  howpublished = {\url{https://kaggle.com/competitions/learning-agency-lab-automated-essay-scoring-2}},
  note         = {Kaggle}
}

@misc{allal2024SmolLM2,
  title        = {SmolLM2 - with great data, comes great performance},
  author       = {Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Lewis Tunstall and Agustín Piqueres and Andres Marafioti and Cyril Zakka and Leandro von Werra and Thomas Wolf},
  year         = {2024},
}
```