Model Card for Essay Scoring Model by Jatin Mehra
Quick Summary
This model was trained by Jatin Mehra for a Kaggle competition focused on automated essay scoring, using data provided by the competition.
- Base Model: SmolLM2-360M-Instruct
- Hardware: Kaggle’s 2×T4 GPUs
- Test Score: 0.79 quadratic weighted kappa (QWK)
- Libraries Used: transformers, torch, pandas (for preprocessing)
- Fine-tuned to score essays on a 1–6 scale (predicted internally as labels 0–5).
- Easily adaptable for other subjective text scoring tasks.
Model Details
Model Description
- Developed by: Jatin Mehra
- Shared by: Jatin Mehra
- Model type: Instruction-based NLP Model for Essay Scoring
- Language(s) (NLP): English
- License: MIT
- Fine-tuned from model: HuggingFaceTB/SmolLM2-360M-Instruct
Model Sources
- Repository: Training Notebook
- Demo: Essay Scorer Pro
Uses
Direct Use
This model can be directly used for scoring essays based on the competition dataset.
Downstream Use
Potential uses include adapting the model for other text evaluation tasks like grading subjective responses or evaluating content for quality and relevance.
Out-of-Scope Use
The model is not designed for creative writing evaluation, detecting plagiarism, or scoring in languages other than English.
Bias, Risks, and Limitations
- The model’s performance is optimized for the Kaggle dataset and may not generalize well to other datasets.
- Potential biases may arise from the training data, such as over-representation or under-representation of certain essay topics or writing styles.
Recommendations
Users should validate the model’s performance on their specific datasets before deployment. Consider retraining or fine-tuning if applied to different domains.
How to Get Started with the Model
A helper guide for using the pre-trained SmolLM2-360M essay scoring model.
Step 1: Install Required Libraries
To ensure compatibility with the model, install specific versions of the transformers and torch libraries:
pip install transformers==4.46.3 torch==2.4.0
Step 2: Load the Pre-trained Model and Tokenizer
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
model = AutoModelForSequenceClassification.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
Step 3: Preprocess the Essay Text
Before passing the essay to the model for scoring, we need to preprocess it by handling encoding issues, removing unnecessary spaces, and normalizing the text. This ensures that the input text is clean and in a format suitable for scoring.
import codecs
import re
from text_unidecode import unidecode

# The error-handler names used below are custom and must be registered with codecs first.
def replace_encoding_with_utf8(error):
    return error.object[error.start:error.end].encode("utf-8"), error.end

def replace_decoding_with_cp1252(error):
    return error.object[error.start:error.end].decode("cp1252"), error.end

codecs.register_error("replace_encoding_with_utf8", replace_encoding_with_utf8)
codecs.register_error("replace_decoding_with_cp1252", replace_decoding_with_cp1252)

# Preprocessing functions
def resolve_encodings_and_normalize(text: str) -> str:
    """Resolve encoding problems and normalize abnormal characters."""
    text = (
        text.encode("raw_unicode_escape")
        .decode("utf-8", errors="replace_decoding_with_cp1252")
        .encode("cp1252", errors="replace_encoding_with_utf8")
        .decode("utf-8", errors="replace_decoding_with_cp1252")
    )
    text = unidecode(text)  # Convert accented characters to ASCII
    return text

def preprocess_essay_text(text: str) -> str:
    """Prepare essay text for scoring by cleaning non-essential issues without altering quality indicators."""
    text = resolve_encodings_and_normalize(text)
    text = re.sub(r'\s+', ' ', text.strip())     # Normalize whitespace
    text = re.sub(r'\s+([?.!,"])', r'\1', text)  # Remove spaces before punctuation
    text = re.sub(r',([^\s])', r', \1', text)    # Add a space after commas
    return text
Step 4: Prediction Function
A function that takes an essay as input and predicts its score. The model predicts a label from 0 to 5, which is shifted to the 1–6 score scale.
import torch

# Prediction function
def predict_score(text: str) -> int:
    # Preprocess the text
    processed_text = preprocess_essay_text(text)
    # Tokenize the input text (return_tensors='pt' already yields a batch of size 1)
    encoding = tokenizer(
        processed_text,
        padding='max_length',
        truncation=True,
        max_length=512,
        return_tensors='pt'
    )
    input_ids = encoding['input_ids']
    attention_mask = encoding['attention_mask']
    # Move the model and tensors to the available device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    input_ids = input_ids.to(device)
    attention_mask = attention_mask.to(device)
    # Perform inference
    model.eval()
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        prediction = torch.argmax(logits, dim=-1).cpu().numpy()
    # Convert the predicted label (0-5) to a score (1-6)
    score = int(prediction[0]) + 1
    return score
Step 5: Example Usage with Manual Input
Now you can input your essay text manually and get the predicted score.
manual_input = input("Enter your text/Essay:")
predicted_score = predict_score(manual_input)
print(f"Predicted Score: {predicted_score}")
Training
The model was trained using the following steps:
Data Preparation:
- The training data consists of essays scored on a 1–6 scale, offset to labels in the range 0–5.
- Preprocessing involved:
  - Resolving encoding issues using custom encoding and decoding handlers.
  - Normalizing whitespace and punctuation formatting without altering grammar or spelling.
- Texts were tokenized using AutoTokenizer from the transformers library.
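For illustration, here is a minimal sketch of this preparation step. The file path, variable names, and the full_text/score column names are assumptions based on the Kaggle competition data, not the author's exact code; the preprocessing helpers are those defined earlier in this card.

import pandas as pd
from transformers import AutoTokenizer

# Load the competition training data (path and column names assumed).
train_df = pd.read_csv("train.csv")

# Clean the raw essays with the preprocessing helpers defined above.
train_df["full_text"] = train_df["full_text"].apply(preprocess_essay_text)

# Offset the 1-6 human scores to 0-5 classification labels.
train_df["label"] = train_df["score"] - 1

# Tokenize with the base checkpoint's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure a pad token exists for fixed-length padding

encodings = tokenizer(
    train_df["full_text"].tolist(),
    padding="max_length",
    truncation=True,
    max_length=512,
)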
Class Imbalance Handling:
- Class weights were computed based on the inverse frequency of each label in the training dataset to mitigate class imbalance during training.
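A small sketch of one way to compute such inverse-frequency weights, continuing the assumed train_df from the sketch above; the exact normalization used during training is not documented, so plain inverse frequency with a simple rescaling is shown.

import torch

# Inverse-frequency class weights: rarer labels receive larger weights.
label_counts = train_df["label"].value_counts().sort_index()
class_weights = 1.0 / label_counts.values
class_weights = class_weights / class_weights.sum() * len(label_counts)  # rescale so weights average to 1

# Passed later as the weight argument of the cross-entropy loss.
class_weights_tensor = torch.tensor(class_weights, dtype=torch.float32)
loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights_tensor)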
Training Configuration:
- The Hugging Face model checkpoint HuggingFaceTB/SmolLM2-360M-Instruct was fine-tuned with an added classification head for essay scoring.
- Loss function: weighted cross-entropy loss.
- Optimizer: AdamW with a learning rate of 2e-5.
- Scheduler: linear schedule with warm-up (100 steps).
- Gradient clipping: maximum gradient norm of 1.0 to prevent exploding gradients.
- Mixed-precision (fp16) training was used for faster training and reduced memory consumption.
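The training notebook is the authoritative source; as a rough, hedged sketch, the configuration above might be expressed with the transformers Trainer API as follows. The output path, pad-token handling, and the WeightedTrainer wiring are illustrative assumptions, while the hyperparameter values mirror the list above.

import torch
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Load the base checkpoint with a 6-way classification head (labels 0-5).
model = AutoModelForSequenceClassification.from_pretrained(
    "HuggingFaceTB/SmolLM2-360M-Instruct", num_labels=6
)
model.config.pad_token_id = tokenizer.pad_token_id  # required for batched sequence classification

training_args = TrainingArguments(
    output_dir="smollm2-essay-scorer",   # illustrative path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    learning_rate=2e-5,                  # AdamW is the Trainer's default optimizer
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_grad_norm=1.0,                   # gradient clipping
    fp16=True,                           # mixed-precision training
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,         # needed so early stopping keeps the best epoch
    metric_for_best_model="qwk",
    greater_is_better=True,
)

# Weighted cross-entropy loss via a small Trainer subclass.
class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=class_weights_tensor.to(outputs.logits.device)
        )
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss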
Training Process:
- The dataset was split into training (85%) and validation (15%) sets, stratified by label.
- Training was performed over 3 epochs with a batch size of 4 for training and 2 for validation.
- The Trainer API was used with an early stopping callback (patience of 2 epochs).
- Evaluation metric: Quadratic Weighted Kappa (QWK), optimized on the validation set.
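Continuing the same sketch, early stopping and QWK evaluation could be attached roughly as follows; train_dataset and val_dataset are assumed tokenized datasets from the stratified split described above, and the QWK metric is computed with scikit-learn's cohen_kappa_score using quadratic weights.

import numpy as np
from sklearn.metrics import cohen_kappa_score
from transformers import EarlyStoppingCallback

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Quadratic Weighted Kappa between predicted and true labels.
    return {"qwk": cohen_kappa_score(labels, preds, weights="quadratic")}

trainer = WeightedTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,   # assumed tokenized Dataset objects
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()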
Results:
- The model achieved a QWK score of 0.7927 on the validation set in the second epoch, which was kept as the best checkpoint.
Model Saving:
- The fine-tuned model and tokenizer were saved for inference and further use.
Evaluation
Testing Data, Factors & Metrics
Testing Data
The Kaggle dataset’s test split was used for evaluation.
Factors
Evaluation considers coherence, grammar, and content relevance in the essays.
Metrics
The primary evaluation metric was the quadratic weighted kappa (QWK); the model achieved a score of 0.79.
Results
The model performed well on the test set, but generalization to other datasets requires validation.
Technical Specifications
Model Architecture and Objective
The model is based on the SmolLM2-360M-Instruct architecture, fine-tuned with an added classification head for essay scoring.
Compute Infrastructure
Hardware
- GPUs: 2×T4 GPUs
- Hours used: 3 Hours 24 Minutes
Software
- Libraries: Transformers, PyTorch, pandas, NumPy
Citation
@misc{jatin_mehra_2024,
  author    = {Jatin Mehra},
  title     = {Smollm2-360M-Essay-Scoring (Revision 467ceb5)},
  year      = {2024},
  url       = {https://huggingface.co/jatinmehra/Smollm2-360M-Essay-Scoring},
  doi       = {10.57967/hf/3924},
  publisher = {Hugging Face}
}
@misc{learning-agency-lab-automated-essay-scoring-2,
  author       = {Scott Crossley and Perpetual Baffour and Jules King and Lauryn Burleigh and Walter Reade and Maggie Demkin},
  title        = {Learning Agency Lab - Automated Essay Scoring 2.0},
  year         = {2024},
  howpublished = {\url{https://kaggle.com/competitions/learning-agency-lab-automated-essay-scoring-2}},
  note         = {Kaggle}
}
@misc{allal2024SmolLM2,
  title  = {SmolLM2 - with great data, comes great performance},
  author = {Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Lewis Tunstall and Agustín Piqueres and Andres Marafioti and Cyril Zakka and Leandro von Werra and Thomas Wolf},
  year   = {2024}
}