Model Card for Essay Scoring Model by Jatin Mehra

Quick Summary

This model was trained by Jatin Mehra for a Kaggle competition on automated essay scoring (Learning Agency Lab - Automated Essay Scoring 2.0), using the data provided by the competition.

  • Base Model: SmolLM2-360M-Instruct
  • Hardware: Kaggle’s 2×T4 GPUs
  • Test Score: 0.79 quadratic weighted kappa (QWK)
  • Libraries Used: transformers, torch, pandas (for preprocessing)
  • Fine-tuned for scoring essays on a 1–6 scale (predicted internally as labels 0–5).
  • Easily adaptable for other subjective text scoring tasks.

Model Details

Model Description

  • Developed by: Jatin Mehra
  • Shared by: Jatin Mehra
  • Model type: Instruction-based NLP Model for Essay Scoring
  • Language(s) (NLP): English
  • License: MIT
  • Fine-tuned from model: HuggingFaceTB/SmolLM2-360M-Instruct

Model Sources

  • Repository: https://huggingface.co/jatinmehra/Smollm2-360M-Essay-Scoring

Uses

Direct Use

This model can be used directly to score essays similar to those in the competition dataset.

Downstream Use

Potential uses include adapting the model for other text evaluation tasks like grading subjective responses or evaluating content for quality and relevance.
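
As a rough illustration of such an adaptation (the 4-point target rubric and label count below are hypothetical, not part of this model), the checkpoint can be reloaded with a freshly initialized classification head and then fine-tuned on the new data:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical example: adapt the checkpoint to a 4-point rubric (labels 0-3).
# ignore_mismatched_sizes discards the original 6-way head and initializes a new one.
tokenizer = AutoTokenizer.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
model = AutoModelForSequenceClassification.from_pretrained(
    "jatinmehra/Smollm2-360M-Essay-Scoring",
    num_labels=4,
    ignore_mismatched_sizes=True,
)
# The re-initialized head must be fine-tuned on the new task before use.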

Out-of-Scope Use

The model is not designed for creative writing evaluation, detecting plagiarism, or scoring in languages other than English.


Bias, Risks, and Limitations

  • The model’s performance is optimized for the Kaggle dataset and may not generalize well to other datasets.
  • Potential biases may arise from the training data, such as over-representation or under-representation of certain essay topics or writing styles.

Recommendations

Users should validate the model’s performance on their specific datasets before deployment. Consider retraining or fine-tuning if applied to different domains.


How to Get Started with the Model

Helper Guide: Using the Pre-trained SmolLM2 360M Essay Scoring Model

Step 1: Install Required Libraries

To ensure compatibility with the model, install the pinned versions of the transformers and torch libraries; text-unidecode is also needed for the preprocessing step below.

pip install transformers==4.46.3 torch==2.4.0 text-unidecode
Step 2: Load the Pre-trained Model and Tokenizer
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
model = AutoModelForSequenceClassification.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
Step 3: Preprocess the Essay Text

Before passing the essay to the model for scoring, we need to preprocess it by handling encoding issues, removing unnecessary spaces, and normalizing the text. This ensures that the input text is clean and in a format suitable for scoring.

import codecs
import re

from text_unidecode import unidecode

# The custom codec error handlers referenced below must be registered with the
# codecs module before resolve_encodings_and_normalize is called.
def replace_encoding_with_utf8(error: UnicodeError) -> tuple:
    return error.object[error.start : error.end].encode("utf-8"), error.end

def replace_decoding_with_cp1252(error: UnicodeError) -> tuple:
    return error.object[error.start : error.end].decode("cp1252"), error.end

codecs.register_error("replace_encoding_with_utf8", replace_encoding_with_utf8)
codecs.register_error("replace_decoding_with_cp1252", replace_decoding_with_cp1252)

# Preprocessing functions
def resolve_encodings_and_normalize(text: str) -> str:
    """Resolve encoding problems and normalize abnormal characters."""
    text = (
        text.encode("raw_unicode_escape")
        .decode("utf-8", errors="replace_decoding_with_cp1252")
        .encode("cp1252", errors="replace_encoding_with_utf8")
        .decode("utf-8", errors="replace_decoding_with_cp1252")
    )
    text = unidecode(text)  # Convert accented characters to ASCII
    return text

def preprocess_essay_text(text: str) -> str:
    """
    Prepares essay text for scoring by cleaning non-essential issues without altering quality indicators.
    """
    text = resolve_encodings_and_normalize(text)
    text = re.sub(r'\s+', ' ', text.strip())  # Normalize whitespace
    text = re.sub(r'\s+([?.!,"])', r'\1', text)  # Remove spaces before punctuation
    text = re.sub(r',([^\s])', r', \1', text)    # Add space after commas
    return text
Step 4: Prediction Function

Define a function that takes an essay as input and predicts its score. The model predicts a label on a 0–5 scale, which we shift to the 1–6 score scale.

import torch

# Prediction function
def predict_score(text: str) -> int:
    # Preprocess the text
    processed_text = preprocess_essay_text(text)

    # Tokenize the input text (batch of one)
    encoding = tokenizer(
        processed_text,
        padding='max_length',
        truncation=True,
        max_length=512,
        return_tensors='pt'
    )

    # Move the model and input tensors to the same device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)

    # Perform inference
    model.eval()
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        prediction = torch.argmax(logits, dim=-1).cpu().numpy()

    # The model predicts labels 0-5; shift by 1 to get scores on the 1-6 scale.
    score = int(prediction[0]) + 1
    return score
Step 5: Example Usage with Manual Input

Now you can input your essay text manually and get the predicted score.

manual_input = input("Enter your text/Essay:")
predicted_score = predict_score(manual_input)
print(f"Predicted Score: {predicted_score}")

Training

The model was trained using the following steps:

  1. Data Preparation:

    • The training data consists of essays scored on a 1–6 scale, offset to labels in the range of 0–5.
    • Preprocessing involved:
      • Resolving encoding issues using custom encoding and decoding handlers.
      • Normalizing whitespace and punctuation formatting without altering grammar or spelling.
    • Texts were tokenized using AutoTokenizer from the transformers library.
  2. Class Imbalance Handling:

    • Class weights were computed based on the inverse frequency of each label in the training dataset to mitigate class imbalance during training.
  3. Training Configuration:

    • The HuggingFace model checkpoint HuggingFaceTB/SmolLM2-360M-Instruct was fine-tuned with an added classification head for essay scoring.
    • Loss function: Weighted Cross-Entropy Loss.
    • Optimizer: AdamW with a learning rate of 2e-5.
    • Scheduler: Linear schedule with warm-up (100 steps).
    • Gradient Clipping: Maximum gradient norm of 1.0 to prevent exploding gradients.
    • Mixed precision (fp16) training was used for faster training and reduced memory consumption.
  4. Training Process:

    • The dataset was split into training (85%) and validation (15%) sets, stratified by label.
    • Training was performed over 3 epochs with a batch size of 4 for training and 2 for validation.
    • The Trainer API was utilized with an early stopping callback (patience of 2 epochs); a code sketch of this training setup appears after this list.
    • Metric for evaluation: Quadratic Weighted Kappa (QWK), optimized using the validation set.
  5. Results:

    • The model achieved its best validation QWK of 0.7927 in the second epoch.
  6. Model Saving:

    • The fine-tuned model and tokenizer were saved for inference and further use.
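
The original training script is not included in this card; the sketch below is a reconstruction of the setup described above (inverse-frequency class weights, weighted cross-entropy via a Trainer subclass, the default AdamW optimizer at 2e-5 with a 100-step linear warm-up, fp16, gradient clipping at 1.0, QWK as the selection metric, and early stopping with patience 2). The names train_dataset and val_dataset are assumptions for pre-tokenized, label-bearing datasets.

import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import cohen_kappa_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

# Inverse-frequency class weights from the 0-5 training labels
# (train_dataset / val_dataset are assumed tokenized datasets with a "labels" column).
train_labels = np.array(train_dataset["labels"])
counts = np.bincount(train_labels, minlength=6)
class_weights = torch.tensor(len(train_labels) / (6 * counts), dtype=torch.float)

class WeightedTrainer(Trainer):
    """Trainer variant that applies weighted cross-entropy loss."""
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss = F.cross_entropy(
            outputs.logits,
            labels,
            weight=class_weights.to(outputs.logits.device),
        )
        return (loss, outputs) if return_outputs else loss

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"qwk": cohen_kappa_score(labels, preds, weights="quadratic")}

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")
model = AutoModelForSequenceClassification.from_pretrained(
    "HuggingFaceTB/SmolLM2-360M-Instruct", num_labels=6
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id  # required for padded classification batches

args = TrainingArguments(
    output_dir="smollm2-essay-scoring",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    learning_rate=2e-5,
    warmup_steps=100,
    max_grad_norm=1.0,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="qwk",
)

trainer = WeightedTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()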

Evaluation

Testing Data, Factors & Metrics

Testing Data

The Kaggle dataset’s test split was used for evaluation.

Factors

Evaluation considers coherence, grammar, and content relevance in the essays.

Metrics

The primary evaluation metric was quadratic weighted kappa (QWK); the model achieved a score of 0.79.
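
When validating the model on your own data, QWK can be computed with scikit-learn's cohen_kappa_score. A minimal sketch with placeholder label arrays:

from sklearn.metrics import cohen_kappa_score

# Hypothetical arrays of human scores and model predictions on the 1-6 scale.
y_true = [3, 4, 2, 5, 3]
y_pred = [3, 4, 3, 5, 2]
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"QWK: {qwk:.4f}")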

Results

The model performed well on the test set, but generalization to other datasets requires validation.


Technical Specifications

Model Architecture and Objective

The model is based on the SmolLM2-360M-Instruct architecture with an added sequence-classification head, fine-tuned for essay scoring.

Compute Infrastructure

Hardware

  • GPUs: 2×NVIDIA T4
  • Hours used: 3 hours 24 minutes

Software

  • Libraries: Transformers, PyTorch, pandas, NumPy

Citation

@misc{jatin_mehra_2024,
    author    = {Jatin Mehra},
    title     = {Smollm2-360M-Essay-Scoring (Revision 467ceb5)},
    year      = {2024},
    url       = {https://huggingface.co/jatinmehra/Smollm2-360M-Essay-Scoring},
    doi       = {10.57967/hf/3924},
    publisher = {Hugging Face}
}

@misc{learning-agency-lab-automated-essay-scoring-2,
    author = {Scott Crossley and Perpetual Baffour and Jules King and Lauryn Burleigh and Walter Reade and Maggie Demkin},
    title = {Learning Agency Lab - Automated Essay Scoring 2.0},
    year = {2024},
    howpublished = {\url{https://kaggle.com/competitions/learning-agency-lab-automated-essay-scoring-2}},
    note = {Kaggle}
}

@misc{allal2024SmolLM2,
      title={SmolLM2 - with great data, comes great performance}, 
      author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Lewis Tunstall and Agustín Piqueres and Andres Marafioti and Cyril Zakka and Leandro von Werra and Thomas Wolf},
      year={2024},
}