---
license: mit
datasets:
- jatinmehra/Automated-Essay-Scoring-2.0
language:
- en
metrics:
- quadratic weighted kappa
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
pipeline_tag: text-classification
new_version: jatinmehra/Smollm2-360M-Essay-Scoring
library_name: transformers
tags:
- Essay-Scoring
---

# Model Card for Essay Scoring Model by Jatin Mehra

## Quick Summary

This model was trained by **Jatin Mehra** for a Kaggle competition focused on automated essay scoring, using the data provided by the competition.

- **Base Model:** HuggingFaceTB/SmolLM2-360M-Instruct
- **Hardware:** Kaggle’s 2×T4 GPUs
- **Test Score:** 0.79 quadratic weighted kappa (QWK)
- **Libraries Used:** transformers, torch, pandas (for preprocessing)
- Fine-tuned to score essays; the model predicts labels 0–5, which map to final scores of 1–6.
- Easily adaptable to other subjective text-scoring tasks.

----------

## Model Details

### Model Description

- **Developed by:** Jatin Mehra
- **Shared by:** Jatin Mehra
- **Model type:** Instruction-tuned language model fine-tuned for essay scoring (sequence classification)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** HuggingFaceTB/SmolLM2-360M-Instruct

### Model Sources

- **Repository:** [Training Notebook](https://github.com/Jatin-Mehra119/Essay-Scoring-Modeling/blob/main/Research%20Notebooks/essay-smollm2-360m.ipynb)
- **Demo:** [Essay Scorer Pro](https://huggingface.co/spaces/jatinmehra/Essay-Scorer-Pro)

----------

## Uses

### Direct Use

The model can be used directly to score essays similar to those in the competition dataset.

### Downstream Use

Potential uses include adapting the model to other text-evaluation tasks, such as grading subjective responses or assessing content for quality and relevance.

### Out-of-Scope Use

The model is not designed for creative-writing evaluation, plagiarism detection, or scoring in languages other than English.

----------

## Bias, Risks, and Limitations

- The model’s performance is optimized for the Kaggle dataset and may not generalize well to other datasets.
- Potential biases may arise from the training data, such as over- or under-representation of certain essay topics or writing styles.

### Recommendations

Users should validate the model’s performance on their specific datasets before deployment, and consider retraining or fine-tuning when applying it to a different domain.

----------

## How to Get Started with the Model

#### Helper Guide: Using the Pre-trained SmolLM2-360M Essay Scoring Model

##### Step 1: Install Required Libraries

To ensure compatibility with the model, install specific versions of the `transformers` and `torch` libraries:

```
pip install transformers==4.46.3 torch==2.4.0
```

##### Step 2: Load the Pre-trained Model and Tokenizer

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
model = AutoModelForSequenceClassification.from_pretrained("jatinmehra/Smollm2-360M-Essay-Scoring")
```
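Optionally, you can sanity-check the loaded model with the high-level `pipeline` API before wiring up the full preprocessing path. This is a minimal sketch that is not part of the original guide: the sample text and the printed output are illustrative, and the raw predicted label is on the 0–5 training scale, so add 1 to recover the reported 1–6 score.

```
from transformers import pipeline

# Quick smoke test (skips the custom preprocessing shown in Step 3).
scorer = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = scorer("This is a short sample essay about technology in schools.", truncation=True)
print(result)  # e.g. [{'label': 'LABEL_3', 'score': 0.87}] -> essay score 3 + 1 = 4
```

For the scores reported in this card, use the preprocessing and prediction function described in Steps 3–5 below.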
##### Step 3: Preprocess the Essay Text

Before passing an essay to the model, preprocess it to resolve encoding issues, remove unnecessary spaces, and normalize the text. This ensures the input is clean and in a format suitable for scoring.

```
import codecs
import re

from text_unidecode import unidecode

# Register the custom codec error handlers used by the normalization step.
def replace_encoding_with_utf8(error):
    return error.object[error.start:error.end].encode("utf-8"), error.end

def replace_decoding_with_cp1252(error):
    return error.object[error.start:error.end].decode("cp1252"), error.end

codecs.register_error("replace_encoding_with_utf8", replace_encoding_with_utf8)
codecs.register_error("replace_decoding_with_cp1252", replace_decoding_with_cp1252)

# Preprocessing functions
def resolve_encodings_and_normalize(text: str) -> str:
    """Resolve encoding problems and normalize abnormal characters."""
    text = (
        text.encode("raw_unicode_escape")
        .decode("utf-8", errors="replace_decoding_with_cp1252")
        .encode("cp1252", errors="replace_encoding_with_utf8")
        .decode("utf-8", errors="replace_decoding_with_cp1252")
    )
    text = unidecode(text)  # Convert accented characters to ASCII
    return text

def preprocess_essay_text(text: str) -> str:
    """
    Prepares essay text for scoring by cleaning non-essential issues
    without altering quality indicators.
    """
    text = resolve_encodings_and_normalize(text)
    text = re.sub(r'\s+', ' ', text.strip())      # Normalize whitespace
    text = re.sub(r'\s+([?.!,"])', r'\1', text)   # Remove spaces before punctuation
    text = re.sub(r',([^\s])', r', \1', text)     # Add space after commas
    return text
```

##### Step 4: Prediction Function

Define a function that takes an essay as input and predicts its score. The model outputs a label from 0 to 5, which is shifted to the 1–6 scale.

```
import torch

# Prediction function
def predict_score(text: str) -> int:
    # Preprocess the text
    processed_text = preprocess_essay_text(text)

    # Tokenize the input text
    encoding = tokenizer(
        processed_text,
        padding='max_length',
        truncation=True,
        max_length=512,
        return_tensors='pt'
    )

    # Get input IDs and attention mask
    input_ids = encoding['input_ids'].squeeze(0).unsqueeze(0)            # Add batch dimension
    attention_mask = encoding['attention_mask'].squeeze(0).unsqueeze(0)  # Add batch dimension

    # Move tensors to device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    input_ids = input_ids.to(device)
    attention_mask = attention_mask.to(device)

    # Perform inference
    model.eval()
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        prediction = torch.argmax(logits, dim=-1).cpu().numpy()

    # Convert prediction to score: model predicts 0-5, reported scores are 1-6
    score = int(prediction[0]) + 1
    return score
```

##### Step 5: Example Usage with Manual Input

Now you can enter an essay manually and get the predicted score.

```
manual_input = input("Enter your text/Essay: ")
predicted_score = predict_score(manual_input)
print(f"Predicted Score: {predicted_score}")
```

### Training

The model was trained using the following steps:

1. **Data Preparation**:
   - The training data consists of essays scored on a 1–6 scale, offset to labels in the range 0–5.
   - Preprocessing involved:
     - Resolving encoding issues using custom encoding and decoding handlers.
     - Normalizing whitespace and punctuation formatting without altering grammar or spelling.
   - Texts were tokenized using `AutoTokenizer` from the `transformers` library.
2. **Class Imbalance Handling**:
   - Class weights were computed from the inverse frequency of each label in the training set to mitigate class imbalance during training.
3. **Training Configuration** (a minimal sketch of this setup follows the list):
   - The Hugging Face checkpoint `HuggingFaceTB/SmolLM2-360M-Instruct` was fine-tuned with an added classification head for essay scoring.
   - Loss function: weighted cross-entropy loss.
   - Optimizer: `AdamW` with a learning rate of `2e-5`.
   - Scheduler: linear schedule with warm-up (100 steps).
   - Gradient clipping: maximum gradient norm of `1.0` to prevent exploding gradients.
   - Mixed-precision (`fp16`) training was used for faster training and reduced memory consumption.
4. **Training Process**:
   - The dataset was split into training (85%) and validation (15%) sets, stratified by label.
   - Training ran for **3 epochs** with a batch size of 4 for training and 2 for validation.
   - The `Trainer` API was used with an early-stopping callback (patience of 2 epochs).
   - Evaluation metric: Quadratic Weighted Kappa (QWK), optimized on the validation set.
5. **Results**:
   - The model achieved its best validation QWK score of **0.7927** in the second epoch.
6. **Model Saving**:
   - The fine-tuned model and tokenizer were saved for inference and further use.
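The exact training code lives in the linked notebook; the sketch below is only an illustrative reconstruction of the configuration listed above (inverse-frequency class weights, weighted cross-entropy via a `Trainer` subclass, the hyperparameters from items 3 and 4, and QWK as the validation metric). The DataFrame `train_df`, the column names `full_text` and `score`, the split seed, and the extra `datasets`/`scikit-learn` dependencies are assumptions made for this example, not necessarily the notebook's identifiers.

```
# Illustrative reconstruction of the training setup described above -- not the
# exact notebook code. Assumes a pandas DataFrame `train_df` with the essay in
# a "full_text" column and the 1-6 score in a "score" column; requires the
# `datasets` and `scikit-learn` packages in addition to transformers/torch.
import numpy as np
import torch
import torch.nn.functional as F
from datasets import Dataset
from sklearn.metrics import cohen_kappa_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=6)
if tokenizer.pad_token is None:                  # decoder-style models often lack a pad token
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Shift 1-6 scores to 0-5 labels and compute inverse-frequency class weights.
labels = train_df["score"].to_numpy() - 1
counts = np.bincount(labels, minlength=6)
class_weights = torch.tensor(len(labels) / (6 * counts), dtype=torch.float)

# Tokenize and make a stratified 85/15 train/validation split.
def tokenize(batch):
    return tokenizer(batch["full_text"], truncation=True, padding="max_length", max_length=512)

ds = Dataset.from_pandas(train_df.assign(labels=labels)).map(tokenize, batched=True)
ds = ds.class_encode_column("labels")            # required for a stratified split
split = ds.train_test_split(test_size=0.15, stratify_by_column="labels", seed=42)

class WeightedTrainer(Trainer):
    """Trainer using weighted cross-entropy to counter class imbalance."""
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss = F.cross_entropy(
            outputs.logits, labels, weight=class_weights.to(outputs.logits.device)
        )
        return (loss, outputs) if return_outputs else loss

def compute_metrics(eval_pred):
    logits, gold = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"qwk": cohen_kappa_score(gold, preds, weights="quadratic")}

# Defaults give AdamW with a linear schedule; warmup, clipping, and fp16 match the list above.
args = TrainingArguments(
    output_dir="smollm2-essay-scoring",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    learning_rate=2e-5,
    warmup_steps=100,
    max_grad_norm=1.0,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="qwk",
)

trainer = WeightedTrainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```

With `load_best_model_at_end=True` and QWK as `metric_for_best_model`, the checkpoint from the best validation epoch (epoch 2 in the reported run) is the one kept for saving and inference.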
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The Kaggle dataset’s test split was used for evaluation.

#### Factors

Evaluation considers coherence, grammar, and content relevance in the essays.

#### Metrics

The primary evaluation metric was the quadratic weighted kappa (QWK); the model achieved a score of **0.79** on the test split.

### Results

The model performed well on the test set, but generalization to other datasets requires validation.

----------

## Technical Specifications

### Model Architecture and Objective

The model is based on the SmolLM2-360M-Instruct architecture, fine-tuned with an added classification head for essay scoring.

### Compute Infrastructure

#### Hardware

- **GPUs:** 2×T4
- **Hours used:** 3 hours 24 minutes

#### Software

- **Libraries:** Transformers, PyTorch, Pandas, NumPy

### Citation

```
@misc{jatin_mehra_2024,
  author       = { {Jatin Mehra} },
  title        = { Smollm2-360M-Essay-Scoring (Revision 467ceb5) },
  year         = 2024,
  url          = { https://huggingface.co/jatinmehra/Smollm2-360M-Essay-Scoring },
  doi          = { 10.57967/hf/3924 },
  publisher    = { Hugging Face }
}

@misc{learning-agency-lab-automated-essay-scoring-2,
  author       = {Scott Crossley and Perpetual Baffour and Jules King and Lauryn Burleigh and Walter Reade and Maggie Demkin},
  title        = {Learning Agency Lab - Automated Essay Scoring 2.0},
  year         = {2024},
  howpublished = {\url{https://kaggle.com/competitions/learning-agency-lab-automated-essay-scoring-2}},
  note         = {Kaggle}
}

@misc{allal2024SmolLM2,
  title        = {SmolLM2 - with great data, comes great performance},
  author       = {Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Lewis Tunstall and Agustín Piqueres and Andres Marafioti and Cyril Zakka and Leandro von Werra and Thomas Wolf},
  year         = {2024},
}
```