ModernBert-Frugal-AI
This is a version of https://huggingface.co/answerdotai/ModernBERT-base fine-tuned on the Frugal-AI-Train-Data-88k dataset.
- Hyperparameters for reproduction (a minimal training sketch follows below):
- per_device_train_batch_size=16
- per_device_eval_batch_size=16
- num_train_epochs=2
- warmup_steps=500
- weight_decay=0.01
- learning_rate=2e-6
- lr_scheduler_type="cosine"
- fp16=True
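Below is a minimal sketch of the fine-tuning setup using the hyperparameters above. The dataset path, the column names ("quote", "label"), the split names, and the number of labels are assumptions and may need to be adjusted to the actual dataset schema.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "answerdotai/ModernBERT-base"
DATASET = "Frugal-AI-Train-Data-88k"  # placeholder: replace with the full Hub path of the dataset

dataset = load_dataset(DATASET)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # "quote" as the text column is an assumption; padding is handled dynamically by the Trainer's collator
    return tokenizer(batch["quote"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

# num_labels=8 (the challenge's claim categories) is an assumption; labels are assumed to be integer class ids
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=8)

training_args = TrainingArguments(
    output_dir="modernbert-frugal-ai",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    warmup_steps=500,
    weight_decay=0.01,
    learning_rate=2e-6,
    lr_scheduler_type="cosine",
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],  # split names are assumptions
    tokenizer=tokenizer,
)
trainer.train()
```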
The model trained without the test set reached scores between 0.74 and 0.75 across different runs. The final submitted version also incorporated samples from the public test set into the training set.
Other attempted methods
Within the challenge, we also attempted the following approaches:
- TF-IDF features with RandomForest, Logistic Regression, or XGBoost: around 0.54 on the test set without overfitting (see the baseline sketch below)
- ModernBERT-base embeddings with RandomForest, Logistic Regression, or XGBoost: between 0.62 and 0.70 on the test set without overfitting
- ModernBERT-base embeddings combined with TF-IDF features, fed to RandomForest, Logistic Regression, or XGBoost: also between 0.62 and 0.70 on the test set without overfitting
- A voting classifier over ModernBERT-base embeddings combining Logistic Regression, SGDClassifier, SVC, and XGBoost: around 0.70 without overfitting (see the embedding sketch below)
- DistilBERT or DeBERTa-large: lower performance, around 0.70 to 0.72 without overfitting
- Qwen2.5-3B-Instruct LoRA fine-tuned for sequence classification and quantized to 8 bits: higher performance at around 0.78 without overfitting, but with a much higher carbon footprint; the model is stored at Qwen2.5-3B-FrugalAI (see the LoRA sketch below)
Overfitting (obviously) always yielded a large increase in performance.
In the end, we chose to submit both the ModernBERT and Qwen models, trained on the whole uploaded dataset.
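The following is a rough sketch of the TF-IDF baseline. The exact vectorizer settings and classifier hyperparameters used in the challenge are not recorded here, so the values below are illustrative defaults; the dataset path and column names are the same assumptions as in the fine-tuning sketch above.

```python
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

ds = load_dataset("Frugal-AI-Train-Data-88k")  # placeholder: full Hub path of the dataset
train_texts, train_labels = ds["train"]["quote"], ds["train"]["label"]
test_texts, test_labels = ds["test"]["quote"], ds["test"]["label"]

# Word-level TF-IDF features feeding a linear classifier (Logistic Regression shown;
# RandomForest or XGBoost can be swapped in the same way)
tfidf_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
tfidf_clf.fit(train_texts, train_labels)
print("test accuracy:", tfidf_clf.score(test_texts, test_labels))
```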
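The embedding-based classifiers and the voting classifier followed the same pattern: extract fixed sentence embeddings from ModernBERT-base, then train scikit-learn models on top. The mean-pooling strategy and the soft-voting configuration below are assumptions, the labels are assumed to be integer class ids, and the text/label lists are reused from the baseline sketch above.

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-base").to(device).eval()

@torch.no_grad()
def embed(texts, batch_size=32):
    # Mean-pool the last hidden state over non-padding tokens (pooling choice is an assumption)
    vecs = []
    for i in range(0, len(texts), batch_size):
        enc = tok(texts[i:i + batch_size], padding=True, truncation=True, return_tensors="pt").to(device)
        hidden = encoder(**enc).last_hidden_state
        mask = enc["attention_mask"].unsqueeze(-1)
        vecs.append(((hidden * mask).sum(1) / mask.sum(1)).cpu().numpy())
    return np.vstack(vecs)

X_train, X_test = embed(train_texts), embed(test_texts)  # texts/labels from the baseline sketch

voter = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("sgd", SGDClassifier(loss="log_loss")),
        ("svc", SVC(probability=True)),
        ("xgb", XGBClassifier()),
    ],
    voting="soft",  # soft voting requires probability estimates from every member
)
voter.fit(X_train, train_labels)
print("test accuracy:", voter.score(X_test, test_labels))
```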
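For the Qwen2.5-3B-Instruct run, the model was LoRA fine-tuned for sequence classification and quantized to 8 bits. The sketch below shows one way to set this up with peft and bitsandbytes; the LoRA rank, alpha, dropout, and target modules are assumptions rather than the exact recipe, and the resulting model can then be trained with the same Trainer setup as in the ModernBERT sketch.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load the base model with a classification head in 8-bit (requires bitsandbytes and a GPU);
# num_labels=8 is the same assumption as in the ModernBERT sketch
model = AutoModelForSequenceClassification.from_pretrained(
    base_id,
    num_labels=8,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    torch_dtype=torch.float16,
)
model.config.pad_token_id = tokenizer.pad_token_id

# Attach LoRA adapters so only a small set of parameters is trained
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```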