ModernBert-Frugal-AI

This is a version of https://huggingface.co/answerdotai/ModernBERT-base that was fine-tuned on the Frugal-AI-Train-Data-88k dataset.

Hyperparameters for reproduction:

  • per_device_train_batch_size=16
  • per_device_eval_batch_size=16
  • num_train_epochs=2
  • warmup_steps=500
  • weight_decay=0.01
  • learning_rate=2e-6
  • lr_scheduler_type="cosine"
  • fp16=True
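
Below is a minimal sketch of this setup with the Hugging Face Trainer. Only the hyperparameters come from the list above; the placeholder dataset, output_dir, and num_labels are assumptions to be replaced with the actual Frugal-AI-Train-Data-88k splits.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=8 is an assumed label count; adjust it to the dataset.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=8)

# Placeholder data; replace with the Frugal-AI-Train-Data-88k splits.
raw = Dataset.from_dict({"text": ["example claim"], "label": [0]})
tokenized = raw.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="modernbert-frugal-ai",  # assumed output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    warmup_steps=500,
    weight_decay=0.01,
    learning_rate=2e-6,
    lr_scheduler_type="cosine",
    fp16=True,
)

trainer = Trainer(model=model, args=args, processing_class=tokenizer,
                  train_dataset=tokenized, eval_dataset=tokenized)
trainer.train()
```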

The model trained without the test set reached scores between 0.74 and 0.75 across runs. The final submitted version also incorporated samples from the public test set into the training set.

Other attempted methods

Other approaches we tried during the challenge:

RandomForest, Logistic Regression, or XGBoost trained on TF-IDF features: performance was around 0.54 on the test set without overfitting.
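
As an illustration, a minimal scikit-learn sketch of this baseline; the vectorizer settings and placeholder texts are assumptions, not the exact challenge configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data; replace with the challenge's train texts and labels.
train_texts = ["claim one", "claim two"]
train_labels = [0, 1]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # n-gram range is an assumption
    LogisticRegression(max_iter=1000),
)
clf.fit(train_texts, train_labels)
```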

RandomForest, Logistic Regression, or XGBoost trained on ModernBERT-base embeddings: performance ranged from 0.62 to 0.7 on the test set without overfitting.
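
A sketch of the embedding extraction, assuming mean pooling over the last hidden state (the pooling strategy is not specified here):

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-base").eval()

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state      # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)     # (batch, seq_len, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # masked mean pooling

# Placeholder texts and labels; replace with the challenge data.
X_train = embed(["claim one", "claim two"])
clf = LogisticRegression(max_iter=1000).fit(X_train, [0, 1])
```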

ModernBERT-base embeddings concatenated with TF-IDF features, with the same classifiers: performance also ranged from 0.62 to 0.7 on the test set without overfitting.
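
Combining the two feature sets amounts to concatenating the dense embeddings with the sparse TF-IDF matrix, e.g.:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["claim one", "claim two"]               # placeholder documents
tfidf = TfidfVectorizer().fit_transform(texts)   # sparse (n_docs, vocab_size)
emb = np.zeros((len(texts), 768))                # stand-in for ModernBERT embeddings
X = hstack([tfidf, csr_matrix(emb)])             # combined feature matrix
```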

A Voting Classifier over Logistic Regression, SGDClassifier, SVC, and XGBoost trained on ModernBERT-base embeddings: performance was around 0.7 without overfitting.
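
A sketch of the ensemble over precomputed embeddings; soft voting and the individual estimator settings are assumptions (soft voting requires probability estimates, hence probability=True for the SVC and a log-loss SGDClassifier):

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("sgd", SGDClassifier(loss="log_loss")),  # log loss exposes predict_proba
        ("svc", SVC(probability=True)),
        ("xgb", XGBClassifier()),
    ],
    voting="soft",  # assumed; "hard" voting would drop the predict_proba requirement
)

# Stand-in features and labels; replace with precomputed ModernBERT embeddings.
X_train = np.random.rand(8, 768)
y_train = np.array([0, 1] * 4)
ensemble.fit(X_train, y_train)
```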

DistilBERT or DeBERTa-large fine-tuning: lower performance, from around 0.7 to 0.72 without overfitting.

Qwen2.5-3B-Instruct fine-tuned with LoRA for sequence classification, quantized to 8 bits: higher performance at around 0.78 without overfitting, but with a much higher carbon footprint. The model is stored at Qwen2.5-3B-FrugalAI.
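
A minimal sketch of this setup with peft and bitsandbytes; the LoRA rank, alpha, dropout, target modules, and num_labels are assumptions not specified here:

```python
import torch
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    num_labels=8,                                      # assumed label count
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16, lora_alpha=32, lora_dropout=0.05,            # assumed values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Training then proceeds with a Trainer setup similar to the one sketched above.
```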

As expected, overfitting always yielded a large performance increase.

In the end, we chose to submit both the ModernBERT and Qwen models, trained on the whole uploaded dataset.
