ModernBert-Frugal-AI
This is a version of https://huggingface.co/answerdotai/ModernBERT-base fine-tuned on the Frugal-AI-Train-Data-88k dataset.
- Hyperparameters for reproduction (a minimal training sketch follows below):
- per_device_train_batch_size=16
- per_device_eval_batch_size=16
- num_train_epochs=2
- warmup_steps=500
- weight_decay=0.01
- learning_rate=2e-6
- lr_scheduler_type="cosine"
- fp16=True
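Below is a minimal sketch of the fine-tuning setup using the hyperparameters above. The dataset path, the column names ("quote", "label"), the split names, and the number of labels are assumptions and may need to be adjusted to the actual dataset schema.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "answerdotai/ModernBERT-base"
DATASET = "Frugal-AI-Train-Data-88k"  # placeholder: replace with the full Hub path of the dataset

dataset = load_dataset(DATASET)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # "quote" as the text column is an assumption; padding is handled dynamically by the Trainer's collator
    return tokenizer(batch["quote"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

# num_labels=8 (the challenge's claim categories) is an assumption; labels are assumed to be integer class ids
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=8)

training_args = TrainingArguments(
    output_dir="modernbert-frugal-ai",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    warmup_steps=500,
    weight_decay=0.01,
    learning_rate=2e-6,
    lr_scheduler_type="cosine",
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],  # split names are assumptions
    tokenizer=tokenizer,
)
trainer.train()
```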
The model trained without the test set reached scores between 0.74 and 0.75 across different runs. The final submitted version also incorporated samples from the public test set into the training set.
Other attempted methods
Within the challenge, we also attempted the following approaches:
- TF-IDF features with RandomForest, Logistic Regression, or XGBoost: around 0.54 on the test set without overfitting (see the baseline sketch below)
- ModernBERT-base embeddings with RandomForest, Logistic Regression, or XGBoost: between 0.62 and 0.70 on the test set without overfitting
- ModernBERT-base embeddings combined with TF-IDF features, fed to RandomForest, Logistic Regression, or XGBoost: also between 0.62 and 0.70 on the test set without overfitting
- A voting classifier over ModernBERT-base embeddings combining Logistic Regression, SGDClassifier, SVC, and XGBoost: around 0.70 without overfitting (see the embedding sketch below)
- DistilBERT or DeBERTa-large: lower performance, around 0.70 to 0.72 without overfitting
- Qwen2.5-3B-Instruct LoRA fine-tuned for sequence classification and quantized to 8 bits: higher performance at around 0.78 without overfitting, but with a much higher carbon footprint; the model is stored at Qwen2.5-3B-FrugalAI (see the LoRA sketch below)
Overfitting (obviously) always yielded a large increase in performance.
In the end, we chose to submit both the ModernBERT and Qwen models, trained on the whole uploaded dataset.
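The following is a rough sketch of the TF-IDF baseline. The exact vectorizer settings and classifier hyperparameters used in the challenge are not recorded here, so the values below are illustrative defaults; the dataset path and column names are the same assumptions as in the fine-tuning sketch above.

```python
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

ds = load_dataset("Frugal-AI-Train-Data-88k")  # placeholder: full Hub path of the dataset
train_texts, train_labels = ds["train"]["quote"], ds["train"]["label"]
test_texts, test_labels = ds["test"]["quote"], ds["test"]["label"]

# Word-level TF-IDF features feeding a linear classifier (Logistic Regression shown;
# RandomForest or XGBoost can be swapped in the same way)
tfidf_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
tfidf_clf.fit(train_texts, train_labels)
print("test accuracy:", tfidf_clf.score(test_texts, test_labels))
```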
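The embedding-based classifiers and the voting classifier followed the same pattern: extract fixed sentence embeddings from ModernBERT-base, then train scikit-learn models on top. The mean-pooling strategy and the soft-voting configuration below are assumptions, the labels are assumed to be integer class ids, and the text/label lists are reused from the baseline sketch above.

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-base").to(device).eval()

@torch.no_grad()
def embed(texts, batch_size=32):
    # Mean-pool the last hidden state over non-padding tokens (pooling choice is an assumption)
    vecs = []
    for i in range(0, len(texts), batch_size):
        enc = tok(texts[i:i + batch_size], padding=True, truncation=True, return_tensors="pt").to(device)
        hidden = encoder(**enc).last_hidden_state
        mask = enc["attention_mask"].unsqueeze(-1)
        vecs.append(((hidden * mask).sum(1) / mask.sum(1)).cpu().numpy())
    return np.vstack(vecs)

X_train, X_test = embed(train_texts), embed(test_texts)  # texts/labels from the baseline sketch

voter = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("sgd", SGDClassifier(loss="log_loss")),
        ("svc", SVC(probability=True)),
        ("xgb", XGBClassifier()),
    ],
    voting="soft",  # soft voting requires probability estimates from every member
)
voter.fit(X_train, train_labels)
print("test accuracy:", voter.score(X_test, test_labels))
```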
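For the Qwen2.5-3B-Instruct run, the model was LoRA fine-tuned for sequence classification and quantized to 8 bits. The sketch below shows one way to set this up with peft and bitsandbytes; the LoRA rank, alpha, dropout, and target modules are assumptions rather than the exact recipe, and the resulting model can then be trained with the same Trainer setup as in the ModernBERT sketch.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load the base model with a classification head in 8-bit (requires bitsandbytes and a GPU);
# num_labels=8 is the same assumption as in the ModernBERT sketch
model = AutoModelForSequenceClassification.from_pretrained(
    base_id,
    num_labels=8,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    torch_dtype=torch.float16,
)
model.config.pad_token_id = tokenizer.pad_token_id

# Attach LoRA adapters so only a small set of parameters is trained
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```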