metadata

tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget:
  - text: The Philosophical Enigma of Large Language Models
  - text: CONSTITUTIONAL AND LEGAL REGULATION OF THE STATE CIVIL SERVICE
  - text: Qashio and YallaCompare launch 'Qashio Insurance'
  - text: >-
      Online Travel Accommodations Market Report 2024 Reveals The Global Number
      Of Travel App Downloads Surpassed 3 Billion In 2023
  - text: >-
      The Procter & Gamble Company (NYSE:PG) Stock Position Decreased by
      CarsonAllaria Wealth Management Ltd.
metrics:
  - accuracy
pipeline_tag: text-classification
library_name: setfit
inference: false
base_model: OysterHR/gte-base-en-v1.5

SetFit with OysterHR/gte-base-en-v1.5

This is a SetFit model for text classification. It uses OysterHR/gte-base-en-v1.5 as the Sentence Transformer embedding model and a OneVsRestClassifier instance as the classification head, so each label is predicted by its own binary classifier.

The model was trained with an efficient few-shot learning technique that involves two steps:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
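
The contrastive fine-tuning step works on sentence pairs built from the labelled examples. As a rough illustration (a simplified sketch, not SetFit's actual sampling code), the snippet below builds positive and negative pairs from a few multilabel examples. It treats two texts as a positive pair when they share at least one label. That rule, the `make_pairs` helper, and the toy labels are assumptions made for this example.

```python
from itertools import combinations


def make_pairs(texts, labels):
    """Return (text_a, text_b, target) triples for contrastive fine-tuning.

    labels[i] is a 0/1 indicator list for texts[i]. Assumption for this
    sketch: a pair is positive (target 1.0) when the two texts share at
    least one active label, otherwise negative (target 0.0).
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        shares_label = any(a and b for a, b in zip(labels[i], labels[j]))
        pairs.append((texts[i], texts[j], 1.0 if shares_label else 0.0))
    return pairs


# Hypothetical toy data: the label assignments are invented for illustration
# and use two made-up labels, [technology, finance].
texts = [
    "The Philosophical Enigma of Large Language Models",
    "Qashio and YallaCompare launch 'Qashio Insurance'",
    "A fintech startup ships an AI underwriting tool",
]
labels = [[1, 0], [0, 1], [1, 1]]

pairs = make_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "<->", text_b)
```

Each triple can then serve as one training example for a pairwise loss, with the target acting as the desired similarity between the two embeddings.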

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: OysterHR/gte-base-en-v1.5
Classification head: a OneVsRestClassifier instance
Maximum Sequence Length: 8192 tokens

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("amplyfi/gte-base-en-v1.5_annotations_cache_aggregated_multilabel")
# Run inference
preds = model("The Philosophical Enigma of Large Language Models")
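
Because the classification head is a OneVsRestClassifier, a prediction for one text is expected to be a vector with one 0/1 entry per label rather than a single class. A minimal sketch of turning such a vector into label names follows. The label names and the example vector are hypothetical, since this card does not list the model's labels, and `decode_multilabel` is a helper written for this illustration, not part of SetFit.

```python
def decode_multilabel(indicator, label_names):
    """Map a 0/1 indicator vector to the names of the active labels."""
    if len(indicator) != len(label_names):
        raise ValueError("indicator and label_names must have the same length")
    return [name for flag, name in zip(indicator, label_names) if int(flag) == 1]


# Hypothetical label set and prediction; replace with the model's real labels.
label_names = ["technology", "finance", "travel"]
example_pred = [1, 0, 0]  # stand-in for something like preds.tolist()

active = decode_multilabel(example_pred, label_names)
print(active)
```

An empty result means no label was predicted for the text, which is a valid outcome in a multilabel setting.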

Training Details

Training Set Metrics

Training set	Min	Mean	Max
Word count	3	11.0917	30

Training Hyperparameters

batch_size: (16, 2)
num_epochs: (10, 10)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 20
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
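
The `loss: CosineSimilarityLoss` setting means the embedding body is trained so that the cosine similarity of each sentence pair approaches its target (1 for a positive pair, 0 for a negative pair), measured as a mean squared error. The sketch below computes that quantity in plain Python for hand-written vectors. The vectors are made up for illustration, and the real loss in Sentence Transformers operates on batched tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cosine similarity and the pair target."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Made-up 2-d "embeddings" as (u, v, target) triples.
pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical vectors, positive pair: error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal vectors, negative pair: error 0
    ([1.0, 0.0], [0.0, 1.0], 1.0),  # orthogonal vectors, positive pair: error 1
]
loss = cosine_similarity_loss(pairs)
print(round(loss, 4))
```

Only the third pair contributes to the loss here, because its embeddings are dissimilar even though the target says they should be alike.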

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0017	1	0.4182	-
0.0833	50	0.2867	-
0.1667	100	0.25	-
0.25	150	0.2203	-
0.3333	200	0.1984	-
0.4167	250	0.1759	-
0.5	300	0.1555	-
0.5833	350	0.1336	-
0.6667	400	0.1306	-
0.75	450	0.1245	-
0.8333	500	0.121	-
0.9167	550	0.1166	-
1.0	600	0.1139	-
1.0833	650	0.1083	-
1.1667	700	0.102	-
1.25	750	0.0965	-
1.3333	800	0.1027	-
1.4167	850	0.1045	-
1.5	900	0.1069	-
1.5833	950	0.0935	-
1.6667	1000	0.0929	-
1.75	1050	0.0875	-
1.8333	1100	0.0906	-
1.9167	1150	0.0999	-
2.0	1200	0.0974	-
2.0833	1250	0.0877	-
2.1667	1300	0.0776	-
2.25	1350	0.0839	-
2.3333	1400	0.0895	-
2.4167	1450	0.0819	-
2.5	1500	0.0819	-
2.5833	1550	0.0913	-
2.6667	1600	0.0881	-
2.75	1650	0.0921	-
2.8333	1700	0.0839	-
2.9167	1750	0.0851	-
3.0	1800	0.088	-
3.0833	1850	0.0801	-
3.1667	1900	0.086	-
3.25	1950	0.0831	-
3.3333	2000	0.0747	-
3.4167	2050	0.0773	-
3.5	2100	0.0832	-
3.5833	2150	0.078	-
3.6667	2200	0.0856	-
3.75	2250	0.0797	-
3.8333	2300	0.0759	-
3.9167	2350	0.0846	-
4.0	2400	0.0833	-
4.0833	2450	0.0767	-
4.1667	2500	0.0787	-
4.25	2550	0.0743	-
4.3333	2600	0.077	-
4.4167	2650	0.0808	-
4.5	2700	0.0768	-
4.5833	2750	0.0808	-
4.6667	2800	0.0796	-
4.75	2850	0.077	-
4.8333	2900	0.0787	-
4.9167	2950	0.071	-
5.0	3000	0.0773	-
5.0833	3050	0.069	-
5.1667	3100	0.0795	-
5.25	3150	0.0748	-
5.3333	3200	0.075	-
5.4167	3250	0.0745	-
5.5	3300	0.076	-
5.5833	3350	0.0708	-
5.6667	3400	0.0788	-
5.75	3450	0.0803	-
5.8333	3500	0.0756	-
5.9167	3550	0.0737	-
6.0	3600	0.073	-
6.0833	3650	0.066	-
6.1667	3700	0.0735	-
6.25	3750	0.0733	-
6.3333	3800	0.0754	-
6.4167	3850	0.0717	-
6.5	3900	0.0772	-
6.5833	3950	0.0695	-
6.6667	4000	0.0734	-
6.75	4050	0.0709	-
6.8333	4100	0.0776	-
6.9167	4150	0.073	-
7.0	4200	0.0732	-
7.0833	4250	0.069	-
7.1667	4300	0.0685	-
7.25	4350	0.0681	-
7.3333	4400	0.075	-
7.4167	4450	0.0751	-
7.5	4500	0.075	-
7.5833	4550	0.0686	-
7.6667	4600	0.07	-
7.75	4650	0.0716	-
7.8333	4700	0.0749	-
7.9167	4750	0.0687	-
8.0	4800	0.0753	-
8.0833	4850	0.0661	-
8.1667	4900	0.0662	-
8.25	4950	0.0725	-
8.3333	5000	0.0701	-
8.4167	5050	0.0702	-
8.5	5100	0.0755	-
8.5833	5150	0.0698	-
8.6667	5200	0.0686	-
8.75	5250	0.0659	-
8.8333	5300	0.0758	-
8.9167	5350	0.0702	-
9.0	5400	0.0721	-
9.0833	5450	0.071	-
9.1667	5500	0.0652	-
9.25	5550	0.0657	-
9.3333	5600	0.0742	-
9.4167	5650	0.0725	-
9.5	5700	0.066	-
9.5833	5750	0.068	-
9.6667	5800	0.0709	-
9.75	5850	0.0645	-
9.8333	5900	0.0669	-
9.9167	5950	0.0696	-
10.0	6000	0.0692	-
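
The Epoch and Step columns are related by simple arithmetic: step 600 corresponds to epoch 1.0, so there are 600 optimizer steps per epoch and 6000 in total over the 10 embedding epochs. The short calculation below reproduces these figures. Assuming a warmup over `warmup_proportion` of the steps and an embedding batch size of 16 (the first entry of `batch_size`), it also derives the warmup length and an upper bound on the sentence pairs seen per epoch. Those two interpretations are assumptions about how the trainer applies the hyperparameters, not values reported in this card.

```python
steps_per_epoch = 600        # from the table: step 600 is epoch 1.0
num_epochs = 10              # first entry of num_epochs (embedding phase)
batch_size = 16              # first entry of batch_size (embedding phase)
warmup_proportion = 0.1

total_steps = steps_per_epoch * num_epochs
warmup_steps = round(total_steps * warmup_proportion)
# Upper bound: the final batch of an epoch may be smaller than batch_size.
max_pairs_per_epoch = steps_per_epoch * batch_size


def epoch_at(step):
    """Fractional epoch reached after a given number of steps."""
    return round(step / steps_per_epoch, 4)


print(total_steps, warmup_steps, max_pairs_per_epoch)
print(epoch_at(1), epoch_at(50), epoch_at(6000))
```

The values returned by `epoch_at` match the Epoch column for steps 1, 50, and 6000 in the table above.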

Framework Versions

Python: 3.10.12
SetFit: 1.1.1
Sentence Transformers: 3.3.1
Transformers: 4.48.0.dev0
PyTorch: 2.5.1+cu124
Datasets: 3.1.0
Tokenizers: 0.21.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}