language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:314315
  - loss:AdaptiveLayerLoss
  - loss:MultipleNegativesRankingLoss
base_model: microsoft/deberta-v3-small
datasets:
  - stanfordnlp/snli
metrics:
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
widget:
  - source_sentence: A man plays the violin.
    sentences:
      - A man is playing violin.
      - The back of a pig under a tree with a cow in the background.
      - The plane is getting ready to take off.
  - source_sentence: A person drops a camera down an escelator.
    sentences:
      - Something is bothering your cat and he does not like it.
      - A man tosses a bag down an escalator.
      - Two smiling women holding a baby.
  - source_sentence: One football player tries to tackle a player on the opposing team.
    sentences:
      - I think Stephen King's comments are helpful in this regard.
      - Our interactions are merely depends on where we put our perception.
      - A football player attempts a tackle.
  - source_sentence: The two men are wearing jeans.
    sentences:
      - Four people eating dessert around a table.
      - >-
        Here are some things that worked with my son who started toilet training
        around 2.5 years.
      - The two men are wearing pants.
  - source_sentence: >-
      This may be overly obvious, but in American English, saying "you're
      welcome" is certainly polite and standard.
    sentences:
      - I'm not sure how "Not at all" sounds in response to "thank you".
      - >-
        As bikeboy389 said, you can learn a lot by looking at students' native
        languages.
      - A laptop and a PC at a workstation.
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy
            value: 0.5397679884752445
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.9089176654815674
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6834040429248815
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.3752323389053345
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.5191082802547771
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9998539506353147
            name: Cosine Recall
          - type: cosine_ap
            value: 0.5794582374804604
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.5302903935097429
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 391.4422302246094
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6834040429248815
            name: Dot F1
          - type: dot_f1_threshold
            value: 175.07894897460938
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.5191082802547771
            name: Dot Precision
          - type: dot_recall
            value: 0.9998539506353147
            name: Dot Recall
          - type: dot_ap
            value: 0.5621671154600225
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.5644855561452726
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 160.045654296875
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6834381551362683
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 322.75946044921875
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.5191476454083567
            name: Manhattan Precision
          - type: manhattan_recall
            value: 0.9998539506353147
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.6033119142961784
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.5387064978391084
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 8.973075866699219
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6834065495207667
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 24.51708221435547
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.5191505498672734
            name: Euclidean Precision
          - type: euclidean_recall
            value: 0.9997079012706295
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.577277049262529
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.5644855561452726
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 391.4422302246094
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6834381551362683
            name: Max F1
          - type: max_f1_threshold
            value: 322.75946044921875
            name: Max F1 Threshold
          - type: max_precision
            value: 0.5191505498672734
            name: Max Precision
          - type: max_recall
            value: 0.9998539506353147
            name: Max Recall
          - type: max_ap
            value: 0.6033119142961784
            name: Max Ap

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model fine-tuned from microsoft/deberta-v3-small on the stanfordnlp/snli dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/deberta-v3-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: stanfordnlp/snli
  • Language: en

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
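
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of the token embeddings produced by the Transformer module. A minimal pure-Python sketch of masked mean pooling follows, using toy 2-dimensional vectors instead of the real 768-dimensional ones. `mean_pool` is a hypothetical helper written for illustration, not the library's implementation, and the padding-mask handling is an assumption about how padded tokens are excluded.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Two real tokens followed by one padding token (mask 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

The padding vector does not affect the result because its mask entry is 0.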

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTaV3-small-ST-AdaptiveLayers-ep2")
# Run inference
sentences = [
    'This may be overly obvious, but in American English, saying "you\'re welcome" is certainly polite and standard.',
    'I\'m not sure how "Not at all" sounds in response to "thank you".',
    "As bikeboy389 said, you can learn a lot by looking at students' native languages.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
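
Because the card lists cosine similarity as the similarity function, each entry of the matrix above should be the cosine of the angle between two embeddings. The sketch below reproduces that calculation in pure Python on toy vectors, so it runs without downloading the model. `cosine_similarity` and `similarity_matrix` are hypothetical helpers for illustration only.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities, one row per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 4) for value in row])
```

The diagonal is 1 up to floating-point error, and the matrix is symmetric, which is also what one would expect from the 3 x 3 result above.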

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.5398
cosine_accuracy_threshold 0.9089
cosine_f1 0.6834
cosine_f1_threshold 0.3752
cosine_precision 0.5191
cosine_recall 0.9999
cosine_ap 0.5795
dot_accuracy 0.5303
dot_accuracy_threshold 391.4422
dot_f1 0.6834
dot_f1_threshold 175.0789
dot_precision 0.5191
dot_recall 0.9999
dot_ap 0.5622
manhattan_accuracy 0.5645
manhattan_accuracy_threshold 160.0457
manhattan_f1 0.6834
manhattan_f1_threshold 322.7595
manhattan_precision 0.5191
manhattan_recall 0.9999
manhattan_ap 0.6033
euclidean_accuracy 0.5387
euclidean_accuracy_threshold 8.9731
euclidean_f1 0.6834
euclidean_f1_threshold 24.5171
euclidean_precision 0.5192
euclidean_recall 0.9997
euclidean_ap 0.5773
max_accuracy 0.5645
max_accuracy_threshold 391.4422
max_f1 0.6834
max_f1_threshold 322.7595
max_precision 0.5192
max_recall 0.9999
max_ap 0.6033
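
The reported cosine F1 can be cross-checked from the precision and recall in the model-index metadata, since F1 is their harmonic mean. The sketch below does that arithmetic and also shows how a threshold from the table might be applied to a single similarity score. `f1_score` and `predict_pair` are hypothetical helper names, and treating scores at or above the threshold as positive is an assumption about the evaluator's convention.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded values copied from the model-index metadata of this card.
cosine_precision = 0.5191082802547771
cosine_recall = 0.9998539506353147
cosine_f1 = f1_score(cosine_precision, cosine_recall)
print(round(cosine_f1, 4))  # 0.6834


def predict_pair(similarity: float, threshold: float = 0.3752) -> bool:
    """Label a pair positive when its cosine similarity reaches the threshold.

    The default is the cosine_f1_threshold from the table (assumed convention).
    """
    return similarity >= threshold


print(predict_pair(0.80))  # True
print(predict_pair(0.10))  # False
```

A recall near 1 combined with a precision near 0.52 means that, at this threshold, almost every positive pair is recovered while only about half of the pairs labelled positive are correct.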

Training Details

Training Dataset

stanfordnlp/snli

  • Dataset: stanfordnlp/snli at cdb5c3d
  • Size: 314,315 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 5 tokens, mean 16.62 tokens, max 62 tokens
    • sentence2 (string): min 4 tokens, mean 9.46 tokens, max 29 tokens
    • label (int): 0: 100.00%
  • Samples:
    • sentence1: A person on a horse jumps over a broken down airplane.
      sentence2: A person is outdoors, on a horse.
      label: 0
    • sentence1: Children smiling and waving at camera
      sentence2: There are children present
      label: 0
    • sentence1: A boy is jumping on skateboard in the middle of a red bridge.
      sentence2: The boy does a skateboarding trick.
      label: 0
  • Loss: AdaptiveLayerLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1,
        "prior_layers_weight": 1,
        "kl_div_weight": 1,
        "kl_temperature": 1
    }
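
For readers who want to reproduce this setup, the parameters above can be passed to the `AdaptiveLayerLoss` class in Sentence Transformers, wrapping a `MultipleNegativesRankingLoss`. The sketch below is an assumption about how the pieces fit together, not the author's training script. `ADAPTIVE_LAYER_KWARGS` and `build_loss` are hypothetical names, and the library import is deferred so the settings can be inspected without the library installed.

```python
# Keyword arguments mirroring the parameter block printed in this card.
ADAPTIVE_LAYER_KWARGS = {
    "n_layers_per_step": 1,
    "last_layer_weight": 1.0,
    "prior_layers_weight": 1.0,
    "kl_div_weight": 1.0,
    "kl_temperature": 1.0,
}


def build_loss(model):
    """Wrap MultipleNegativesRankingLoss in AdaptiveLayerLoss for a SentenceTransformer model."""
    # Imported lazily: only needed when the loss is actually constructed.
    from sentence_transformers import losses

    inner_loss = losses.MultipleNegativesRankingLoss(model)
    return losses.AdaptiveLayerLoss(model=model, loss=inner_loss, **ADAPTIVE_LAYER_KWARGS)


print(sorted(ADAPTIVE_LAYER_KWARGS))
```

With the library installed, one would call `build_loss(SentenceTransformer("microsoft/deberta-v3-small"))` and hand the result to a trainer.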
    

Evaluation Dataset

stanfordnlp/snli

  • Dataset: stanfordnlp/snli at cdb5c3d
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 5 tokens, mean 14.77 tokens, max 45 tokens
    • sentence2 (string): min 6 tokens, mean 14.74 tokens, max 49 tokens
    • score (float): min 0.0, mean 0.47, max 1.0
  • Samples:
    • sentence1: A man with a hard hat is dancing.
      sentence2: A man wearing a hard hat is dancing.
      score: 1.0
    • sentence1: A young child is riding a horse.
      sentence2: A child is riding a horse.
      score: 0.95
    • sentence1: A man is feeding a mouse to a snake.
      sentence2: The man is feeding a mouse to the snake.
      score: 1.0
  • Loss: AdaptiveLayerLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1,
        "prior_layers_weight": 1,
        "kl_div_weight": 1,
        "kl_temperature": 1
    }
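
The inner loss named in the parameters above, MultipleNegativesRankingLoss, treats every other positive in the batch as a negative for a given anchor and applies cross-entropy over scaled similarities. The following pure-Python sketch illustrates that idea on toy vectors. It is not the library's implementation, the function names are hypothetical, and the scale of 20 with cosine similarity is assumed here to match the library's defaults.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(
    anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0
) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    per_anchor = []
    for i, anchor in enumerate(anchors):
        # Similarity of this anchor to every positive in the batch.
        logits = [scale * cosine(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Negative log-probability assigned to the true pair.
        per_anchor.append(log_sum_exp - logits[i])
    return sum(per_anchor) / len(per_anchor)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

The loss is close to zero when each anchor is most similar to its own positive and grows when the pairs are mismatched.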
    

Training Logs

Epoch Step loss max_ap
None 0 4.6204 0.6033

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2
  • Accelerate: 0.30.1
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

AdaptiveLayerLoss

@misc{li20242d,
    title={2D Matryoshka Sentence Embeddings}, 
    author={Xianming Li and Zongxi Li and Jing Li and Haoran Xie and Qing Li},
    year={2024},
    eprint={2402.14776},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}