SentenceTransformer based on sentence-transformers/LaBSE

This is a sentence-transformers model finetuned from sentence-transformers/LaBSE. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/LaBSE
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
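
For reference, the same module stack can be assembled by hand from the sentence_transformers.models building blocks. This is only an illustrative sketch that assumes the base weights come from sentence-transformers/LaBSE; in practice you would load codersan/FaLaBSE-v6 directly, as shown in the Usage section below.

from torch import nn
from sentence_transformers import SentenceTransformer, models

# Sketch of the stack above: CLS pooling, a 768 -> 768 Dense layer with Tanh, then L2 normalization.
word_embedding_model = models.Transformer("sentence-transformers/LaBSE", max_seq_length=256)
pooling = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)
dense = models.Dense(in_features=768, out_features=768, activation_function=nn.Tanh())
normalize = models.Normalize()
model = SentenceTransformer(modules=[word_embedding_model, pooling, dense, normalize])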

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("codersan/FaLaBSE-v6")
# Run inference
sentences = [
    'آیا با دختری که باکره نیست ازدواج خواهید کرد؟',  # Will you marry a girl who is not a virgin?
    'آیا با کسی که باکره نیست ازدواج می کنید؟',  # Will you marry someone who is not a virgin?
    'زنی با شلوار جین کنار اسبی با زین ایستاده است',  # A woman in jeans standing next to a horse with a saddle
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
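
Because the final Normalize() module L2-normalizes the embeddings, cosine similarity reduces to a dot product, which makes the vectors convenient for semantic search. The snippet below is an illustrative sketch using util.semantic_search; the corpus and query simply reuse the example sentences above.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("codersan/FaLaBSE-v6")

corpus = [
    'آیا با کسی که باکره نیست ازدواج می کنید؟',
    'زنی با شلوار جین کنار اسبی با زین ایستاده است',
]
query = 'آیا با دختری که باکره نیست ازدواج خواهید کرد؟'

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Returns one ranked hit list per query; each hit holds a 'corpus_id' and a cosine 'score'.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])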

Training Details

Training Dataset

Unnamed Dataset

  • Size: 149,098 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 5 tokens, mean: 15.1 tokens, max: 76 tokens
    • positive: string; min: 5 tokens, mean: 14.54 tokens, max: 57 tokens
  • Samples:
    • anchor: اگر هند تقسیم نشده بود ، هند امروز چگونه به نظر می رسد؟ (If India had not been partitioned, what would India look like today?)
      positive: اگر پارتیشن اتفاق نیفتاد ، هند امروز چگونه خواهد بود؟ (If the Partition had not happened, what would India be like today?)
    • anchor: چگونه می توانم وارد امنیت اینترنت شوم؟ (How can I get into internet security?)
      positive: چگونه می توانم شروع به یادگیری امنیت اطلاعات کنم؟ (How can I start learning information security?)
    • anchor: برخی از بهترین مؤسسات مربیگری GMAT در دهلی/NCR چیست؟ (What are some of the best GMAT coaching institutes in Delhi/NCR?)
      positive: بهترین مؤسسات مربیگری برای GMAT در NCR چیست؟ (What are the best GMAT coaching institutes in NCR?)
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
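
With MultipleNegativesRankingLoss, every other positive in a batch acts as an in-batch negative for a given anchor, which is why the no_duplicates batch sampler listed under the training hyperparameters matters. A minimal sketch of instantiating the loss with the parameters above:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/LaBSE")

# scale=20.0 and cosine similarity, matching the parameters listed above.
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)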
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • learning_rate: 3e-05
  • weight_decay: 0.15
  • num_train_epochs: 10
  • warmup_ratio: 0.15
  • batch_sampler: no_duplicates
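
As a rough sketch, these non-default hyperparameters correspond to a trainer setup along the following lines. The one-pair dataset and the output directory are placeholders for illustration; the actual 149,098-pair training set is not published with this card.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/LaBSE")

# Placeholder anchor/positive pair, for illustration only.
train_dataset = Dataset.from_dict({
    "anchor": ["چگونه می توانم وارد امنیت اینترنت شوم؟"],
    "positive": ["چگونه می توانم شروع به یادگیری امنیت اطلاعات کنم؟"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="falabse-v6",  # hypothetical output path
    per_device_train_batch_size=32,
    learning_rate=3e-05,
    weight_decay=0.15,
    num_train_epochs=10,
    warmup_ratio=0.15,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.MultipleNegativesRankingLoss(model),  # defaults match: scale=20.0, cos_sim
)
trainer.train()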

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.15
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.15
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0429 100 0.1219
0.0858 200 0.0626
0.1288 300 0.0489
0.1717 400 0.0414
0.2146 500 0.0432
0.2575 600 0.0419
0.3004 700 0.0313
0.3433 800 0.0339
0.3863 900 0.0317
0.4292 1000 0.035
0.4721 1100 0.0378
0.5150 1200 0.0308
0.5579 1300 0.0305
0.6009 1400 0.0312
0.6438 1500 0.0304
0.6867 1600 0.0295
0.7296 1700 0.0301
0.7725 1800 0.033
0.8155 1900 0.0263
0.8584 2000 0.0276
0.9013 2100 0.0236
0.9442 2200 0.0276
0.9871 2300 0.0278
1.0300 2400 0.0309
1.0730 2500 0.0269
1.1159 2600 0.0299
1.1588 2700 0.0272
1.2017 2800 0.029
1.2446 2900 0.0309
1.2876 3000 0.0247
1.3305 3100 0.0244
1.3734 3200 0.0261
1.4163 3300 0.0254
1.4592 3400 0.0273
1.5021 3500 0.0298
1.5451 3600 0.0225
1.5880 3700 0.0278
1.6309 3800 0.027
1.6738 3900 0.0218
1.7167 4000 0.0247
1.7597 4100 0.023
1.8026 4200 0.0225
1.8455 4300 0.0191
1.8884 4400 0.0174
1.9313 4500 0.0214
1.9742 4600 0.018
2.0172 4700 0.0227
2.0601 4800 0.0222
2.1030 4900 0.0211
2.1459 5000 0.0204
2.1888 5100 0.0215
2.2318 5200 0.0206
2.2747 5300 0.0213
2.3176 5400 0.0168
2.3605 5500 0.0189
2.4034 5600 0.0206
2.4464 5700 0.0194
2.4893 5800 0.0182
2.5322 5900 0.017
2.5751 6000 0.0186
2.6180 6100 0.017
2.6609 6200 0.0152
2.7039 6300 0.0164
2.7468 6400 0.0142
2.7897 6500 0.0162
2.8326 6600 0.0123
2.8755 6700 0.0162
2.9185 6800 0.0138
2.9614 6900 0.0163
3.0043 7000 0.0138
3.0472 7100 0.0164
3.0901 7200 0.016
3.1330 7300 0.0175
3.1760 7400 0.0143
3.2189 7500 0.0142
3.2618 7600 0.0176
3.3047 7700 0.0147
3.3476 7800 0.0164
3.3906 7900 0.0133
3.4335 8000 0.0168
3.4764 8100 0.0166
3.5193 8200 0.0138
3.5622 8300 0.0126
3.6052 8400 0.0145
3.6481 8500 0.0114
3.6910 8600 0.0137
3.7339 8700 0.014
3.7768 8800 0.0134
3.8197 8900 0.0108
3.8627 9000 0.012
3.9056 9100 0.0102
3.9485 9200 0.0119
3.9914 9300 0.0122
4.0343 9400 0.0116
4.0773 9500 0.0136
4.1202 9600 0.0135
4.1631 9700 0.0108
4.2060 9800 0.0119
4.2489 9900 0.0142
4.2918 10000 0.0111
4.3348 10100 0.0131
4.3777 10200 0.0103
4.4206 10300 0.0124
4.4635 10400 0.0163
4.5064 10500 0.0123
4.5494 10600 0.0112
4.5923 10700 0.01
4.6352 10800 0.0096
4.6781 10900 0.0103
4.7210 11000 0.0102
4.7639 11100 0.0092
4.8069 11200 0.0107
4.8498 11300 0.0114
4.8927 11400 0.0091
4.9356 11500 0.0108
4.9785 11600 0.0092
5.0215 11700 0.0086
5.0644 11800 0.0104
5.1073 11900 0.0123
5.1502 12000 0.009
5.1931 12100 0.0106
5.2361 12200 0.0114
5.2790 12300 0.0098
5.3219 12400 0.0093
5.3648 12500 0.0092
5.4077 12600 0.011
5.4506 12700 0.0113
5.4936 12800 0.0091
5.5365 12900 0.0079
5.5794 13000 0.01
5.6223 13100 0.0067
5.6652 13200 0.0081
5.7082 13300 0.0097
5.7511 13400 0.0081
5.7940 13500 0.0094
5.8369 13600 0.0074
5.8798 13700 0.0071
5.9227 13800 0.0074
5.9657 13900 0.0076
6.0086 14000 0.0063
6.0515 14100 0.0083
6.0944 14200 0.0101
6.1373 14300 0.0084
6.1803 14400 0.0074
6.2232 14500 0.007
6.2661 14600 0.0078
6.3090 14700 0.0074
6.3519 14800 0.0086
6.3948 14900 0.0069
6.4378 15000 0.0083
6.4807 15100 0.0082
6.5236 15200 0.0066
6.5665 15300 0.0086
6.6094 15400 0.0059
6.6524 15500 0.0052
6.6953 15600 0.0081
6.7382 15700 0.0054
6.7811 15800 0.0063
6.8240 15900 0.0065
6.8670 16000 0.0068
6.9099 16100 0.0047
6.9528 16200 0.0065
6.9957 16300 0.0064
7.0386 16400 0.0051
7.0815 16500 0.0066
7.1245 16600 0.0069
7.1674 16700 0.0074
7.2103 16800 0.0062
7.2532 16900 0.0071
7.2961 17000 0.005
7.3391 17100 0.008
7.3820 17200 0.0047
7.4249 17300 0.0073
7.4678 17400 0.0078
7.5107 17500 0.0058
7.5536 17600 0.0055
7.5966 17700 0.0049
7.6395 17800 0.0046
7.6824 17900 0.0051
7.7253 18000 0.005
7.7682 18100 0.0059
7.8112 18200 0.0056
7.8541 18300 0.0049
7.8970 18400 0.0038
7.9399 18500 0.005
7.9828 18600 0.005
8.0258 18700 0.0036
8.0687 18800 0.0049
8.1116 18900 0.0067
8.1545 19000 0.0056
8.1974 19100 0.0061
8.2403 19200 0.0054
8.2833 19300 0.0046
8.3262 19400 0.0048
8.3691 19500 0.0052
8.4120 19600 0.0059
8.4549 19700 0.0053
8.4979 19800 0.0049
8.5408 19900 0.0036
8.5837 20000 0.0049
8.6266 20100 0.0033
8.6695 20200 0.0049
8.7124 20300 0.0043
8.7554 20400 0.0039
8.7983 20500 0.0038
8.8412 20600 0.0035
8.8841 20700 0.0041
8.9270 20800 0.0042
8.9700 20900 0.0056
9.0129 21000 0.0031
9.0558 21100 0.004
9.0987 21200 0.0043
9.1416 21300 0.0047
9.1845 21400 0.0051
9.2275 21500 0.0032
9.2704 21600 0.0045
9.3133 21700 0.0038
9.3562 21800 0.0045
9.3991 21900 0.0047
9.4421 22000 0.0048
9.4850 22100 0.0042
9.5279 22200 0.0039
9.5708 22300 0.0042
9.6137 22400 0.003
9.6567 22500 0.0031
9.6996 22600 0.0042
9.7425 22700 0.0028
9.7854 22800 0.0037
9.8283 22900 0.0035
9.8712 23000 0.0033
9.9142 23100 0.0029
9.9571 23200 0.0048
10.0 23300 0.0039

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}