SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-distilroberta-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
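
The trailing Normalize() module rescales every embedding to unit length, so the dot product of two embeddings already equals their cosine similarity. A minimal sketch to check this, reusing identifiers that appear in the usage example below:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml-feb11")
print(model.max_seq_length)  # 512, from the Transformer module above

emb = model.encode(["Mathlib.LinearAlgebra.CliffordAlgebra.Basic#46", "Finset.sum_congr"])
print(np.linalg.norm(emb, axis=1))  # ~[1. 1.] because of the Normalize() module
print(float(emb[0] @ emb[1]))       # dot product equals cosine similarity for unit vectors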

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml-feb11")
# Run inference
sentences = [
    'Mathlib.LinearAlgebra.CliffordAlgebra.Basic#46',
    'CliffordAlgebra.mul_add_swap_eq_polar_of_forall_mul_self_eq',
    'Finset.sum_congr',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
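
Because proof-state identifiers and premise names are embedded into the same space, a typical use is ranking candidate premises for a given state. A hedged sketch; the premise pool below is illustrative, taken from examples elsewhere in this card:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml-feb11")

state = "Mathlib.LinearAlgebra.CliffordAlgebra.Basic#46"
premises = [
    "CliffordAlgebra.mul_add_swap_eq_polar_of_forall_mul_self_eq",
    "Finset.sum_congr",
    "Multiset.induction_on",
]

# Embed the state and the candidate premises, then rank premises by similarity.
state_emb = model.encode([state])
premise_embs = model.encode(premises)
scores = model.similarity(state_emb, premise_embs)[0]

for premise, score in sorted(zip(premises, scores), key=lambda p: -float(p[1])):
    print(f"{float(score):.3f}  {premise}")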

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,293,921 training samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string; min: 11 tokens, mean: 16.87 tokens, max: 28 tokens
    • premise_name: string; min: 3 tokens, mean: 10.27 tokens, max: 27 tokens
  • Samples (state_name → premise_name):
    • Mathlib.Algebra.Group.Subgroup.Pointwise#27 → Set.mul_subgroupClosure
    • Mathlib.Algebra.Group.Subgroup.Pointwise#27 → pow_succ
    • Mathlib.Algebra.Group.Subgroup.Pointwise#27 → mul_assoc
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters (a reference sketch follows this list):
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
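
MaskedCachedMultipleNegativesRankingLoss is a project-specific variant of the cached in-batch-negatives ranking loss shipped with Sentence Transformers. As a reference point, a hedged sketch using the library's built-in CachedMultipleNegativesRankingLoss with the parameters listed above (the masking of false in-batch negatives is specific to this repository's loss module and is not reproduced here):

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")

# Stand-in for the custom masked loss: in-batch negatives with gradient caching,
# scale 20.0 and cosine similarity, matching the parameters above.
loss = CachedMultipleNegativesRankingLoss(
    model,
    scale=20.0,
    similarity_fct=util.cos_sim,
)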
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,676 evaluation samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string; min: 12 tokens, mean: 17.35 tokens, max: 26 tokens
    • premise_name: string; min: 3 tokens, mean: 10.87 tokens, max: 34 tokens
  • Samples (state_name → premise_name):
    • Mathlib.Algebra.BigOperators.Associated#0 → Prime.dvd_or_dvd
    • Mathlib.Algebra.BigOperators.Associated#0 → Multiset.induction_on
    • Mathlib.Algebra.BigOperators.Associated#0 → Multiset.mem_cons_of_mem
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0002
  • num_train_epochs: 1.0
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.03
  • bf16: True
  • dataloader_num_workers: 4
  • batch_sampler: no_duplicates
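
For reference, a hedged sketch of how the non-default values above map onto SentenceTransformerTrainingArguments (the output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs/premise-selection",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=1024,
    per_device_eval_batch_size=64,
    learning_rate=2e-4,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)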

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
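
Putting the pieces together, a hedged sketch of how a training run with this configuration could be wired up with SentenceTransformerTrainer (the one-row dataset is illustrative; the actual pairs come from the unnamed dataset described above, and args and loss refer to the sketches earlier in this section):

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")

# Illustrative dataset in the (state_name, premise_name) format described above.
train_dataset = Dataset.from_dict({
    "state_name": ["Mathlib.Algebra.Group.Subgroup.Pointwise#27"],
    "premise_name": ["Set.mul_subgroupClosure"],
})

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,                    # training arguments from the sketch above
    train_dataset=train_dataset,
    loss=loss,                    # ranking loss from the sketch above
)
trainer.train()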

Training Logs

Epoch  Step  Training Loss  Validation Loss
0.0024 10 6.3113 -
0.0048 20 5.768 -
0.0072 30 5.4084 -
0.0095 40 5.1243 -
0.0100 42 - 1.5848
0.0119 50 4.996 -
0.0143 60 4.9292 -
0.0167 70 4.7929 -
0.0191 80 4.7368 -
0.0200 84 - 1.4126
0.0215 90 4.6902 -
0.0238 100 4.6073 -
0.0262 110 4.5754 -
0.0286 120 4.5621 -
0.0300 126 - 1.2768
0.0310 130 4.5085 -
0.0334 140 4.4216 -
0.0358 150 4.4089 -
0.0381 160 4.3785 -
0.0401 168 - 1.2377
0.0405 170 4.3003 -
0.0429 180 4.272 -
0.0453 190 4.2197 -
0.0477 200 4.189 -
0.0501 210 4.1967 1.1451
0.0525 220 4.1612 -
0.0548 230 4.1096 -
0.0572 240 4.0698 -
0.0596 250 4.0484 -
0.0601 252 - 1.1022
0.0620 260 4.0192 -
0.0644 270 4.0159 -
0.0668 280 4.0188 -
0.0691 290 3.9599 -
0.0701 294 - 1.0653
0.0715 300 3.9634 -
0.0739 310 3.9027 -
0.0763 320 3.8404 -
0.0787 330 3.9112 -
0.0801 336 - 1.0256
0.0811 340 3.8831 -
0.0835 350 3.8834 -
0.0858 360 3.8773 -
0.0882 370 3.8435 -
0.0901 378 - 1.0854
0.0906 380 3.855 -
0.0930 390 3.8484 -
0.0954 400 3.7728 -
0.0978 410 3.6967 -
0.1001 420 3.778 1.0974
0.1025 430 3.7449 -
0.1049 440 3.7032 -
0.1073 450 3.7373 -
0.1097 460 3.6996 -
0.1102 462 - 1.0316
0.1121 470 3.6852 -
0.1144 480 3.609 -
0.1168 490 3.5836 -
0.1192 500 3.6087 -
0.1202 504 - 1.0098
0.1216 510 3.5539 -
0.1240 520 3.5611 -
0.1264 530 3.6365 -
0.1288 540 3.5787 -
0.1302 546 - 0.9769
0.1311 550 3.5795 -
0.1335 560 3.5283 -
0.1359 570 3.546 -
0.1383 580 3.4739 -
0.1402 588 - 1.0362
0.1407 590 3.5161 -
0.1431 600 3.4315 -
0.1454 610 3.446 -
0.1478 620 3.4618 -
0.1502 630 3.4212 0.9364
0.1526 640 3.4464 -
0.1550 650 3.46 -
0.1574 660 3.3695 -
0.1598 670 3.356 -
0.1602 672 - 0.9324
0.1621 680 3.2896 -
0.1645 690 3.3295 -
0.1669 700 3.3305 -
0.1693 710 3.36 -
0.1702 714 - 0.9268
0.1717 720 3.3037 -
0.1741 730 3.3374 -
0.1764 740 3.3523 -
0.1788 750 3.3123 -
0.1803 756 - 0.8850
0.1812 760 3.2635 -
0.1836 770 3.2558 -
0.1860 780 3.2126 -
0.1884 790 3.2516 -
0.1903 798 - 0.9161
0.1907 800 3.2121 -
0.1931 810 3.2356 -
0.1955 820 3.2765 -
0.1979 830 3.1934 -
0.2003 840 3.1938 0.8648
0.2027 850 3.2396 -
0.2051 860 3.1654 -
0.2074 870 3.1056 -
0.2098 880 3.1096 -
0.2103 882 - 0.8460
0.2122 890 3.1613 -
0.2146 900 3.1922 -
0.2170 910 3.0955 -
0.2194 920 3.0681 -
0.2203 924 - 0.8319
0.2217 930 3.1376 -
0.2241 940 3.148 -
0.2265 950 3.1331 -
0.2289 960 3.076 -
0.2303 966 - 0.8071
0.2313 970 3.1274 -
0.2337 980 3.0901 -
0.2361 990 3.0651 -
0.2384 1000 3.024 -
0.2403 1008 - 0.8220
0.2408 1010 3.0311 -
0.2432 1020 3.0188 -
0.2456 1030 2.9341 -
0.2480 1040 2.9745 -
0.2504 1050 3.0033 0.8258
0.2527 1060 3.0175 -
0.2551 1070 2.9599 -
0.2575 1080 2.9868 -
0.2599 1090 2.915 -
0.2604 1092 - 0.7990
0.2623 1100 2.9195 -
0.2647 1110 2.9732 -
0.2670 1120 2.9822 -
0.2694 1130 2.9388 -
0.2704 1134 - 0.8316
0.2718 1140 2.929 -
0.2742 1150 2.9218 -
0.2766 1160 2.8534 -
0.2790 1170 2.885 -
0.2804 1176 - 0.8339
0.2814 1180 2.9252 -
0.2837 1190 2.8983 -
0.2861 1200 2.8483 -
0.2885 1210 2.8533 -
0.2904 1218 - 0.7831
0.2909 1220 2.8155 -
0.2933 1230 2.8068 -
0.2957 1240 2.7685 -
0.2980 1250 2.772 -
0.3004 1260 2.7242 0.7851
0.3028 1270 2.7578 -
0.3052 1280 2.779 -
0.3076 1290 2.7835 -
0.3100 1300 2.7999 -
0.3104 1302 - 0.7854
0.3124 1310 2.8235 -
0.3147 1320 2.7455 -
0.3171 1330 2.745 -
0.3195 1340 2.7275 -
0.3205 1344 - 0.7646
0.3219 1350 2.7866 -
0.3243 1360 2.8072 -
0.3267 1370 2.7537 -
0.3290 1380 2.7328 -
0.3305 1386 - 0.7548
0.3314 1390 2.7642 -
0.3338 1400 2.7285 -
0.3362 1410 2.7388 -
0.3386 1420 2.7056 -
0.3405 1428 - 0.7031
0.3410 1430 2.6704 -
0.3433 1440 2.6718 -
0.3457 1450 2.6517 -
0.3481 1460 2.6788 -
0.3505 1470 2.6815 0.7608
0.3529 1480 2.6683 -
0.3553 1490 2.6534 -
0.3577 1500 2.6676 -
0.3600 1510 2.6695 -
0.3605 1512 - 0.7476
0.3624 1520 2.6648 -
0.3648 1530 2.5935 -
0.3672 1540 2.6464 -
0.3696 1550 2.621 -
0.3705 1554 - 0.7356
0.3720 1560 2.5994 -
0.3743 1570 2.6171 -
0.3767 1580 2.5903 -
0.3791 1590 2.62 -
0.3805 1596 - 0.7192
0.3815 1600 2.6257 -
0.3839 1610 2.656 -
0.3863 1620 2.6549 -
0.3887 1630 2.6522 -
0.3906 1638 - 0.7101
0.3910 1640 2.6236 -
0.3934 1650 2.5769 -
0.3958 1660 2.6071 -
0.3982 1670 2.6663 -
0.4006 1680 2.6382 0.7083
0.4030 1690 2.6081 -
0.4053 1700 2.6092 -
0.4077 1710 2.5602 -
0.4101 1720 2.58 -
0.4106 1722 - 0.7361
0.4125 1730 2.5266 -
0.4149 1740 2.4992 -
0.4173 1750 2.5094 -
0.4196 1760 2.5468 -
0.4206 1764 - 0.6964
0.4220 1770 2.5543 -
0.4244 1780 2.538 -
0.4268 1790 2.5094 -
0.4292 1800 2.5583 -
0.4306 1806 - 0.6982
0.4316 1810 2.5423 -
0.4340 1820 2.4879 -
0.4363 1830 2.4811 -
0.4387 1840 2.4741 -
0.4406 1848 - 0.6840
0.4411 1850 2.469 -
0.4435 1860 2.4565 -
0.4459 1870 2.4599 -
0.4483 1880 2.4294 -
0.4506 1890 2.4434 0.6697
0.4530 1900 2.3968 -
0.4554 1910 2.4614 -
0.4578 1920 2.4615 -
0.4602 1930 2.4527 -
0.4607 1932 - 0.6599
0.4626 1940 2.4239 -
0.4649 1950 2.4222 -
0.4673 1960 2.4432 -
0.4697 1970 2.4589 -
0.4707 1974 - 0.6694
0.4721 1980 2.4381 -
0.4745 1990 2.4959 -
0.4769 2000 2.4146 -
0.4793 2010 2.3884 -
0.4807 2016 - 0.6662
0.4816 2020 2.4217 -
0.4840 2030 2.3768 -
0.4864 2040 2.3574 -
0.4888 2050 2.3983 -
0.4907 2058 - 0.6654
0.4912 2060 2.3659 -
0.4936 2070 2.3771 -
0.4959 2080 2.3523 -
0.4983 2090 2.4098 -
0.5007 2100 2.3258 0.6297
0.5031 2110 2.3491 -
0.5055 2120 2.3685 -
0.5079 2130 2.365 -
0.5103 2140 2.4 -
0.5107 2142 - 0.6537
0.5126 2150 2.3405 -
0.5150 2160 2.3431 -
0.5174 2170 2.3571 -
0.5198 2180 2.3688 -
0.5207 2184 - 0.6372
0.5222 2190 2.3629 -
0.5246 2200 2.3465 -
0.5269 2210 2.3065 -
0.5293 2220 2.3649 -
0.5308 2226 - 0.6653
0.5317 2230 2.33 -
0.5341 2240 2.2455 -
0.5365 2250 2.2934 -
0.5389 2260 2.3046 -
0.5408 2268 - 0.6560
0.5412 2270 2.3153 -
0.5436 2280 2.3437 -
0.5460 2290 2.2914 -
0.5484 2300 2.2686 -
0.5508 2310 2.2969 0.6233
0.5532 2320 2.2805 -
0.5556 2330 2.3017 -
0.5579 2340 2.2962 -
0.5603 2350 2.2852 -
0.5608 2352 - 0.6208
0.5627 2360 2.3113 -
0.5651 2370 2.3037 -
0.5675 2380 2.3447 -
0.5699 2390 2.3034 -
0.5708 2394 - 0.6143
0.5722 2400 2.2819 -
0.5746 2410 2.2569 -
0.5770 2420 2.2636 -
0.5794 2430 2.2684 -
0.5808 2436 - 0.6032
0.5818 2440 2.2681 -
0.5842 2450 2.3051 -
0.5866 2460 2.2416 -
0.5889 2470 2.2342 -
0.5908 2478 - 0.6192
0.5913 2480 2.2278 -
0.5937 2490 2.2091 -
0.5961 2500 2.1972 -
0.5985 2510 2.1992 -
0.6009 2520 2.2336 0.6036
0.6032 2530 2.2052 -
0.6056 2540 2.2228 -
0.6080 2550 2.1988 -
0.6104 2560 2.202 -
0.6109 2562 - 0.5945
0.6128 2570 2.2292 -
0.6152 2580 2.2265 -
0.6175 2590 2.2222 -
0.6199 2600 2.1563 -
0.6209 2604 - 0.6000
0.6223 2610 2.1737 -
0.6247 2620 2.1518 -
0.6271 2630 2.1243 -
0.6295 2640 2.1266 -
0.6309 2646 - 0.5961
0.6319 2650 2.1924 -
0.6342 2660 2.1339 -
0.6366 2670 2.164 -
0.6390 2680 2.1004 -
0.6409 2688 - 0.6034
0.6414 2690 2.1539 -
0.6438 2700 2.1828 -
0.6462 2710 2.1851 -
0.6485 2720 2.1562 -
0.6509 2730 2.1097 0.5960
0.6533 2740 2.1338 -
0.6557 2750 2.1412 -
0.6581 2760 2.1905 -
0.6605 2770 2.1343 -
0.6609 2772 - 0.5963
0.6629 2780 2.1284 -
0.6652 2790 2.1625 -
0.6676 2800 2.1351 -
0.6700 2810 2.1547 -
0.6710 2814 - 0.5953
0.6724 2820 2.1367 -
0.6748 2830 2.1357 -
0.6772 2840 2.1318 -
0.6795 2850 2.1338 -
0.6810 2856 - 0.5862
0.6819 2860 2.1701 -
0.6843 2870 2.1554 -
0.6867 2880 2.1469 -
0.6891 2890 2.1085 -
0.6910 2898 - 0.5730
0.6915 2900 2.1068 -
0.6938 2910 2.1066 -
0.6962 2920 2.0814 -
0.6986 2930 2.1041 -
0.7010 2940 2.125 0.5761
0.7034 2950 2.0887 -
0.7058 2960 2.0908 -
0.7082 2970 2.119 -
0.7105 2980 2.1203 -
0.7110 2982 - 0.5758
0.7129 2990 2.1332 -
0.7153 3000 2.0936 -
0.7177 3010 2.0998 -
0.7201 3020 2.1111 -
0.7210 3024 - 0.5645
0.7225 3030 2.1444 -
0.7248 3040 2.1081 -
0.7272 3050 2.0555 -
0.7296 3060 2.0905 -
0.7310 3066 - 0.5695
0.7320 3070 2.1654 -
0.7344 3080 2.1358 -
0.7368 3090 2.1853 -
0.7392 3100 2.1544 -
0.7411 3108 - 0.5537
0.7415 3110 2.1343 -
0.7439 3120 2.1485 -
0.7463 3130 2.1189 -
0.7487 3140 2.1046 -
0.7511 3150 2.1016 0.5493
0.7535 3160 2.1202 -
0.7558 3170 2.0679 -
0.7582 3180 2.0589 -
0.7606 3190 2.045 -
0.7611 3192 - 0.5517
0.7630 3200 2.0389 -
0.7654 3210 2.004 -
0.7678 3220 2.0712 -
0.7701 3230 2.1005 -
0.7711 3234 - 0.5508
0.7725 3240 2.0962 -
0.7749 3250 2.0793 -
0.7773 3260 2.0686 -
0.7797 3270 2.0576 -
0.7811 3276 - 0.5472
0.7821 3280 2.0571 -
0.7845 3290 2.0455 -
0.7868 3300 2.0349 -
0.7892 3310 2.0565 -
0.7911 3318 - 0.5465
0.7916 3320 2.0392 -
0.7940 3330 2.0245 -
0.7964 3340 2.0249 -
0.7988 3350 2.0381 -
0.8011 3360 2.0244 0.5442
0.8035 3370 2.1085 -
0.8059 3380 2.0464 -
0.8083 3390 2.047 -
0.8107 3400 2.0011 -
0.8112 3402 - 0.5298
0.8131 3410 2.0052 -
0.8155 3420 2.0278 -
0.8178 3430 1.9971 -
0.8202 3440 1.9969 -
0.8212 3444 - 0.5359
0.8226 3450 2.0504 -
0.8250 3460 2.0561 -
0.8274 3470 2.036 -
0.8298 3480 2.0541 -
0.8312 3486 - 0.5335
0.8321 3490 2.0495 -
0.8345 3500 2.0559 -
0.8369 3510 2.0592 -
0.8393 3520 2.039 -
0.8412 3528 - 0.5326
0.8417 3530 2.0175 -
0.8441 3540 1.9443 -
0.8464 3550 2.0359 -
0.8488 3560 2.0465 -
0.8512 3570 1.9831 0.5339
0.8536 3580 2.0071 -
0.8560 3590 1.9969 -
0.8584 3600 2.0037 -
0.8608 3610 2.0534 -
0.8612 3612 - 0.5324
0.8631 3620 2.03 -
0.8655 3630 1.9772 -
0.8679 3640 2.0403 -
0.8703 3650 2.0577 -
0.8712 3654 - 0.5293
0.8727 3660 1.988 -
0.8751 3670 2.0217 -
0.8774 3680 1.9962 -
0.8798 3690 1.6997 -
0.8813 3696 - 0.5169
0.8822 3700 1.4935 -
0.8846 3710 1.565 -
0.8870 3720 1.6474 -
0.8894 3730 1.8094 -
0.8913 3738 - 0.5628
0.8918 3740 1.8653 -
0.8941 3750 1.9533 -
0.8965 3760 2.0212 -
0.8989 3770 1.9538 -
0.9013 3780 2.0019 0.6071
0.9037 3790 1.9752 -
0.9061 3800 2.0486 -
0.9084 3810 1.9822 -
0.9108 3820 1.994 -
0.9113 3822 - 0.6515
0.9132 3830 1.975 -
0.9156 3840 1.9651 -
0.9180 3850 2.0306 -
0.9204 3860 1.9781 -
0.9213 3864 - 0.6870
0.9227 3870 2.0189 -
0.9251 3880 2.0161 -
0.9275 3890 1.983 -
0.9299 3900 1.9762 -
0.9313 3906 - 0.6943
0.9323 3910 1.9491 -
0.9347 3920 1.8848 -
0.9371 3930 1.9636 -
0.9394 3940 1.9414 -
0.9413 3948 - 0.7033
0.9418 3950 2.0063 -
0.9442 3960 2.0022 -
0.9466 3970 1.9804 -
0.9490 3980 2.0275 -
0.9514 3990 1.8817 0.7150
0.9537 4000 1.8996 -
0.9561 4010 1.9265 -
0.9585 4020 1.914 -
0.9609 4030 1.924 -
0.9614 4032 - 0.7249
0.9633 4040 1.8393 -
0.9657 4050 1.9934 -
0.9680 4060 1.9588 -
0.9704 4070 1.9951 -
0.9714 4074 - 0.7300
0.9728 4080 1.9641 -
0.9752 4090 1.9337 -
0.9776 4100 1.8943 -
0.9800 4110 1.9441 -
0.9814 4116 - 0.7319
0.9824 4120 1.9226 -
0.9847 4130 1.9444 -
0.9871 4140 1.9695 -
0.9895 4150 1.9809 -
0.9914 4158 - 0.7320
0.9919 4160 1.9574 -
0.9943 4170 1.9633 -
0.9967 4180 1.9237 -
0.9990 4190 1.9115 -

Framework Versions

  • Python: 3.11.8
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MaskedCachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}