---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:311351
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: How much non-cash impairment losses were recognized on theaters in international markets in 2022?
  sentences:
  - The timing and amounts of deductible and taxable items and the probability of sustaining uncertain tax positions requires significant judgment. The benefits of uncertain tax positions are recorded in the Company’s consolidated financial statements only after determining a more-likely-than-not probability that the uncertain tax positions...
  - During the year ended December 31, 2022, non-cash impairment losses of $59.7 million were recognized on 53 theaters in the International markets which were related to property, net, and operating lease right-of-use assets, net.
  - Under Item 8 Financial Statements and Supplementary Data is discussed which includes a variety of financial reporting.
- source_sentence: What factors led to the increase in Intelligent Edge earnings from operations as a percentage of net revenue?
  sentences:
  - Intelligent Edge earnings from operations as a percentage of net revenue increased 12.4 percentage points primarily due to decreases in cost of products and services as a percentage of net revenue and operating expenses as a percentage of net revenue.
  - Diverse and Inclusive Workplace We work to build a diverse and inclusive workplace where we can leverage our collective cognitive diversity to build the best products and make the best decisions for the global community we serve. We want our products to work for people around the world and we need to grow and keep the best talent in order to do that.
  - In March 2023, the Board of Directors sanctioned a restructuring plan concentrated on investment prioritization towards significant growth prospects and the optimization of the company's real estate assets. This includes substantial organizational changes such as reductions in office space and workforce.
- source_sentence: What type of financial information is provided in Part IV, Item 15(a)(1) of the Annual Report on Form 10-K?
  sentences:
  - Our self-insurance reserve estimates totaled $268.8 million at August 26, 2023.
  - The consolidated financial statements and accompanying notes listed in Part IV, Item 15(a)(1) of the Annual Report on Form 10-K.
  - Constant dollar changes and adjusted financial results are non-GAAP financial measures. A constant dollar basis assumes the average foreign currency exchange rates for the period remained constant with the average foreign currency exchange rates for the same period of the prior year. We provide constant dollar changes in our results to help investors understand the underlying growth rate of net revenue excluding the impact of changes in foreign currency exchange rates.
- source_sentence: What constitutes the largest expense in the company's various expense categories?
  sentences:
  - Personnel-related costs are the most significant component of the company's operating expenses such as research and development, sales and marketing, and general and administrative expenses, excluding restructuring and asset impairment charges.
  - As of October 31, 2022, the aggregate projected benefit obligations for U.S. Defined Benefit Plans were $289 million and for Non-U.S Defined Benefit Plans it was $996 million. As of October 31, 2023, these obligations were $267 million for U.S. Defined Benefit Plans and $1,052 million for Non-U.S. Defined Benefit Plans.
  - The Management’s Discussion and Analysis section discusses the company's financial condition and results of operations, suggesting it be read alongside the consolidated financial statements included in the Annual Report on Form 10-K.
- source_sentence: What was the total premiums revenue for the Insurance segment in 2023?
  sentences:
  - Cash, cash equivalents and restricted cash at end of period is reported to be $6,985.
  - Insurance segment premiums revenue increased $13.6 billion, or 15.5%, from $87.7 billion in the 2022 period to $101.3 billion in the 2023 period.
  - On a quarterly basis, we employ a consistent, systematic and rational methodology to assess the adequacy of our warranty liability.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: Vignesh finetuned bge
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.6385714285714286
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8057142857142857
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8514285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8885714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6385714285714286
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.26857142857142857
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17028571428571426
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08885714285714284
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6385714285714286
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8057142857142857
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8514285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8885714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7672875738418359
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7279155328798183
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7324235298064157
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.6428571428571429
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7957142857142857
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8428571428571429
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8785714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6428571428571429
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2652380952380952
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16857142857142857
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08785714285714284
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6428571428571429
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7957142857142857
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8428571428571429
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8785714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7649011857503378
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7278752834467118
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7330044690874636
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.6414285714285715
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.84
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8814285714285715
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6414285714285715
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.26666666666666666
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16799999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08814285714285712
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6414285714285715
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.84
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8814285714285715
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7649914708405767
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7272885487528342
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7320436030547072
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.6171428571428571
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7785714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8242857142857143
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8728571428571429
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6171428571428571
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2595238095238095
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1648571428571428
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08728571428571427
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6171428571428571
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7785714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8242857142857143
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8728571428571429
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7477873461127673
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7075640589569155
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7124046732307174
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.59
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7485714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7942857142857143
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8671428571428571
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.59
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2495238095238095
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.15885714285714284
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.0867142857142857
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.59
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7485714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.7942857142857143
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8671428571428571
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7258107978054003
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6810629251700676
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.685615559071417
      name: Cosine Map@100
---

# Vignesh finetuned bge

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
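
Because the model was trained with MatryoshkaLoss at dimensions 768, 512, 256, 128, and 64, a prefix of each embedding can stand in for the full vector at some cost in retrieval quality. The sketch below illustrates the idea with toy 8-dimensional vectors, not real model output: it keeps the first `dim` components, re-normalizes, and ranks two documents by cosine similarity. The helper names (`l2_normalize`, `truncate_embedding`, `cosine_similarity`) and the vectors are hypothetical and exist only for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def truncate_embedding(vec, dim):
    """Keep the first `dim` components, then re-normalize."""
    return l2_normalize(vec[:dim])

def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity equals the dot product."""
    return sum(x * y for x, y in zip(a, b))

# Toy stand-ins for real 768-dimensional embeddings (assumed values).
query = l2_normalize([0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.1])
docs = {
    "doc_a": l2_normalize([0.8, 0.2, 0.4, 0.1, 0.1, 0.0, 0.1, 0.0]),
    "doc_b": l2_normalize([0.0, 0.9, 0.1, 0.7, 0.0, 0.3, 0.2, 0.1]),
}

best_by_dim = {}
for dim in (8, 4, 2):
    q = truncate_embedding(query, dim)
    scores = {
        name: cosine_similarity(q, truncate_embedding(vec, dim))
        for name, vec in docs.items()
    }
    # Record which document ranks first at this truncation level.
    best_by_dim[dim] = max(scores, key=scores.get)

print(best_by_dim)
```

With a real model, recent versions of Sentence Transformers accept a `truncate_dim` argument when constructing `SentenceTransformer`, which applies the truncation at encode time; the per-dimension metrics in the Evaluation section indicate how much quality to expect at each size.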
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("viggypoker1/Vignesh-finetuned-bge")
# Run inference
sentences = [
    'What was the total premiums revenue for the Insurance segment in 2023?',
    'Insurance segment premiums revenue increased $13.6 billion, or 15.5%, from $87.7 billion in the 2022 period to $101.3 billion in the 2023 period.',
    'On a quarterly basis, we employ a consistent, systematic and rational methodology to assess the adequacy of our warranty liability.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6386     |
| cosine_accuracy@3   | 0.8057     |
| cosine_accuracy@5   | 0.8514     |
| cosine_accuracy@10  | 0.8886     |
| cosine_precision@1  | 0.6386     |
| cosine_precision@3  | 0.2686     |
| cosine_precision@5  | 0.1703     |
| cosine_precision@10 | 0.0889     |
| cosine_recall@1     | 0.6386     |
| cosine_recall@3     | 0.8057     |
| cosine_recall@5     | 0.8514     |
| cosine_recall@10    | 0.8886     |
| cosine_ndcg@10      | 0.7673     |
| cosine_mrr@10       | 0.7279     |
| **cosine_map@100**  | **0.7324** |

#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value     |
|:--------------------|:----------|
| cosine_accuracy@1   | 0.6429    |
| cosine_accuracy@3   | 0.7957    |
| cosine_accuracy@5   | 0.8429    |
| cosine_accuracy@10  | 0.8786    |
| cosine_precision@1  | 0.6429    |
| cosine_precision@3  | 0.2652    |
| cosine_precision@5  | 0.1686    |
| cosine_precision@10 | 0.0879    |
| cosine_recall@1     | 0.6429    |
| cosine_recall@3     | 0.7957    |
| cosine_recall@5     | 0.8429    |
| cosine_recall@10    | 0.8786    |
| cosine_ndcg@10      | 0.7649    |
| cosine_mrr@10       | 0.7279    |
| **cosine_map@100**  | **0.733** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value     |
|:--------------------|:----------|
| cosine_accuracy@1   | 0.6414    |
| cosine_accuracy@3   | 0.8       |
| cosine_accuracy@5   | 0.84      |
| cosine_accuracy@10  | 0.8814    |
| cosine_precision@1  | 0.6414    |
| cosine_precision@3  | 0.2667    |
| cosine_precision@5  | 0.168     |
| cosine_precision@10 | 0.0881    |
| cosine_recall@1     | 0.6414    |
| cosine_recall@3     | 0.8       |
| cosine_recall@5     | 0.84      |
| cosine_recall@10    | 0.8814    |
| cosine_ndcg@10      | 0.765     |
| cosine_mrr@10       | 0.7273    |
| **cosine_map@100**  | **0.732** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6171     |
| cosine_accuracy@3   | 0.7786     |
| cosine_accuracy@5   | 0.8243     |
| cosine_accuracy@10  | 0.8729     |
| cosine_precision@1  | 0.6171     |
| cosine_precision@3  | 0.2595     |
| cosine_precision@5  | 0.1649     |
| cosine_precision@10 | 0.0873     |
| cosine_recall@1     | 0.6171     |
| cosine_recall@3     | 0.7786     |
| cosine_recall@5     | 0.8243     |
| cosine_recall@10    | 0.8729     |
| cosine_ndcg@10      | 0.7478     |
| cosine_mrr@10       | 0.7076     |
| **cosine_map@100**  | **0.7124** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.59       |
| cosine_accuracy@3   | 0.7486     |
| cosine_accuracy@5   | 0.7943     |
| cosine_accuracy@10  | 0.8671     |
| cosine_precision@1  | 0.59       |
| cosine_precision@3  | 0.2495     |
| cosine_precision@5  | 0.1589     |
| cosine_precision@10 | 0.0867     |
| cosine_recall@1     | 0.59       |
| cosine_recall@3     | 0.7486     |
| cosine_recall@5     | 0.7943     |
| cosine_recall@10    | 0.8671     |
| cosine_ndcg@10      | 0.7258     |
| cosine_mrr@10       | 0.6811     |
| **cosine_map@100**  | **0.6856** |

## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 311,351 training samples
* Columns: anchor and positive
* Approximate statistics based on the first 1000 samples:

  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string   |
  | details |        |          |

* Samples:

  | anchor | positive |
  |:-------|:---------|
  | What percentage of net revenues came from Mutual Funds, ETFs, and Collective Trust Funds (CTFs) in 2023? | Mutual Funds, ETFs, and Collective Trust Funds (CTFs) contributed 13% to the net revenues in 2023. |
  | What was the amount of additional stock-based compensation expense recognized due to the Type 3 modification in the year ended December 31, 2023? | A special award grant on February 23, 2023, resulted in a Type 3 modification of the 2022 PSU awards, leading to an additional stock-based compensation expense of $20.2 million recognized in that year. |
  | What was the percentage point decrease in earnings from operations as a percentage of net revenue for the Printing segment in the fiscal year 2023? | Printing earnings from operations as a percentage of net revenue decreased by 0.2 percentage points in the fiscal year 2023. |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Evaluation Dataset

#### json

* Dataset: json
* Size: 700 evaluation samples
* Columns: anchor and positive
* Approximate statistics based on the first 700 samples:

  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string   |
  | details |        |          |

* Samples:

  | anchor | positive |
  |:-------|:---------|
  | How does GameStop optimize the efficiency of its product distribution? | We use our distribution facilities, store locations and inventory management systems to optimize the efficiency of the flow of products to our stores and customers, enhance fulfillment efficiency and optimize in-stock and overall investment in inventory. |
  | What was the net production increase percentage of Chevron's worldwide oil-equivalent from 2022 to 2023? | For the year 2023, Chevron's worldwide oil-equivalent production was 3.1 million barrels per day, marking an increase of about 4 percent from the 2022 level. |
  | How has Tesla sought to increase the affordability of their vehicles in international markets? | Internationally, we also have manufacturing facilities in China (Gigafactory Shanghai) and Germany (Gigafactory Berlin-Brandenburg), which allows us to increase the affordability of our vehicles for customers in local markets by reducing transportation and manufacturing costs and eliminating the impact of unfavorable tariffs. |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `fp16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
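
To make the `learning_rate`, `lr_scheduler_type`, and `warmup_ratio` settings above more concrete, here is a minimal sketch of a linear warmup followed by a half-cosine decay. It assumes 608 total optimizer steps (the last step reported in the training logs) and that the warmup length is rounded up; the function name `lr_at` is hypothetical, and the exact scheduler implementation used for training is not stated in this card.

```python
import math

BASE_LR = 2e-5       # `learning_rate`
TOTAL_STEPS = 608    # last optimizer step in the training logs
WARMUP_RATIO = 0.1   # `warmup_ratio`

# Assumption: the warmup length is the ratio times total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def lr_at(step):
    """Approximate learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 30, WARMUP_STEPS, 300, TOTAL_STEPS):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the rate peaks at `2e-05` once warmup ends and decays smoothly toward zero by the final step.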

### Training Logs

| Epoch      | Step    | Training Loss | loss       | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:----------:|:-------:|:-------------:|:----------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.0658     | 10      | 12.7378       | -          | -                      | -                      | -                      | -                     | -                      |
| 0.1315     | 20      | 16.125        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.1973     | 30      | 19.5213       | -          | -                      | -                      | -                      | -                     | -                      |
| 0.2630     | 40      | 21.3366       | -          | -                      | -                      | -                      | -                     | -                      |
| 0.3288     | 50      | 18.9311       | -          | -                      | -                      | -                      | -                     | -                      |
| 0.3946     | 60      | 5.5988        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.4603     | 70      | 2.9878        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.5261     | 80      | 2.0073        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.5919     | 90      | 1.5752        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.6576     | 100     | 1.3491        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.7234     | 110     | 1.1473        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.7891     | 120     | 1.0644        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.8549     | 130     | 0.9987        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.9207     | 140     | 0.8948        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.9864     | 150     | 0.877         | -          | -                      | -                      | -                      | -                     | -                      |
| 0.9996     | 152     | -             | 0.3206     | 0.6646                 | 0.6955                 | 0.7089                 | 0.6391                | 0.7145                 |
| 1.0522     | 160     | 7.7524        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.1180     | 170     | 12.5198       | -          | -                      | -                      | -                      | -                     | -                      |
| 1.1837     | 180     | 16.8236       | -          | -                      | -                      | -                      | -                     | -                      |
| 1.2495     | 190     | 18.7345       | -          | -                      | -                      | -                      | -                     | -                      |
| 1.3152     | 200     | 18.986        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.3810     | 210     | 5.3162        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.4468     | 220     | 1.1987        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.5125     | 230     | 0.8596        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.5783     | 240     | 0.7595        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.6441     | 250     | 0.7377        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.7098     | 260     | 0.6657        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.7756     | 270     | 0.6838        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.8413     | 280     | 0.6813        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.9071     | 290     | 0.6322        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.9729     | 300     | 0.6296        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.9992     | 304     | -             | 0.2404     | 0.6884                 | 0.7126                 | 0.7240                 | 0.6529                | 0.7285                 |
| 2.0386     | 310     | 4.0272        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.1044     | 320     | 11.576        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.1702     | 330     | 14.1756       | -          | -                      | -                      | -                      | -                     | -                      |
| 2.2359     | 340     | 17.5422       | -          | -                      | -                      | -                      | -                     | -                      |
| 2.3017     | 350     | 19.0518       | -          | -                      | -                      | -                      | -                     | -                      |
| 2.3674     | 360     | 7.1039        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.4332     | 370     | 0.9404        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.4990     | 380     | 0.7094        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.5647     | 390     | 0.5907        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.6305     | 400     | 0.6083        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.6963     | 410     | 0.5486        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.7620     | 420     | 0.5529        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.8278     | 430     | 0.5734        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.8935     | 440     | 0.5653        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.9593     | 450     | 0.534         | -          | -                      | -                      | -                      | -                     | -                      |
| 2.9988     | 456     | -             | 0.2078     | 0.7028                 | 0.7266                 | 0.7336                 | 0.6671                | 0.7349                 |
| 3.0251     | 460     | 1.5518        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.0908     | 470     | 10.991        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.1566     | 480     | 12.393        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.2224     | 490     | 16.9122       | -          | -                      | -                      | -                      | -                     | -                      |
| 3.2881     | 500     | 18.3968       | -          | -                      | -                      | -                      | -                     | -                      |
| 3.3539     | 510     | 10.9782       | -          | -                      | -                      | -                      | -                     | -                      |
| 3.4196     | 520     | 0.654         | -          | -                      | -                      | -                      | -                     | -                      |
| 3.4854     | 530     | 0.607         | -          | -                      | -                      | -                      | -                     | -                      |
| 3.5512     | 540     | 0.5474        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.6169     | 550     | 0.5771        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.6827     | 560     | 0.5364        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.7485     | 570     | 0.5323        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.8142     | 580     | 0.5458        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.8800     | 590     | 0.5738        | -          | -                      | -                      | -                      | -                     | -                      |
| 3.9457     | 600     | 0.5353        | -          | -                      | -                      | -                      | -                     | -                      |
| **3.9984** | **608** | **-**         | **0.1882** | **0.7124**             | **0.732**              | **0.733**              | **0.6856**            | **0.7324**             |

* The bold row denotes the saved checkpoint.
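
The Epoch and Step columns in the log above can be reproduced from the dataset size and the batch settings. The sketch below is a worked calculation, assuming a single training device and that the `no_duplicates` batch sampler does not change the number of batches per epoch; `NUM_DEVICES` and `logged_epoch` are names introduced here for illustration.

```python
import math

TRAIN_SAMPLES = 311_351   # training set size
PER_DEVICE_BATCH = 128    # `per_device_train_batch_size`
GRAD_ACCUM = 16           # `gradient_accumulation_steps`
EPOCHS = 4                # `num_train_epochs`
NUM_DEVICES = 1           # assumption: not stated in the card

# Samples contributing to a single optimizer update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES

# `dataloader_drop_last` is False, so the final partial batch is kept.
batches_per_epoch = math.ceil(TRAIN_SAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))

# One optimizer step per GRAD_ACCUM batches.
steps_per_epoch = batches_per_epoch // GRAD_ACCUM
total_steps = steps_per_epoch * EPOCHS

def logged_epoch(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step * GRAD_ACCUM / batches_per_epoch, 4)

print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps)
print(logged_epoch(10), logged_epoch(152), logged_epoch(608))
```

Under these assumptions the calculation gives 152 optimizer steps per epoch and 608 in total, and the fractional epochs for steps 10, 152, and 608 agree with the logged values 0.0658, 0.9996, and 3.9984.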

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.3.0
- Datasets: 2.19.1
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```