--- language: - en license: apache-2.0 tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:6300 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss base_model: Snowflake/snowflake-arctic-embed-m-v1.5 widget: - source_sentence: Cost of net revenues represents costs associated with customer support, site operations, and payment processing. Significant components of these costs primarily consist of employee compensation (including stock-based compensation), contractor costs, facilities costs, depreciation of equipment and amortization expense, bank transaction fees, credit card interchange and assessment fees, authentication costs, shipping costs and digital services tax. sentences: - What was the allowance for loan losses on GM Financial’s retail finance receivables portfolio at the end of 2023? - What are the key components of cost of net revenues? - What percentage of McLane's consolidated sales in 2023 was comprised by grocery sales? - source_sentence: The net cash used in operating activities was reported as $215.2 million, $628.5 million, and $614.1 million for three respective periods. sentences: - What was the net cash used in operating activities for the respective periods listed? - What was the total growth investment capital expenditures in 2022? - Where is the Financial Statement Schedule in IBM’s 2023 Form 10-K located? - source_sentence: In 2023, the total operating expenses amounted to $4,331.6 million, including costs of services, selling, general and administrative expenses, and depreciation and amortization. sentences: - What were the total operating expenses for the company in 2023? - How does CMS adjust the company's Medicare Advantage and Part D premium revenues? - What was the average stockholders' deficit over the past five fiscal years up to 2023? - source_sentence: Johnson & Johnson reported cash and cash equivalents of $21,859 million as of the end of 2023. sentences: - Who are GameStop's main competitors in the global gaming industry? - What was the amount of cash and cash equivalents reported by Johnson & Johnson at the end of 2023? - By what percentage has Chevron's UK oil-equivalent production increased from 2022 to 2023? - source_sentence: As of December 31, 2023, Bank of America reported gross derivative assets and liabilities totaling $290.3 billion and $301.2 billion, respectively. After accounting for legally enforceable master netting agreements and cash collateral, these figures were adjusted to $39.3 billion in assets and $43.4 billion in liabilities. sentences: - What is the significant raw material used by MiTek and how does its supply impact the company? - By what percentage did HIV product sales increase in 2023 compared to the previous year? - What were the total derivative assets and liabilities at Bank of America as of December 31, 2023, after adjusting for master netting agreements and cash collateral? 
pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 model-index: - name: BGE base Financial Matryoshka results: - task: type: information-retrieval name: Information Retrieval dataset: name: dim 768 type: dim_768 metrics: - type: cosine_accuracy@1 value: 0.7542857142857143 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.8614285714285714 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.8914285714285715 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9328571428571428 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7542857142857143 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.28714285714285714 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.17828571428571424 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09328571428571428 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7542857142857143 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.8614285714285714 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.8914285714285715 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9328571428571428 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8430593058746703 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.814359410430839 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8171120142759164 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 512 type: dim_512 metrics: - type: cosine_accuracy@1 value: 0.7542857142857143 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.8614285714285714 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.8914285714285715 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.93 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7542857142857143 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.28714285714285714 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.17828571428571427 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09299999999999999 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7542857142857143 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.8614285714285714 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.8914285714285715 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.93 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8409010665384006 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8124268707482996 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8153207256101372 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 256 type: dim_256 metrics: - type: cosine_accuracy@1 value: 0.7557142857142857 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.86 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.8942857142857142 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9285714285714286 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7557142857142857 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.2866666666666667 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.17885714285714283 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09285714285714286 
name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7557142857142857 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.86 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.8942857142857142 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9285714285714286 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8408862139768868 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8128662131519274 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8157678611118373 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 128 type: dim_128 metrics: - type: cosine_accuracy@1 value: 0.7442857142857143 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.85 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.8871428571428571 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9142857142857143 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7442857142857143 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.2833333333333333 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1774285714285714 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09142857142857141 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7442857142857143 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.85 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.8871428571428571 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9142857142857143 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8298257719970505 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.802593537414966 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8061119393433516 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 64 type: dim_64 metrics: - type: cosine_accuracy@1 value: 0.7 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.8157142857142857 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.8571428571428571 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9071428571428571 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.2719047619047619 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1714285714285714 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09071428571428569 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.8157142857142857 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.8571428571428571 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9071428571428571 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8023275744891828 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.7689109977324261 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.7722063607472032 name: Cosine Map@100 --- # BGE base Financial Matryoshka This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. 
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
  - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Abinaya/snowflake-arctic-embed-financial-matryoshka")

# Run inference
sentences = [
    'As of December 31, 2023, Bank of America reported gross derivative assets and liabilities totaling $290.3 billion and $301.2 billion, respectively. After accounting for legally enforceable master netting agreements and cash collateral, these figures were adjusted to $39.3 billion in assets and $43.4 billion in liabilities.',
    'What were the total derivative assets and liabilities at Bank of America as of December 31, 2023, after adjusting for master netting agreements and cash collateral?',
    'By what percentage did HIV product sales increase in 2023 compared to the previous year?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
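Because the model was trained with `MatryoshkaLoss` over the dimensions 768, 512, 256, 128 and 64, its embeddings can be truncated to a smaller size with only a modest drop in retrieval quality (see the evaluation table below). The following is a minimal sketch using the `truncate_dim` argument of Sentence Transformers; the example texts are drawn from this card's sample pairs:

```python
from sentence_transformers import SentenceTransformer

# Load the model so that its output embeddings are truncated to 256 dimensions,
# one of the Matryoshka dimensions this model was trained with.
model = SentenceTransformer(
    "Abinaya/snowflake-arctic-embed-financial-matryoshka",
    truncate_dim=256,
)

queries = ["What are the key components of cost of net revenues?"]
docs = [
    "Cost of net revenues represents costs associated with customer support, "
    "site operations, and payment processing.",
]

query_embeddings = model.encode(queries)
doc_embeddings = model.encode(docs)
print(query_embeddings.shape)
# (1, 256)

# Cosine similarity between the truncated query and document embeddings
print(model.similarity(query_embeddings, doc_embeddings))
```

Smaller dimensions reduce index size and search latency; the evaluation below quantifies the quality trade-off at each size.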
## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | dim_768    | dim_512    | dim_256    | dim_128    | dim_64     |
|:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
| cosine_accuracy@1   | 0.7543     | 0.7543     | 0.7557     | 0.7443     | 0.7        |
| cosine_accuracy@3   | 0.8614     | 0.8614     | 0.86       | 0.85       | 0.8157     |
| cosine_accuracy@5   | 0.8914     | 0.8914     | 0.8943     | 0.8871     | 0.8571     |
| cosine_accuracy@10  | 0.9329     | 0.93       | 0.9286     | 0.9143     | 0.9071     |
| cosine_precision@1  | 0.7543     | 0.7543     | 0.7557     | 0.7443     | 0.7        |
| cosine_precision@3  | 0.2871     | 0.2871     | 0.2867     | 0.2833     | 0.2719     |
| cosine_precision@5  | 0.1783     | 0.1783     | 0.1789     | 0.1774     | 0.1714     |
| cosine_precision@10 | 0.0933     | 0.093      | 0.0929     | 0.0914     | 0.0907     |
| cosine_recall@1     | 0.7543     | 0.7543     | 0.7557     | 0.7443     | 0.7        |
| cosine_recall@3     | 0.8614     | 0.8614     | 0.86       | 0.85       | 0.8157     |
| cosine_recall@5     | 0.8914     | 0.8914     | 0.8943     | 0.8871     | 0.8571     |
| cosine_recall@10    | 0.9329     | 0.93       | 0.9286     | 0.9143     | 0.9071     |
| **cosine_ndcg@10**  | **0.8431** | **0.8409** | **0.8409** | **0.8298** | **0.8023** |
| cosine_mrr@10       | 0.8144     | 0.8124     | 0.8129     | 0.8026     | 0.7689     |
| cosine_map@100      | 0.8171     | 0.8153     | 0.8158     | 0.8061     | 0.7722     |
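The per-dimension columns above come from running the same information-retrieval evaluation once per Matryoshka dimension, truncating embeddings to that size. A minimal sketch of how such an evaluation can be set up; the queries, corpus, and relevance judgments below are small placeholders, not the actual evaluation split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator, SequentialEvaluator

model = SentenceTransformer("Abinaya/snowflake-arctic-embed-financial-matryoshka")

# Placeholder evaluation data: IDs mapped to texts, and each query mapped to its relevant corpus IDs.
queries = {"q1": "What are the key components of cost of net revenues?"}
corpus = {
    "d1": "Cost of net revenues represents costs associated with customer support, "
    "site operations, and payment processing."
}
relevant_docs = {"q1": {"d1"}}

# One evaluator per Matryoshka dimension, mirroring the dim_768 ... dim_64 columns above.
evaluators = [
    InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        name=f"dim_{dim}",
        truncate_dim=dim,
    )
    for dim in (768, 512, 256, 128, 64)
]
results = SequentialEvaluator(evaluators)(model)
print(results)
```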
## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 6,300 training samples
* Columns: `positive` and `anchor`
* Approximate statistics based on the first 1000 samples:

  |         | positive | anchor |
  |:--------|:---------|:-------|
  | type    | string   | string |
  | details |          |        |

* Samples:

  | positive | anchor |
  |:---------|:-------|
  | Opioids Related Securities Class Actions and Derivative Litigation: Three derivative complaints and two securities class actions drawing heavily on the allegations of the DOJ complaint have been filed in Delaware naming the Company and various current and former directors and certain current and former officers as defendants. The plaintiffs in the derivative suits (in which the Company is a nominal defendant) allege, among other things, that the defendants breached their fiduciary duties in connection with oversight of opioids dispensing and distribution and that the defendants violated Section 14(a) of the Securities Exchange Act of 1934, as amended (the 'Exchange Act'), and are liable for contribution under Section 10(b) of the Exchange Act in connection with the Company's disclosures about opioids. | What kind of claims are involved in the securities and derivative litigation against the Company listed in the document? |
  | Walmart's fintech venture, ONE, provides financial services such as money orders, prepaid access, money transfers, check cashing, bill payment, and certain types of installment lending. | What types of financial services are offered through Walmart's fintech venture, ONE? |
  | Juice and juice concentrate from various fruits, particularly orange juice and orange juice concentrate, are principal raw materials for juice and juice drink products, and milk is the principal raw material for dairy products managed through fairlife, LLC. | What are the primary raw materials for the company's juice and dairy products? |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
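In code, this configuration corresponds to wrapping `MultipleNegativesRankingLoss` in `MatryoshkaLoss`. A short sketch under the parameters above, using the base model named in the Model Description:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")

# The inner loss treats each (anchor, positive) pair as a positive and the other
# texts in the batch as negatives; the Matryoshka wrapper applies it at every dimension.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # -1 trains on all listed dimensions at every step
)
```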
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
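The non-default hyperparameters above map roughly onto the following training-arguments setup. This is a sketch, not the exact training script: the output directory is illustrative, and `save_strategy` is an assumption (it must match `eval_strategy` when `load_best_model_at_end` is enabled):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="snowflake-arctic-embed-financial-matryoshka",  # illustrative output path
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    optim="adamw_torch_fused",
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed: required to match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts within a batch (helps in-batch negatives)
)
```

These arguments, together with the training dataset, the MatryoshkaLoss shown earlier, and the per-dimension evaluators, would then be passed to a `SentenceTransformerTrainer`.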
### Training Logs

| Epoch     | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:---------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 0.8122    | 10     | 1.5521        | -                      | -                      | -                      | -                      | -                     |
| 1.0       | 13     | -             | 0.8136                 | 0.8108                 | 0.8143                 | 0.7949                 | 0.7552                |
| 1.5685    | 20     | 0.4812        | -                      | -                      | -                      | -                      | -                     |
| 2.0       | 26     | -             | 0.8405                 | 0.8388                 | 0.8384                 | 0.8284                 | 0.7990                |
| 2.3249    | 30     | 0.3585        | -                      | -                      | -                      | -                      | -                     |
| 3.0       | 39     | -             | 0.8420                 | 0.8409                 | 0.8408                 | 0.8290                 | 0.8009                |
| 3.0812    | 40     | 0.3101        | -                      | -                      | -                      | -                      | -                     |
| **3.731** | **48** | **-**         | **0.8431**             | **0.8409**             | **0.8409**             | **0.8298**             | **0.8023**            |

* The bold row denotes the saved checkpoint.

### Framework Versions

- Python: 3.10.15
- Sentence Transformers: 3.4.0
- Transformers: 4.47.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.0.1
- Datasets: 3.0.1
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```