---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-m-v1.5
widget:
- source_sentence: Cost of net revenues represents costs associated with customer
support, site operations, and payment processing. Significant components of these
costs primarily consist of employee compensation (including stock-based compensation),
contractor costs, facilities costs, depreciation of equipment and amortization
expense, bank transaction fees, credit card interchange and assessment fees, authentication
costs, shipping costs and digital services tax.
sentences:
- What was the allowance for loan losses on GM Financial’s retail finance receivables
portfolio at the end of 2023?
- What are the key components of cost of net revenues?
- What percentage of McLane's consolidated sales in 2023 was comprised by grocery
sales?
- source_sentence: The net cash used in operating activities was reported as $215.2
million, $628.5 million, and $614.1 million for three respective periods.
sentences:
- What was the net cash used in operating activities for the respective periods
listed?
- What was the total growth investment capital expenditures in 2022?
- Where is the Financial Statement Schedule in IBM’s 2023 Form 10-K located?
- source_sentence: In 2023, the total operating expenses amounted to $4,331.6 million,
including costs of services, selling, general and administrative expenses, and
depreciation and amortization.
sentences:
- What were the total operating expenses for the company in 2023?
- How does CMS adjust the company's Medicare Advantage and Part D premium revenues?
- What was the average stockholders' deficit over the past five fiscal years up
to 2023?
- source_sentence: Johnson & Johnson reported cash and cash equivalents of $21,859
million as of the end of 2023.
sentences:
- Who are GameStop's main competitors in the global gaming industry?
- What was the amount of cash and cash equivalents reported by Johnson & Johnson
at the end of 2023?
- By what percentage has Chevron's UK oil-equivalent production increased from 2022
to 2023?
- source_sentence: As of December 31, 2023, Bank of America reported gross derivative
assets and liabilities totaling $290.3 billion and $301.2 billion, respectively.
After accounting for legally enforceable master netting agreements and cash collateral,
these figures were adjusted to $39.3 billion in assets and $43.4 billion in liabilities.
sentences:
- What is the significant raw material used by MiTek and how does its supply impact
the company?
- By what percentage did HIV product sales increase in 2023 compared to the previous
year?
- What were the total derivative assets and liabilities at Bank of America as of
December 31, 2023, after adjusting for master netting agreements and cash collateral?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Financial Matryoshka
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.7542857142857143
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8614285714285714
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8914285714285715
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9328571428571428
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7542857142857143
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.28714285714285714
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.17828571428571424
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09328571428571428
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7542857142857143
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8614285714285714
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8914285714285715
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9328571428571428
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8430593058746703
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.814359410430839
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8171120142759164
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.7542857142857143
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8614285714285714
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8914285714285715
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.93
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7542857142857143
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.28714285714285714
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.17828571428571427
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09299999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7542857142857143
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8614285714285714
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8914285714285715
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.93
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8409010665384006
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.8124268707482996
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8153207256101372
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.7557142857142857
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.86
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8942857142857142
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9285714285714286
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7557142857142857
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2866666666666667
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.17885714285714283
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09285714285714286
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7557142857142857
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.86
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8942857142857142
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9285714285714286
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8408862139768868
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.8128662131519274
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8157678611118373
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.7442857142857143
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.85
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8871428571428571
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9142857142857143
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7442857142857143
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2833333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.1774285714285714
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09142857142857141
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7442857142857143
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.85
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8871428571428571
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9142857142857143
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8298257719970505
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.802593537414966
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8061119393433516
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.7
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8157142857142857
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8571428571428571
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9071428571428571
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2719047619047619
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.1714285714285714
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09071428571428569
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8157142857142857
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8571428571428571
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9071428571428571
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8023275744891828
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7689109977324261
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7722063607472032
name: Cosine Map@100
---
# BGE base Financial Matryoshka
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5) on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, its embeddings can also be truncated to 512, 256, 128 or 64 dimensions with a modest drop in retrieval quality (see the evaluation table below).
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-m-v1.5](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
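The architecture uses CLS-token pooling (`pooling_mode_cls_token: True`) followed by `Normalize()`. The NumPy sketch below shows what those two modules do to the transformer's token outputs. It assumes a hypothetical `token_embeddings` array of shape `(batch, seq_len, 768)` and is meant to illustrate the operations, not to reproduce the library's implementation.

```python
import numpy as np


def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic Pooling(cls_token) + Normalize() on (batch, seq_len, dim) token outputs."""
    # CLS pooling: keep only the first token's hidden state for each sequence
    cls_vectors = token_embeddings[:, 0, :]
    # Normalize(): rescale every sentence vector to unit L2 length
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)


# Hypothetical transformer output: 2 sequences, 5 tokens each, 768 hidden units
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 5, 768))
sentence_embeddings = cls_pool_and_normalize(token_embeddings)
print(sentence_embeddings.shape)  # (2, 768)
# The vectors have unit length, so the dot product equals cosine similarity
print(sentence_embeddings @ sentence_embeddings.T)
```

Because of the final normalization step, dot product and cosine similarity give identical scores for this model's full-size embeddings.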
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Abinaya/snowflake-arctic-embed-financial-matryoshka")
# Run inference
sentences = [
'As of December 31, 2023, Bank of America reported gross derivative assets and liabilities totaling $290.3 billion and $301.2 billion, respectively. After accounting for legally enforceable master netting agreements and cash collateral, these figures were adjusted to $39.3 billion in assets and $43.4 billion in liabilities.',
'What were the total derivative assets and liabilities at Bank of America as of December 31, 2023, after adjusting for master netting agreements and cash collateral?',
'By what percentage did HIV product sales increase in 2023 compared to the previous year?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
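The model was trained with MatryoshkaLoss over 768, 512, 256, 128 and 64 dimensions, so the leading dimensions of each embedding can be used on their own. Recent sentence-transformers releases accept a `truncate_dim` argument when loading a model (for example `SentenceTransformer(..., truncate_dim=256)`). The NumPy sketch below illustrates the underlying idea on random stand-in vectors: keep the first `dim` coordinates, then re-normalize so that a plain dot product still equals cosine similarity. The helper name `truncate_embeddings` is hypothetical.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of each row and rescale rows to unit length."""
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


# Random stand-in for the output of model.encode(sentences): 3 unit vectors of size 768
rng = np.random.default_rng(42)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

for dim in (768, 512, 256, 128, 64):
    small = truncate_embeddings(full, dim)
    # For unit vectors, the dot product equals cosine similarity
    similarities = small @ small.T
    print(dim, small.shape, similarities.shape)
```

Smaller dimensions reduce storage and search cost. The evaluation table below shows how much retrieval quality each size gives up.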
## Evaluation
### Metrics
#### Information Retrieval
* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
| cosine_accuracy@1 | 0.7543 | 0.7543 | 0.7557 | 0.7443 | 0.7 |
| cosine_accuracy@3 | 0.8614 | 0.8614 | 0.86 | 0.85 | 0.8157 |
| cosine_accuracy@5 | 0.8914 | 0.8914 | 0.8943 | 0.8871 | 0.8571 |
| cosine_accuracy@10 | 0.9329 | 0.93 | 0.9286 | 0.9143 | 0.9071 |
| cosine_precision@1 | 0.7543 | 0.7543 | 0.7557 | 0.7443 | 0.7 |
| cosine_precision@3 | 0.2871 | 0.2871 | 0.2867 | 0.2833 | 0.2719 |
| cosine_precision@5 | 0.1783 | 0.1783 | 0.1789 | 0.1774 | 0.1714 |
| cosine_precision@10 | 0.0933 | 0.093 | 0.0929 | 0.0914 | 0.0907 |
| cosine_recall@1 | 0.7543 | 0.7543 | 0.7557 | 0.7443 | 0.7 |
| cosine_recall@3 | 0.8614 | 0.8614 | 0.86 | 0.85 | 0.8157 |
| cosine_recall@5 | 0.8914 | 0.8914 | 0.8943 | 0.8871 | 0.8571 |
| cosine_recall@10 | 0.9329 | 0.93 | 0.9286 | 0.9143 | 0.9071 |
| **cosine_ndcg@10** | **0.8431** | **0.8409** | **0.8409** | **0.8298** | **0.8023** |
| cosine_mrr@10 | 0.8144 | 0.8124 | 0.8129 | 0.8026 | 0.7689 |
| cosine_map@100 | 0.8171 | 0.8153 | 0.8158 | 0.8061 | 0.7722 |
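Each metric above is computed per query from a ranked list of retrieved passages and then averaged over queries. The table is consistent with every query having exactly one relevant passage: accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example, 0.9329 / 10 ≈ 0.0933). The pure-Python sketch below gives the per-query definitions under binary relevance. `ir_metrics` is a hypothetical helper, not the `InformationRetrievalEvaluator` implementation.

```python
import math


def ir_metrics(ranking: list[str], relevant: set[str], k: int = 10) -> dict[str, float]:
    """Binary-relevance retrieval metrics for one query, given doc ids in ranked order."""
    top_k = ranking[:k]
    hits = [doc_id in relevant for doc_id in top_k]
    n_hits = sum(hits)
    # Reciprocal rank of the first relevant document inside the top k (0 if none)
    first_hit = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    mrr = 1.0 / first_hit if first_hit else 0.0
    # DCG discounts a hit at rank r by log2(r + 1); IDCG is the best achievable DCG
    dcg = sum(1.0 / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1) if hit)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return {
        f"accuracy@{k}": 1.0 if n_hits else 0.0,
        f"precision@{k}": n_hits / k,
        f"recall@{k}": n_hits / len(relevant),
        f"mrr@{k}": mrr,
        f"ndcg@{k}": dcg / idcg if idcg else 0.0,
    }


# One query whose single relevant passage is ranked second
print(ir_metrics(["p3", "p1", "p7", "p9"], relevant={"p1"}, k=10))
```

In this example the relevant passage is ranked second, so MRR@10 is 0.5 and nDCG@10 is 1 / log2(3) ≈ 0.631.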
## Training Details
### Training Dataset
#### json
* Dataset: json
* Size: 6,300 training samples
* Columns: `positive` and `anchor`
* Approximate statistics based on the first 1000 samples:
| | positive | anchor |
|:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
| type | string | string |

* Samples:

| positive | anchor |
|:---------|:-------|
| Opioids Related Securities Class Actions and Derivative Litigation: Three derivative complaints and two securities class actions drawing heavily on the allegations of the DOJ complaint have been filed in Delaware naming the Company and various current and former directors and certain current and former officers as defendants. The plaintiffs in the derivative suits (in which the Company is a nominal defendant) allege, among other things, that the defendants breached their fiduciary duties in connection with oversight of opioids dispensing and distribution and that the defendants violated Section 14(a) of the Securities Exchange Act of 1934, as amended (the 'Exchange Act'), and are liable for contribution under Section 10(b) of the Exchange Act in connection with the Company's disclosures about opioids. | What kind of claims are involved in the securities and derivative litigation against the Company listed in the document? |
| Walmart's fintech venture, ONE, provides financial services such as money orders, prepaid access, money transfers, check cashing, bill payment, and certain types of installment lending. | What types of financial services are offered through Walmart's fintech venture, ONE? |
| Juice and juice concentrate from various fruits, particularly orange juice and orange juice concentrate, are principal raw materials for juice and juice drink products, and milk is the principal raw material for dairy products managed through fairlife, LLC. | What are the primary raw materials for the company's juice and dairy products? |
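Each record in the json dataset pairs one `positive` passage with one `anchor` question. The standard-library sketch below writes and reads such records as JSON Lines. The file name `train.jsonl` and the one-object-per-line layout are assumptions, since the card does not describe the file itself. A file in this shape can typically be loaded with the Hugging Face `datasets` JSON loader.

```python
import json
import os
import tempfile

# Two records copied from the samples above; keys match the dataset's column names
records = [
    {
        "positive": "Walmart's fintech venture, ONE, provides financial services such as "
        "money orders, prepaid access, money transfers, check cashing, bill payment, "
        "and certain types of installment lending.",
        "anchor": "What types of financial services are offered through Walmart's fintech venture, ONE?",
    },
    {
        "positive": "Johnson & Johnson reported cash and cash equivalents of $21,859 million "
        "as of the end of 2023.",
        "anchor": "What was the amount of cash and cash equivalents reported by Johnson & Johnson "
        "at the end of 2023?",
    },
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.jsonl")  # hypothetical file name
    # Write one JSON object per line
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    # Read the records back in the same order
    with open(path, encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f]

print(len(loaded), list(loaded[0]))
```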
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
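MatryoshkaLoss wraps MultipleNegativesRankingLoss and applies it to each truncated prefix of the embeddings listed in `matryoshka_dims`, weighting each term by `matryoshka_weights`. With `n_dims_per_step: -1`, every dimension is used at every step. The NumPy sketch below computes this objective as a weighted sum for a batch of precomputed embeddings. It assumes the sentence-transformers defaults for MultipleNegativesRankingLoss (cosine similarity, scale 20). It illustrates the objective only; the library operates on torch tensors and backpropagates through the model.

```python
import numpy as np


def _unit_rows(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives: anchor i should score positive i above every other positive j."""
    logits = scale * _unit_rows(anchors) @ _unit_rows(positives).T  # (batch, batch) scaled cosines
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability for the softmax
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching pair on the diagonal as the target class
    return float(-np.mean(np.diag(log_softmax)))


def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the base loss over truncated embedding prefixes."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 768))
matched = anchors + 0.1 * rng.normal(size=(8, 768))  # positives close to their anchors
unrelated = rng.normal(size=(8, 768))                # positives with no relation to anchors
print(matryoshka_loss(anchors, matched), matryoshka_loss(anchors, unrelated))
```

Summing over prefixes pushes the first 64, 128, … dimensions to be useful on their own. This is what makes the truncated evaluations above possible.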
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
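With gradient accumulation, each optimizer update covers `per_device_train_batch_size × gradient_accumulation_steps` samples per device. The back-of-the-envelope sketch below estimates the resulting schedule. It assumes a single device, ignores any effect of the `no_duplicates` sampler on batch counts, and rounds down partial accumulation windows, so the real trainer may report a slightly different step count depending on the transformers version. For MultipleNegativesRankingLoss, the in-batch negatives come from each forward batch of 32, not from the 512-sample accumulated batch.

```python
import math

# Values taken from the hyperparameters above (single device assumed)
num_samples = 6_300
per_device_batch = 32
grad_accum = 16
epochs = 4
warmup_ratio = 0.1

effective_batch = per_device_batch * grad_accum  # samples per optimizer update
batches_per_epoch = math.ceil(num_samples / per_device_batch)
# Assumption of this sketch: a trailing partial accumulation window is not counted
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_updates = updates_per_epoch * epochs
warmup_updates = math.ceil(total_updates * warmup_ratio)

print(effective_batch, batches_per_epoch, updates_per_epoch, total_updates, warmup_updates)
# 512 197 12 48 5
```

Under these assumptions, the whole run amounts to only a few dozen optimizer updates, of which about five are warmup.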
#### All Hyperparameters