---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:311351
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: How much non-cash impairment losses were recognized on theaters
in international markets in 2022?
sentences:
- The timing and amounts of deductible and taxable items and the probability of
sustaining uncertain tax positions requires significant judgment. The benefits
of uncertain tax positions are recorded in the Company’s consolidated financial
statements only after determining a more-likely-than-not probability that the
uncertain tax positions...
- During the year ended December 31, 2022, non-cash impairment losses of $59.7 million
were recognized on 53 theaters in the International markets which were related
to property, net, and operating lease right-of-use assets, net.
- Under Item 8 Financial Statements and Supplementary Data is discussed which includes
a variety of financial reporting.
- source_sentence: What factors led to the increase in Intelligent Edge earnings from
operations as a percentage of net revenue?
sentences:
- Intelligent Edge earnings from operations as a percentage of net revenue increased
12.4 percentage points primarily due to decreases in cost of products and services
as a percentage of net revenue and operating expenses as a percentage of net revenue.
- Diverse and Inclusive Workplace We work to build a diverse and inclusive workplace
where we can leverage our collective cognitive diversity to build the best products
and make the best decisions for the global community we serve. We want our products
to work for people around the world and we need to grow and keep the best talent
in order to do that.
- In March 2023, the Board of Directors sanctioned a restructuring plan concentrated
on investment prioritization towards significant growth prospects and the optimization
of the company's real estate assets. This includes substantial organizational
changes such as reductions in office space and workforce.
- source_sentence: What type of financial information is provided in Part IV, Item
15(a)(1) of the Annual Report on Form 10-K?
sentences:
- Our self-insurance reserve estimates totaled $268.8 million at August 26, 2023.
- The consolidated financial statements and accompanying notes listed in Part IV,
Item 15(a)(1) of the Annual Report on Form 10-K.
- Constant dollar changes and adjusted financial results are non-GAAP financial
measures. A constant dollar basis assumes the average foreign currency exchange
rates for the period remained constant with the average foreign currency exchange
rates for the same period of the prior year. We provide constant dollar changes
in our results to help investors understand the underlying growth rate of net
revenue excluding the impact of changes in foreign currency exchange rates.
- source_sentence: What constitutes the largest expense in the company's various expense
categories?
sentences:
- Personnel-related costs are the most significant component of the company's operating
expenses such as research and development, sales and marketing, and general and
administrative expenses, excluding restructuring and asset impairment charges.
- As of October 31, 2022, the aggregate projected benefit obligations for U.S. Defined
Benefit Plans were $289 million and for Non-U.S Defined Benefit Plans it was $996
million. As of October 31, 2023, these obligations were $267 million for U.S.
Defined Benefit Plans and $1,052 million for Non-U.S. Defined Benefit Plans.
- The Management’s Discussion and Analysis section discusses the company's financial
condition and results of operations, suggesting it be read alongside the consolidated
financial statements included in the Annual Report on Form 10-K.
- source_sentence: What was the total premiums revenue for the Insurance segment in
2023?
sentences:
- Cash, cash equivalents and restricted cash at end of period is reported to be
$6,985.
- Insurance segment premiums revenue increased $13.6 billion, or 15.5%, from $87.7
billion in the 2022 period to $101.3 billion in the 2023 period.
- On a quarterly basis, we employ a consistent, systematic and rational methodology
to assess the adequacy of our warranty liability.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: Vignesh finetuned bge
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.6385714285714286
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8057142857142857
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8514285714285714
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8885714285714286
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6385714285714286
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.26857142857142857
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.17028571428571426
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.08885714285714284
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6385714285714286
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8057142857142857
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8514285714285714
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8885714285714286
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7672875738418359
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7279155328798183
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7324235298064157
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.6428571428571429
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.7957142857142857
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8428571428571429
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8785714285714286
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6428571428571429
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2652380952380952
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.16857142857142857
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.08785714285714284
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6428571428571429
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.7957142857142857
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8428571428571429
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8785714285714286
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7649011857503378
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7278752834467118
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7330044690874636
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.6414285714285715
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.84
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8814285714285715
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6414285714285715
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.26666666666666666
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.16799999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.08814285714285712
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6414285714285715
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.84
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8814285714285715
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7649914708405767
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7272885487528342
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7320436030547072
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.6171428571428571
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.7785714285714286
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8242857142857143
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8728571428571429
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6171428571428571
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2595238095238095
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.1648571428571428
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.08728571428571427
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6171428571428571
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.7785714285714286
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8242857142857143
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8728571428571429
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7477873461127673
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7075640589569155
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7124046732307174
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.59
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.7485714285714286
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.7942857142857143
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8671428571428571
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.59
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2495238095238095
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.15885714285714284
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.0867142857142857
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.59
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.7485714285714286
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.7942857142857143
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8671428571428571
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7258107978054003
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.6810629251700676
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.685615559071417
name: Cosine Map@100
---
# Vignesh finetuned bge
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
- json
- **Language:** en
- **License:** apache-2.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
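For readers who prefer plain `transformers`, the pipeline above (BERT encoder, CLS-token pooling, L2 normalization) can be reproduced directly. A minimal sketch, not the canonical usage:
```python
# Sketch of the SentenceTransformer pipeline above with plain transformers:
# BertModel -> CLS-token pooling -> L2 normalization.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("viggypoker1/Vignesh-finetuned-bge")
encoder = AutoModel.from_pretrained("viggypoker1/Vignesh-finetuned-bge")

inputs = tokenizer(
    ["What was the total premiums revenue for the Insurance segment in 2023?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    cls = encoder(**inputs).last_hidden_state[:, 0]          # CLS-token pooling
embeddings = torch.nn.functional.normalize(cls, p=2, dim=1)  # Normalize()
print(embeddings.shape)  # torch.Size([1, 768])
```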
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("viggypoker1/Vignesh-finetuned-bge")
# Run inference
sentences = [
'What was the total premiums revenue for the Insurance segment in 2023?',
'Insurance segment premiums revenue increased $13.6 billion, or 15.5%, from $87.7 billion in the 2022 period to $101.3 billion in the 2023 period.',
'On a quarterly basis, we employ a consistent, systematic and rational methodology to assess the adequacy of our warranty liability.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
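The same embeddings can rank passages for retrieval. A toy semantic-search sketch reusing the sentences above (illustrative only):
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("viggypoker1/Vignesh-finetuned-bge")

# Embed a query and a small corpus separately, then rank the passages by
# cosine similarity; the Normalize() module makes embeddings unit-length.
query = "What was the total premiums revenue for the Insurance segment in 2023?"
corpus = [
    "Insurance segment premiums revenue increased $13.6 billion, or 15.5%, from $87.7 billion in the 2022 period to $101.3 billion in the 2023 period.",
    "On a quarterly basis, we employ a consistent, systematic and rational methodology to assess the adequacy of our warranty liability.",
]
query_embedding = model.encode(query)
corpus_embeddings = model.encode(corpus)
scores = model.similarity(query_embedding, corpus_embeddings)
print(scores)  # the premiums passage should score highest
```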
## Evaluation
### Metrics
#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.6386 |
| cosine_accuracy@3 | 0.8057 |
| cosine_accuracy@5 | 0.8514 |
| cosine_accuracy@10 | 0.8886 |
| cosine_precision@1 | 0.6386 |
| cosine_precision@3 | 0.2686 |
| cosine_precision@5 | 0.1703 |
| cosine_precision@10 | 0.0889 |
| cosine_recall@1 | 0.6386 |
| cosine_recall@3 | 0.8057 |
| cosine_recall@5 | 0.8514 |
| cosine_recall@10 | 0.8886 |
| cosine_ndcg@10 | 0.7673 |
| cosine_mrr@10 | 0.7279 |
| **cosine_map@100** | **0.7324** |
#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:----------|
| cosine_accuracy@1 | 0.6429 |
| cosine_accuracy@3 | 0.7957 |
| cosine_accuracy@5 | 0.8429 |
| cosine_accuracy@10 | 0.8786 |
| cosine_precision@1 | 0.6429 |
| cosine_precision@3 | 0.2652 |
| cosine_precision@5 | 0.1686 |
| cosine_precision@10 | 0.0879 |
| cosine_recall@1 | 0.6429 |
| cosine_recall@3 | 0.7957 |
| cosine_recall@5 | 0.8429 |
| cosine_recall@10 | 0.8786 |
| cosine_ndcg@10 | 0.7649 |
| cosine_mrr@10 | 0.7279 |
| **cosine_map@100** | **0.733** |
#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:----------|
| cosine_accuracy@1 | 0.6414 |
| cosine_accuracy@3 | 0.8 |
| cosine_accuracy@5 | 0.84 |
| cosine_accuracy@10 | 0.8814 |
| cosine_precision@1 | 0.6414 |
| cosine_precision@3 | 0.2667 |
| cosine_precision@5 | 0.168 |
| cosine_precision@10 | 0.0881 |
| cosine_recall@1 | 0.6414 |
| cosine_recall@3 | 0.8 |
| cosine_recall@5 | 0.84 |
| cosine_recall@10 | 0.8814 |
| cosine_ndcg@10 | 0.765 |
| cosine_mrr@10 | 0.7273 |
| **cosine_map@100** | **0.732** |
#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.6171 |
| cosine_accuracy@3 | 0.7786 |
| cosine_accuracy@5 | 0.8243 |
| cosine_accuracy@10 | 0.8729 |
| cosine_precision@1 | 0.6171 |
| cosine_precision@3 | 0.2595 |
| cosine_precision@5 | 0.1649 |
| cosine_precision@10 | 0.0873 |
| cosine_recall@1 | 0.6171 |
| cosine_recall@3 | 0.7786 |
| cosine_recall@5 | 0.8243 |
| cosine_recall@10 | 0.8729 |
| cosine_ndcg@10 | 0.7478 |
| cosine_mrr@10 | 0.7076 |
| **cosine_map@100** | **0.7124** |
#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.59 |
| cosine_accuracy@3 | 0.7486 |
| cosine_accuracy@5 | 0.7943 |
| cosine_accuracy@10 | 0.8671 |
| cosine_precision@1 | 0.59 |
| cosine_precision@3 | 0.2495 |
| cosine_precision@5 | 0.1589 |
| cosine_precision@10 | 0.0867 |
| cosine_recall@1 | 0.59 |
| cosine_recall@3 | 0.7486 |
| cosine_recall@5 | 0.7943 |
| cosine_recall@10 | 0.8671 |
| cosine_ndcg@10 | 0.7258 |
| cosine_mrr@10 | 0.6811 |
| **cosine_map@100** | **0.6856** |
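Because the model was trained with MatryoshkaLoss, the lower-dimensional results above can be reproduced at inference time by truncating embeddings. A sketch, assuming a sentence-transformers release (v2.7+) that supports the `truncate_dim` argument:
```python
from sentence_transformers import SentenceTransformer

# Truncate embeddings to 256 dimensions, matching the dim_256 results above;
# smaller vectors trade a little accuracy for cheaper storage and search.
model = SentenceTransformer("viggypoker1/Vignesh-finetuned-bge", truncate_dim=256)
embeddings = model.encode(
    ["What was the total premiums revenue for the Insurance segment in 2023?"]
)
print(embeddings.shape)  # (1, 256)
```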
## Training Details
### Training Dataset
#### json
* Dataset: json
* Size: 311,351 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
| type | string | string |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | <code>What percentage of net revenues came from Mutual Funds, ETFs, and Collective Trust Funds (CTFs) in 2023?</code> | <code>Mutual Funds, ETFs, and Collective Trust Funds (CTFs) contributed 13% to the net revenues in 2023.</code> |
  | <code>What was the amount of additional stock-based compensation expense recognized due to the Type 3 modification in the year ended December 31, 2023?</code> | <code>A special award grant on February 23, 2023, resulted in a Type 3 modification of the 2022 PSU awards, leading to an additional stock-based compensation expense of $20.2 million recognized in that year.</code> |
  | <code>What was the percentage point decrease in earnings from operations as a percentage of net revenue for the Printing segment in the fiscal year 2023?</code> | <code>Printing earnings from operations as a percentage of net revenue decreased by 0.2 percentage points in the fiscal year 2023.</code> |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
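For reference, a minimal sketch of how this configuration maps onto the sentence-transformers loss classes (not the exact training script):
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Apply an in-batch-negatives ranking loss at every Matryoshka dimension,
# with equal weights, as in the JSON parameters above.
inner_loss = MultipleNegativesRankingLoss(model)
train_loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,
)
```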
### Evaluation Dataset
#### json
* Dataset: json
* Size: 700 evaluation samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 700 samples:
| | anchor | positive |
|:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
| type | string | string |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | <code>How does GameStop optimize the efficiency of its product distribution?</code> | <code>We use our distribution facilities, store locations and inventory management systems to optimize the efficiency of the flow of products to our stores and customers, enhance fulfillment efficiency and optimize in-stock and overall investment in inventory.</code> |
  | <code>What was the net production increase percentage of Chevron's worldwide oil-equivalent from 2022 to 2023?</code> | <code>For the year 2023, Chevron's worldwide oil-equivalent production was 3.1 million barrels per day, marking an increase of about 4 percent from the 2022 level.</code> |
  | <code>How has Tesla sought to increase the affordability of their vehicles in international markets?</code> | <code>Internationally, we also have manufacturing facilities in China (Gigafactory Shanghai) and Germany (Gigafactory Berlin-Brandenburg), which allows us to increase the affordability of our vehicles for customers in local markets by reducing transportation and manufacturing costs and eliminating the impact of unfavorable tariffs.</code> |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `fp16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
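
A sketch of how these non-default values map onto `SentenceTransformerTrainingArguments` (the `output_dir` is a placeholder, not the path used in training):
```python
from sentence_transformers.training_args import (
    BatchSamplers,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-finetuned",  # placeholder
    eval_strategy="epoch",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    tf32=False,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate in-batch negatives
)
```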
#### All Hyperparameters