prabigyap committed
Commit aa5d68d · verified · 1 Parent(s): 61c1644

Upload fine-tuned model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": true,
    "pooling_mode_mean_tokens": false,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,511 @@
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:843
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: (1) No person shall make attempt to commit an offence. Even if
    it is impossible for an offence to be committed for which attempt is made, attempt
    shall be considered to have been committed. Except as otherwise provided elsewhere
    in this Act, a person who attempts, or causes attempt, to commit an offence shall
    be punished with one half of the punishment specified for such offence. .
  sentences:
  - How is the punishment for an attempt determined?
  - What are the different types of guarantees?
  - What are the specific types of crimes that are considered 'strict liability'?
- source_sentence: ': (1) No person shall commit, or cause to be committed, cheating.
    (2) For the purposes of sub-section (1), a person who dishonestly causes any kind
    of loss, damage or injury to another person whom he or she makes believe in some
    matter or to any other person or obtains any benefit for him or her or any one
    else by omitting to do as per such belief or by inducement, fraudulent, dishonest
    or otherwise deceptive act or preventing such other person from doing any act
    shall be considered to commit cheating.'
  sentences:
  - How is 'fraudulent concealment' defined?
  - What are the terms and restrictions that must be followed when producing explosives
    under a license?
  - What is the process for determining the appropriate penalty for a cheating offense?
- source_sentence: (1) No person shall restraint or otherwise obstruct or hinder a
    person who, upon knowing that an offence has been committed or is about to be
    committed, intends to give information or notice about such offence to the police
    or competent authority. imprisonment for a term not exceeding two years or a fine
    not exceeding twenty thousand rupees or both the sentences.
  sentences:
  - What actions constitute 'restraint, obstruction, or hindrance'?
  - What are the consequences of engaging in such conduct?
  - What are the different categories of victims, and how do the penalties vary based
    on their age?
- source_sentence: This law prohibits the creation, use, possession, or sale of inaccurate
    weighing, measuring, or quality-standard instruments. It also prohibits tampering
    with seals or marks on these instruments, or manipulating their accuracy. Violations
    carry a penalty of up to three years imprisonment and a fine. Instruments and
    tools used in the offense are subject to forfeiture.
  sentences:
  - What are the penalties for using banned currency?
  - What is the time frame for reporting an offense under this law?
  - When does this law come into effect?
- source_sentence: This section lists factors that decrease the seriousness of a crime.
    These include age (under 18 or over 75), lack of intent, provocation by the victim,
    retaliation for a serious offense, confession and remorse, surrender to authorities,
    compensation to the victim, diminished capacity, insignificant harm, assistance
    in the judicial process, confession with a promise of no future crime, and crimes
    committed under duress.
  sentences:
  - What constitutes "lack of intent" in this context?
  - What is the difference between an attempt and the actual commission of a crime?
  - What are the exceptions to the prohibition on property transactions in marriage?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.13744075829383887
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.4312796208530806
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.5450236966824644
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6445497630331753
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.13744075829383887
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.14375987361769352
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.10900473933649289
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.06445497630331752
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.13744075829383887
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.4312796208530806
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.5450236966824644
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.6445497630331753
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.38906851558265765
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.3073817046565864
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.31738583597003633
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'This section lists factors that decrease the seriousness of a crime. These include age (under 18 or over 75), lack of intent, provocation by the victim, retaliation for a serious offense, confession and remorse, surrender to authorities, compensation to the victim, diminished capacity, insignificant harm, assistance in the judicial process, confession with a promise of no future crime, and crimes committed under duress.',
    'What constitutes "lack of intent" in this context?',
    'What are the exceptions to the prohibition on property transactions in marriage?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
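Because the model was trained with MatryoshkaLoss at dimension 128 (see Training Details below), the 768-dimensional embeddings can be truncated to their first 128 components and re-normalized, trading some accuracy for much cheaper storage and search. A minimal sketch of that truncation step, using NumPy on placeholder vectors — the random arrays here stand in for real `model.encode` output:

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int = 128) -> np.ndarray:
    """Keep the first `dim` Matryoshka components and re-normalize to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# Placeholder for `model.encode(sentences)` output: 3 vectors of 768 dims.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768)).astype(np.float32)

small = truncate_and_normalize(full, dim=128)
print(small.shape)  # (3, 128)

# After re-normalization, cosine similarity is just a dot product.
similarities = small @ small.T
print(np.allclose(np.diag(similarities), 1.0, atol=1e-5))  # True
```

In Sentence Transformers itself, the same effect can likely be achieved by passing `truncate_dim=128` when constructing the `SentenceTransformer`, which recent library versions support.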

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.1374     |
| cosine_accuracy@3   | 0.4313     |
| cosine_accuracy@5   | 0.545      |
| cosine_accuracy@10  | 0.6445     |
| cosine_precision@1  | 0.1374     |
| cosine_precision@3  | 0.1438     |
| cosine_precision@5  | 0.109      |
| cosine_precision@10 | 0.0645     |
| cosine_recall@1     | 0.1374     |
| cosine_recall@3     | 0.4313     |
| cosine_recall@5     | 0.545      |
| cosine_recall@10    | 0.6445     |
| **cosine_ndcg@10**  | **0.3891** |
| cosine_mrr@10       | 0.3074     |
| cosine_map@100      | 0.3174     |
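For intuition about what the table reports: when each query has exactly one relevant passage (as in this anchor/positive setup), accuracy@k and recall@k coincide, MRR@10 averages the reciprocal rank of the gold passage, and NDCG@10 reduces to a log-discounted version of the same rank. A small self-contained sketch on toy ranks (not the actual evaluation data):

```python
import math

def ir_metrics(ranks, k=10):
    """ranks: 1-based rank of the single relevant passage for each query."""
    n = len(ranks)
    acc_at_k = sum(r <= k for r in ranks) / n
    mrr = sum(1.0 / r for r in ranks if r <= k) / n
    # With one relevant document per query, IDCG = 1, so NDCG is just the discount term.
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    return acc_at_k, mrr, ndcg

# Toy example: gold passage ranked 1st, 3rd, and 12th for three queries.
acc, mrr, ndcg = ir_metrics([1, 3, 12])
print(round(acc, 4))   # 0.6667
print(round(mrr, 4))   # 0.4444
print(round(ndcg, 4))  # 0.5
```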

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 843 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 843 samples:
  |         | positive                                                                           | anchor                                                                            |
  |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                             |
  | details | <ul><li>min: 9 tokens</li><li>mean: 66.68 tokens</li><li>max: 151 tokens</li></ul>  | <ul><li>min: 7 tokens</li><li>mean: 14.77 tokens</li><li>max: 39 tokens</li></ul>  |
* Samples:
  | positive | anchor |
  |:---------|:-------|
  | <code>This law prohibits unlawful detention of individuals. It outlines penalties for unlawful confinement and obstruction of a person's movement. It also specifies a time limit for complaints related to certain offenses.</code> | <code>What is the process for reporting unlawful detention?</code> |
  | <code>No complaint shall lie in relation to any of the offences under Section 290, after the expiry of three months from the date of commission of such offence, and in relation to any of the other offences under this Chapter, after the expiry of three months from the date of knowledge of commission of such act.</code> | <code>What are the time limits for reporting and prosecuting offenses related to animal cruelty?</code> |
  | <code>(1) No person, being legally bound to receive a summons, process, notice, arrest warrant or order issued by the competent authority, shall abscond, with mala fide intention to avoid being served with such summons, process, notice, arrest warrant or order.</code> | <code>What is the legal definition of being "legally bound" to receive a document?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          128
      ],
      "matryoshka_weights": [
          1
      ],
      "n_dims_per_step": -1
  }
  ```
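For readers unfamiliar with this loss combination: MultipleNegativesRankingLoss scores each anchor against every positive in the batch and treats the matching positive as the correct "class" in a cross-entropy, so all other in-batch positives act as negatives; MatryoshkaLoss then applies that same objective to truncated prefixes of the embeddings (here a single prefix of 128 dims with weight 1). A NumPy illustration of the computation — this is a sketch of the math, not the library's torch implementation, and the `scale=20.0` default is an assumption carried over from the library's usual setting:

```python
import numpy as np

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives cross-entropy: row i's correct positive is column i."""
    # Unit-normalize so the score matrix holds scaled cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                                   # (batch, batch)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())

def matryoshka_mnrl(anchors, positives, dims=(128,), weights=(1.0,)):
    """MatryoshkaLoss: apply the inner loss on truncated embeddings per dim."""
    return sum(w * mnrl_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

# Toy batch: positives are noisy copies of their anchors.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 768))
positives = anchors + 0.5 * rng.normal(size=(4, 768))
loss = matryoshka_mnrl(anchors, positives)
print(loss >= 0.0)  # True
```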

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
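These batch settings interact in a way that explains the Training Logs: with a per-device batch of 32, 16 accumulation steps, and 843 samples, each epoch contains 27 mini-batches, so the first (and only) optimizer step completes after 16 of those 27 batches, i.e. at ≈ 0.5926 of an epoch. A quick check of that arithmetic, assuming a single training device:

```python
import math

per_device_batch = 32
grad_accum = 16
train_samples = 843

effective_batch = per_device_batch * grad_accum          # samples per optimizer step
batches_per_epoch = math.ceil(train_samples / per_device_batch)

print(effective_batch)                            # 512
print(batches_per_epoch)                          # 27
print(round(grad_accum / batches_per_epoch, 4))   # 0.5926
```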

### Training Logs
| Epoch      | Step  | dim_128_cosine_ndcg@10 |
|:----------:|:-----:|:----------------------:|
| **0.5926** | **1** | **0.3891**             |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.47.1
- PyTorch: 2.5.1+cu121
- Accelerate: 0.27.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
checkpoint-22/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": true,
    "pooling_mode_mean_tokens": false,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
checkpoint-22/README.md ADDED
@@ -0,0 +1,522 @@
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:843
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: (1) No person shall make attempt to commit an offence. Even if
    it is impossible for an offence to be committed for which attempt is made, attempt
    shall be considered to have been committed. Except as otherwise provided elsewhere
    in this Act, a person who attempts, or causes attempt, to commit an offence shall
    be punished with one half of the punishment specified for such offence. .
  sentences:
  - How is the punishment for an attempt determined?
  - What are the different types of guarantees?
  - What are the specific types of crimes that are considered 'strict liability'?
- source_sentence: ': (1) No person shall commit, or cause to be committed, cheating.
    (2) For the purposes of sub-section (1), a person who dishonestly causes any kind
    of loss, damage or injury to another person whom he or she makes believe in some
    matter or to any other person or obtains any benefit for him or her or any one
    else by omitting to do as per such belief or by inducement, fraudulent, dishonest
    or otherwise deceptive act or preventing such other person from doing any act
    shall be considered to commit cheating.'
  sentences:
  - How is 'fraudulent concealment' defined?
  - What are the terms and restrictions that must be followed when producing explosives
    under a license?
  - What is the process for determining the appropriate penalty for a cheating offense?
- source_sentence: (1) No person shall restraint or otherwise obstruct or hinder a
    person who, upon knowing that an offence has been committed or is about to be
    committed, intends to give information or notice about such offence to the police
    or competent authority. imprisonment for a term not exceeding two years or a fine
    not exceeding twenty thousand rupees or both the sentences.
  sentences:
  - What actions constitute 'restraint, obstruction, or hindrance'?
  - What are the consequences of engaging in such conduct?
  - What are the different categories of victims, and how do the penalties vary based
    on their age?
- source_sentence: This law prohibits the creation, use, possession, or sale of inaccurate
    weighing, measuring, or quality-standard instruments. It also prohibits tampering
    with seals or marks on these instruments, or manipulating their accuracy. Violations
    carry a penalty of up to three years imprisonment and a fine. Instruments and
    tools used in the offense are subject to forfeiture.
  sentences:
  - What are the penalties for using banned currency?
  - What is the time frame for reporting an offense under this law?
  - When does this law come into effect?
- source_sentence: This section lists factors that decrease the seriousness of a crime.
    These include age (under 18 or over 75), lack of intent, provocation by the victim,
    retaliation for a serious offense, confession and remorse, surrender to authorities,
    compensation to the victim, diminished capacity, insignificant harm, assistance
    in the judicial process, confession with a promise of no future crime, and crimes
    committed under duress.
  sentences:
  - What constitutes "lack of intent" in this context?
  - What is the difference between an attempt and the actual commission of a crime?
  - What are the exceptions to the prohibition on property transactions in marriage?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.22748815165876776
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5592417061611374
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6872037914691943
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7962085308056872
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.22748815165876776
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.18641390205371247
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.13744075829383884
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.07962085308056871
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.22748815165876776
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5592417061611374
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6872037914691943
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7962085308056872
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5057041685567575
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.4129203340103815
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.4195949550112758
      name: Cosine Map@100
---
139
+
140
+ # BGE base Financial Matryoshka
141
+
142
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
143
+
144
+ ## Model Details
145
+
146
+ ### Model Description
147
+ - **Model Type:** Sentence Transformer
148
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
149
+ - **Maximum Sequence Length:** 512 tokens
150
+ - **Output Dimensionality:** 768 dimensions
151
+ - **Similarity Function:** Cosine Similarity
152
+ <!-- - **Training Dataset:** Unknown -->
153
+ - **Language:** en
154
+ - **License:** apache-2.0
155
+
156
+ ### Model Sources
157
+
158
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
159
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
160
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
161
+
162
+ ### Full Model Architecture
163
+
164
+ ```
165
+ SentenceTransformer(
166
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
167
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
168
+ (2): Normalize()
169
+ )
170
+ ```
171

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'This section lists factors that decrease the seriousness of a crime. These include age (under 18 or over 75), lack of intent, provocation by the victim, retaliation for a serious offense, confession and remorse, surrender to authorities, compensation to the victim, diminished capacity, insignificant harm, assistance in the judicial process, confession with a promise of no future crime, and crimes committed under duress.',
    'What constitutes "lack of intent" in this context?',
    'What are the exceptions to the prohibition on property transactions in marriage?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
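Because training used `MatryoshkaLoss` with `matryoshka_dims: [128]` (see Training Details), the 768-dimensional embeddings can be truncated to their first 128 components for cheaper storage and search. After truncating you should re-normalize before taking dot products, since this model emits unit-length vectors. A minimal, dependency-free sketch of that post-processing (the constant list stands in for one row of `model.encode(...)` output):

```python
import math

def truncate_and_renormalize(embedding, dim=128):
    """Keep the first `dim` components and rescale back to unit length,
    so dot products on truncated vectors remain cosine similarities."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.05] * 768  # stand-in for one encoded sentence
small = truncate_and_renormalize(full)
print(len(small))                                      # 128
print(round(math.sqrt(sum(x * x for x in small)), 6))  # 1.0
```

Alternatively, recent sentence-transformers versions accept a `truncate_dim` argument when constructing `SentenceTransformer`, which performs the same truncation internally.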

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.2275 |
| cosine_accuracy@3 | 0.5592 |
| cosine_accuracy@5 | 0.6872 |
| cosine_accuracy@10 | 0.7962 |
| cosine_precision@1 | 0.2275 |
| cosine_precision@3 | 0.1864 |
| cosine_precision@5 | 0.1374 |
| cosine_precision@10 | 0.0796 |
| cosine_recall@1 | 0.2275 |
| cosine_recall@3 | 0.5592 |
| cosine_recall@5 | 0.6872 |
| cosine_recall@10 | 0.7962 |
| **cosine_ndcg@10** | **0.5057** |
| cosine_mrr@10 | 0.4129 |
| cosine_map@100 | 0.4196 |
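Since each anchor in this evaluation has exactly one relevant passage, the bolded `cosine_ndcg@10` reduces to a rank-discounted hit score averaged over queries: a relevant passage retrieved at 1-based rank `r` contributes `1/log2(r + 1)`, and a miss outside the top 10 contributes 0. A quick sketch of that per-query computation:

```python
import math

def ndcg_at_10_single_relevant(rank: int) -> float:
    """NDCG@10 for a query whose single relevant document appears at
    1-based position `rank` in the retrieved list (0.0 if rank > 10)."""
    if rank > 10:
        return 0.0
    return 1.0 / math.log2(rank + 1)

print(ndcg_at_10_single_relevant(1))   # 1.0 (ideal ranking)
print(ndcg_at_10_single_relevant(3))   # 0.5 (1 / log2(4))
print(ndcg_at_10_single_relevant(11))  # 0.0 (outside the cutoff)
```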

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 843 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 843 samples:
  | | positive | anchor |
  |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type | string | string |
  | details | <ul><li>min: 9 tokens</li><li>mean: 66.68 tokens</li><li>max: 151 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 14.77 tokens</li><li>max: 39 tokens</li></ul> |
* Samples:
  | positive | anchor |
  |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|
  | <code>This law prohibits unlawful detention of individuals. It outlines penalties for unlawful confinement and obstruction of a person's movement. It also specifies a time limit for complaints related to certain offenses.</code> | <code>What is the process for reporting unlawful detention?</code> |
  | <code>No complaint shall lie in relation to any of the offences under Section 290, after the expiry of three months from the date of commission of such offence, and in relation to any of the other offences under this Chapter, after the expiry of three months from the date of knowledge of commission of such act.</code> | <code>What are the time limits for reporting and prosecuting offenses related to animal cruelty?</code> |
  | <code>(1) No person, being legally bound to receive a summons, process, notice, arrest warrant or order issued by the competent authority, shall abscond, with mala fide intention to avoid being served with such summons, process, notice, arrest warrant or order.</code> | <code>What is the legal definition of being "legally bound" to receive a document?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          128
      ],
      "matryoshka_weights": [
          1
      ],
      "n_dims_per_step": -1
  }
  ```
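With these parameters, `MatryoshkaLoss` simply applies the wrapped `MultipleNegativesRankingLoss` to embeddings truncated to each dimension in `matryoshka_dims` (here only 128) and sums the weighted results. The inner objective treats every other in-batch positive as a negative and scores pairs by scaled cosine similarity. A toy, framework-free sketch of that objective (the `scale=20.0` default mirrors sentence-transformers; the 2-d vectors are purely illustrative):

```python
import math

def mnrl_loss(anchors, positives, scale=20.0):
    """Toy version of MultipleNegativesRankingLoss: for each anchor, every
    positive in the batch is a candidate, only the matching one is correct,
    and the loss is cross-entropy over scaled cosine similarities."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cos(anchor, p) for p in positives]
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]  # -log softmax probability of the true pair
    return total / len(anchors)

# Each anchor is closest to its own positive -> loss is near zero.
pairs = [[1.0, 0.0], [0.0, 1.0]]
print(mnrl_loss(pairs, pairs) < 1e-6)                   # True
# Mismatched pairs are heavily penalized.
print(mnrl_loss(pairs, [[0.0, 1.0], [1.0, 0.0]]) > 1.0)  # True
```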

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 30
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

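The schedule implied by `lr_scheduler_type: cosine` and `warmup_ratio: 0.1` is linear warmup over the first 10% of optimizer steps followed by half-cycle cosine decay. A sketch of that schedule (step counts taken from `checkpoint-22/trainer_state.json`: 30 total optimizer steps, hence 3 warmup steps) reproduces the learning rates recorded in the training logs:

```python
import math

PEAK_LR = 2e-05    # `learning_rate`
MAX_STEPS = 30     # 30 epochs x 1 optimizer step per epoch (see trainer state)
WARMUP_STEPS = 3   # `warmup_ratio` 0.1 -> ceil(0.1 * 30)

def cosine_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup followed
    by half-cycle cosine decay (as in Transformers'
    `get_cosine_schedule_with_warmup`)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Matches the learning rates recorded in checkpoint-22/trainer_state.json:
print(cosine_lr(10))  # ~1.686e-05 (logged: 1.686241637868734e-05)
print(cosine_lr(20))  # ~6.039e-06 (logged: 6.039202339608432e-06)
```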
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 30
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch | Step | Training Loss | dim_128_cosine_ndcg@10 |
|:----------:|:-----:|:-------------:|:----------------------:|
| **0.5926** | **1** | **-** | **0.3891** |
| 1.0 | 2 | - | 0.4127 |
| 2.0 | 4 | - | 0.4611 |
| 3.0 | 6 | - | 0.4676 |
| 4.0 | 8 | - | 0.4909 |
| 5.0 | 10 | 1.1743 | 0.4808 |
| 6.0 | 12 | - | 0.4891 |
| 7.0 | 14 | - | 0.5027 |
| 8.0 | 16 | - | 0.4979 |
| 9.0 | 18 | - | 0.5047 |
| 10.0 | 20 | 0.5481 | 0.5031 |
| 11.0 | 22 | - | 0.5057 |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.47.1
- PyTorch: 2.5.1+cu121
- Accelerate: 0.27.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
checkpoint-22/config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "_name_or_path": "BAAI/bge-base-en-v1.5",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "gradient_checkpointing": false,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "LABEL_0"
14
+ },
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 3072,
17
+ "label2id": {
18
+ "LABEL_0": 0
19
+ },
20
+ "layer_norm_eps": 1e-12,
21
+ "max_position_embeddings": 512,
22
+ "model_type": "bert",
23
+ "num_attention_heads": 12,
24
+ "num_hidden_layers": 12,
25
+ "pad_token_id": 0,
26
+ "position_embedding_type": "absolute",
27
+ "torch_dtype": "float32",
28
+ "transformers_version": "4.47.1",
29
+ "type_vocab_size": 2,
30
+ "use_cache": true,
31
+ "vocab_size": 30522
32
+ }
checkpoint-22/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.3.1",
4
+ "transformers": "4.47.1",
5
+ "pytorch": "2.5.1+cu121"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
checkpoint-22/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5335aa639a8a337c9d55541256abea22d90f9087c2a6d8344804675de776523e
3
+ size 437951328
checkpoint-22/modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
checkpoint-22/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:56401e0afb6ad4061210adb67cea0957f1a998412d6d13769bd97f796e735848
3
+ size 14244
checkpoint-22/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:64a7cbea70391b8ccce3b98e2c324c25f2ae8f235754e0ed6e936df11409a9d1
3
+ size 1064
checkpoint-22/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": true
4
+ }
checkpoint-22/special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
checkpoint-22/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-22/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-22/trainer_state.json ADDED
@@ -0,0 +1,300 @@
1
+ {
2
+ "best_metric": 0.5057041685567575,
3
+ "best_model_checkpoint": "/content/drive/MyDrive/Kaanun/bge_models_weights/checkpoint-22",
4
+ "epoch": 11.0,
5
+ "eval_steps": 500,
6
+ "global_step": 22,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 1.0,
13
+ "eval_dim_128_cosine_accuracy@1": 0.16113744075829384,
14
+ "eval_dim_128_cosine_accuracy@10": 0.6682464454976303,
15
+ "eval_dim_128_cosine_accuracy@3": 0.4597156398104265,
16
+ "eval_dim_128_cosine_accuracy@5": 0.5687203791469194,
17
+ "eval_dim_128_cosine_map@100": 0.3403695803235615,
18
+ "eval_dim_128_cosine_mrr@10": 0.3310708643647034,
19
+ "eval_dim_128_cosine_ndcg@10": 0.4127209358099893,
20
+ "eval_dim_128_cosine_precision@1": 0.16113744075829384,
21
+ "eval_dim_128_cosine_precision@10": 0.06682464454976303,
22
+ "eval_dim_128_cosine_precision@3": 0.1532385466034755,
23
+ "eval_dim_128_cosine_precision@5": 0.11374407582938388,
24
+ "eval_dim_128_cosine_recall@1": 0.16113744075829384,
25
+ "eval_dim_128_cosine_recall@10": 0.6682464454976303,
26
+ "eval_dim_128_cosine_recall@3": 0.4597156398104265,
27
+ "eval_dim_128_cosine_recall@5": 0.5687203791469194,
28
+ "eval_runtime": 4.2031,
29
+ "eval_samples_per_second": 0.0,
30
+ "eval_sequential_score": 0.4127209358099893,
31
+ "eval_steps_per_second": 0.0,
32
+ "step": 2
33
+ },
34
+ {
35
+ "epoch": 2.0,
36
+ "eval_dim_128_cosine_accuracy@1": 0.22748815165876776,
37
+ "eval_dim_128_cosine_accuracy@10": 0.7061611374407583,
38
+ "eval_dim_128_cosine_accuracy@3": 0.4976303317535545,
39
+ "eval_dim_128_cosine_accuracy@5": 0.6161137440758294,
40
+ "eval_dim_128_cosine_map@100": 0.39139098029603353,
41
+ "eval_dim_128_cosine_mrr@10": 0.38265252388475157,
42
+ "eval_dim_128_cosine_ndcg@10": 0.46108879234694766,
43
+ "eval_dim_128_cosine_precision@1": 0.22748815165876776,
44
+ "eval_dim_128_cosine_precision@10": 0.07061611374407582,
45
+ "eval_dim_128_cosine_precision@3": 0.16587677725118483,
46
+ "eval_dim_128_cosine_precision@5": 0.12322274881516586,
47
+ "eval_dim_128_cosine_recall@1": 0.22748815165876776,
48
+ "eval_dim_128_cosine_recall@10": 0.7061611374407583,
49
+ "eval_dim_128_cosine_recall@3": 0.4976303317535545,
50
+ "eval_dim_128_cosine_recall@5": 0.6161137440758294,
51
+ "eval_runtime": 3.9064,
52
+ "eval_samples_per_second": 0.0,
53
+ "eval_sequential_score": 0.46108879234694766,
54
+ "eval_steps_per_second": 0.0,
55
+ "step": 4
56
+ },
57
+ {
58
+ "epoch": 3.0,
59
+ "eval_dim_128_cosine_accuracy@1": 0.21800947867298578,
60
+ "eval_dim_128_cosine_accuracy@10": 0.7393364928909952,
61
+ "eval_dim_128_cosine_accuracy@3": 0.5165876777251185,
62
+ "eval_dim_128_cosine_accuracy@5": 0.6255924170616114,
63
+ "eval_dim_128_cosine_map@100": 0.3893562232228884,
64
+ "eval_dim_128_cosine_mrr@10": 0.38131347325660137,
65
+ "eval_dim_128_cosine_ndcg@10": 0.46756907555767835,
66
+ "eval_dim_128_cosine_precision@1": 0.21800947867298578,
67
+ "eval_dim_128_cosine_precision@10": 0.07393364928909951,
68
+ "eval_dim_128_cosine_precision@3": 0.17219589257503948,
69
+ "eval_dim_128_cosine_precision@5": 0.12511848341232226,
70
+ "eval_dim_128_cosine_recall@1": 0.21800947867298578,
71
+ "eval_dim_128_cosine_recall@10": 0.7393364928909952,
72
+ "eval_dim_128_cosine_recall@3": 0.5165876777251185,
73
+ "eval_dim_128_cosine_recall@5": 0.6255924170616114,
74
+ "eval_runtime": 3.9779,
75
+ "eval_samples_per_second": 0.0,
76
+ "eval_sequential_score": 0.46756907555767835,
77
+ "eval_steps_per_second": 0.0,
78
+ "step": 6
79
+ },
80
+ {
81
+ "epoch": 4.0,
82
+ "eval_dim_128_cosine_accuracy@1": 0.23696682464454977,
83
+ "eval_dim_128_cosine_accuracy@10": 0.7630331753554502,
84
+ "eval_dim_128_cosine_accuracy@3": 0.5497630331753555,
85
+ "eval_dim_128_cosine_accuracy@5": 0.6540284360189573,
86
+ "eval_dim_128_cosine_map@100": 0.41172122261842037,
87
+ "eval_dim_128_cosine_mrr@10": 0.40448356277740155,
88
+ "eval_dim_128_cosine_ndcg@10": 0.49093979707861896,
89
+ "eval_dim_128_cosine_precision@1": 0.23696682464454977,
90
+ "eval_dim_128_cosine_precision@10": 0.07630331753554502,
91
+ "eval_dim_128_cosine_precision@3": 0.18325434439178515,
92
+ "eval_dim_128_cosine_precision@5": 0.13080568720379146,
93
+ "eval_dim_128_cosine_recall@1": 0.23696682464454977,
94
+ "eval_dim_128_cosine_recall@10": 0.7630331753554502,
95
+ "eval_dim_128_cosine_recall@3": 0.5497630331753555,
96
+ "eval_dim_128_cosine_recall@5": 0.6540284360189573,
97
+ "eval_runtime": 4.0653,
98
+ "eval_samples_per_second": 0.0,
99
+ "eval_sequential_score": 0.49093979707861896,
100
+ "eval_steps_per_second": 0.0,
101
+ "step": 8
102
+ },
103
+ {
104
+ "epoch": 5.0,
105
+ "grad_norm": 45.40250778198242,
106
+ "learning_rate": 1.686241637868734e-05,
107
+ "loss": 1.1743,
108
+ "step": 10
109
+ },
110
+ {
111
+ "epoch": 5.0,
112
+ "eval_dim_128_cosine_accuracy@1": 0.20853080568720378,
113
+ "eval_dim_128_cosine_accuracy@10": 0.7582938388625592,
114
+ "eval_dim_128_cosine_accuracy@3": 0.5497630331753555,
115
+ "eval_dim_128_cosine_accuracy@5": 0.6587677725118484,
116
+ "eval_dim_128_cosine_map@100": 0.39983825778494503,
117
+ "eval_dim_128_cosine_mrr@10": 0.39171368389377875,
118
+ "eval_dim_128_cosine_ndcg@10": 0.4808079658208555,
119
+ "eval_dim_128_cosine_precision@1": 0.20853080568720378,
120
+ "eval_dim_128_cosine_precision@10": 0.07582938388625592,
121
+ "eval_dim_128_cosine_precision@3": 0.18325434439178517,
122
+ "eval_dim_128_cosine_precision@5": 0.13175355450236967,
123
+ "eval_dim_128_cosine_recall@1": 0.20853080568720378,
124
+ "eval_dim_128_cosine_recall@10": 0.7582938388625592,
125
+ "eval_dim_128_cosine_recall@3": 0.5497630331753555,
126
+ "eval_dim_128_cosine_recall@5": 0.6587677725118484,
127
+ "eval_runtime": 3.9109,
128
+ "eval_samples_per_second": 0.0,
129
+ "eval_sequential_score": 0.4808079658208555,
130
+ "eval_steps_per_second": 0.0,
131
+ "step": 10
132
+ },
133
+ {
134
+ "epoch": 6.0,
135
+ "eval_dim_128_cosine_accuracy@1": 0.2132701421800948,
136
+ "eval_dim_128_cosine_accuracy@10": 0.7725118483412322,
137
+ "eval_dim_128_cosine_accuracy@3": 0.5450236966824644,
138
+ "eval_dim_128_cosine_accuracy@5": 0.6777251184834123,
139
+ "eval_dim_128_cosine_map@100": 0.40598541933929044,
140
+ "eval_dim_128_cosine_mrr@10": 0.3981606860753781,
141
+ "eval_dim_128_cosine_ndcg@10": 0.48914093383392965,
142
+ "eval_dim_128_cosine_precision@1": 0.2132701421800948,
143
+ "eval_dim_128_cosine_precision@10": 0.07725118483412322,
144
+ "eval_dim_128_cosine_precision@3": 0.18167456556082145,
145
+ "eval_dim_128_cosine_precision@5": 0.13554502369668245,
146
+ "eval_dim_128_cosine_recall@1": 0.2132701421800948,
147
+ "eval_dim_128_cosine_recall@10": 0.7725118483412322,
148
+ "eval_dim_128_cosine_recall@3": 0.5450236966824644,
149
+ "eval_dim_128_cosine_recall@5": 0.6777251184834123,
150
+ "eval_runtime": 4.0272,
151
+ "eval_samples_per_second": 0.0,
152
+ "eval_sequential_score": 0.48914093383392965,
153
+ "eval_steps_per_second": 0.0,
154
+ "step": 12
155
+ },
156
+ {
157
+ "epoch": 7.0,
158
+ "eval_dim_128_cosine_accuracy@1": 0.23696682464454977,
159
+ "eval_dim_128_cosine_accuracy@10": 0.7725118483412322,
160
+ "eval_dim_128_cosine_accuracy@3": 0.5592417061611374,
161
+ "eval_dim_128_cosine_accuracy@5": 0.6824644549763034,
162
+ "eval_dim_128_cosine_map@100": 0.4239892051486296,
163
+ "eval_dim_128_cosine_mrr@10": 0.41595388550364865,
164
+ "eval_dim_128_cosine_ndcg@10": 0.5026698240348001,
165
+ "eval_dim_128_cosine_precision@1": 0.23696682464454977,
166
+ "eval_dim_128_cosine_precision@10": 0.07725118483412322,
167
+ "eval_dim_128_cosine_precision@3": 0.18641390205371247,
168
+ "eval_dim_128_cosine_precision@5": 0.13649289099526066,
169
+ "eval_dim_128_cosine_recall@1": 0.23696682464454977,
170
+ "eval_dim_128_cosine_recall@10": 0.7725118483412322,
171
+ "eval_dim_128_cosine_recall@3": 0.5592417061611374,
172
+ "eval_dim_128_cosine_recall@5": 0.6824644549763034,
173
+ "eval_runtime": 3.9845,
174
+ "eval_samples_per_second": 0.0,
175
+ "eval_sequential_score": 0.5026698240348001,
176
+ "eval_steps_per_second": 0.0,
177
+ "step": 14
178
+ },
179
+ {
180
+ "epoch": 8.0,
181
+ "eval_dim_128_cosine_accuracy@1": 0.22274881516587677,
182
+ "eval_dim_128_cosine_accuracy@10": 0.7819905213270142,
183
+ "eval_dim_128_cosine_accuracy@3": 0.5592417061611374,
184
+ "eval_dim_128_cosine_accuracy@5": 0.6824644549763034,
185
+ "eval_dim_128_cosine_map@100": 0.41434628700872,
186
+ "eval_dim_128_cosine_mrr@10": 0.4067573158805388,
187
+ "eval_dim_128_cosine_ndcg@10": 0.49785247883707107,
188
+ "eval_dim_128_cosine_precision@1": 0.22274881516587677,
189
+ "eval_dim_128_cosine_precision@10": 0.07819905213270142,
190
+ "eval_dim_128_cosine_precision@3": 0.18641390205371247,
191
+ "eval_dim_128_cosine_precision@5": 0.13649289099526066,
192
+ "eval_dim_128_cosine_recall@1": 0.22274881516587677,
193
+ "eval_dim_128_cosine_recall@10": 0.7819905213270142,
194
+ "eval_dim_128_cosine_recall@3": 0.5592417061611374,
195
+ "eval_dim_128_cosine_recall@5": 0.6824644549763034,
196
+ "eval_runtime": 4.0093,
197
+ "eval_samples_per_second": 0.0,
198
+ "eval_sequential_score": 0.49785247883707107,
199
+ "eval_steps_per_second": 0.0,
200
+ "step": 16
201
+ },
202
+ {
203
+ "epoch": 9.0,
204
+ "eval_dim_128_cosine_accuracy@1": 0.23696682464454977,
205
+ "eval_dim_128_cosine_accuracy@10": 0.7819905213270142,
206
+ "eval_dim_128_cosine_accuracy@3": 0.5545023696682464,
207
+ "eval_dim_128_cosine_accuracy@5": 0.6872037914691943,
208
+ "eval_dim_128_cosine_map@100": 0.4238089651869204,
209
+ "eval_dim_128_cosine_mrr@10": 0.41594072068005733,
210
+ "eval_dim_128_cosine_ndcg@10": 0.5047472995112773,
211
+ "eval_dim_128_cosine_precision@1": 0.23696682464454977,
212
+ "eval_dim_128_cosine_precision@10": 0.07819905213270142,
213
+ "eval_dim_128_cosine_precision@3": 0.1848341232227488,
214
+ "eval_dim_128_cosine_precision@5": 0.13744075829383884,
215
+ "eval_dim_128_cosine_recall@1": 0.23696682464454977,
216
+ "eval_dim_128_cosine_recall@10": 0.7819905213270142,
217
+ "eval_dim_128_cosine_recall@3": 0.5545023696682464,
218
+ "eval_dim_128_cosine_recall@5": 0.6872037914691943,
219
+ "eval_runtime": 4.1008,
220
+ "eval_samples_per_second": 0.0,
221
+ "eval_sequential_score": 0.5047472995112773,
222
+ "eval_steps_per_second": 0.0,
223
+ "step": 18
224
+ },
225
+ {
226
+ "epoch": 10.0,
227
+ "grad_norm": 35.63777160644531,
228
+ "learning_rate": 6.039202339608432e-06,
229
+ "loss": 0.5481,
230
+ "step": 20
231
+ },
232
+ {
233
+ "epoch": 10.0,
234
+ "eval_dim_128_cosine_accuracy@1": 0.22748815165876776,
235
+ "eval_dim_128_cosine_accuracy@10": 0.7867298578199052,
236
+ "eval_dim_128_cosine_accuracy@3": 0.5734597156398105,
237
+ "eval_dim_128_cosine_accuracy@5": 0.6872037914691943,
238
+ "eval_dim_128_cosine_map@100": 0.41945253821826306,
239
+ "eval_dim_128_cosine_mrr@10": 0.4121304445948996,
240
+ "eval_dim_128_cosine_ndcg@10": 0.5031216432900927,
241
+ "eval_dim_128_cosine_precision@1": 0.22748815165876776,
242
+ "eval_dim_128_cosine_precision@10": 0.07867298578199051,
243
+ "eval_dim_128_cosine_precision@3": 0.19115323854660346,
244
+ "eval_dim_128_cosine_precision@5": 0.13744075829383884,
245
+ "eval_dim_128_cosine_recall@1": 0.22748815165876776,
246
+ "eval_dim_128_cosine_recall@10": 0.7867298578199052,
247
+ "eval_dim_128_cosine_recall@3": 0.5734597156398105,
248
+ "eval_dim_128_cosine_recall@5": 0.6872037914691943,
249
+ "eval_runtime": 3.9667,
250
+ "eval_samples_per_second": 0.0,
251
+ "eval_sequential_score": 0.5031216432900927,
252
+ "eval_steps_per_second": 0.0,
253
+ "step": 20
254
+ },
255
+ {
256
+ "epoch": 11.0,
257
+ "eval_dim_128_cosine_accuracy@1": 0.22748815165876776,
258
+ "eval_dim_128_cosine_accuracy@10": 0.7962085308056872,
259
+ "eval_dim_128_cosine_accuracy@3": 0.5592417061611374,
260
+ "eval_dim_128_cosine_accuracy@5": 0.6872037914691943,
261
+ "eval_dim_128_cosine_map@100": 0.4195949550112758,
262
+ "eval_dim_128_cosine_mrr@10": 0.4129203340103815,
263
+ "eval_dim_128_cosine_ndcg@10": 0.5057041685567575,
264
+ "eval_dim_128_cosine_precision@1": 0.22748815165876776,
265
+ "eval_dim_128_cosine_precision@10": 0.07962085308056871,
266
+ "eval_dim_128_cosine_precision@3": 0.18641390205371247,
267
+ "eval_dim_128_cosine_precision@5": 0.13744075829383884,
268
+ "eval_dim_128_cosine_recall@1": 0.22748815165876776,
269
+ "eval_dim_128_cosine_recall@10": 0.7962085308056872,
270
+ "eval_dim_128_cosine_recall@3": 0.5592417061611374,
271
+ "eval_dim_128_cosine_recall@5": 0.6872037914691943,
272
+ "eval_runtime": 3.9949,
273
+ "eval_samples_per_second": 0.0,
274
+ "eval_sequential_score": 0.5057041685567575,
275
+ "eval_steps_per_second": 0.0,
276
+ "step": 22
277
+ }
278
+ ],
279
+ "logging_steps": 10,
280
+ "max_steps": 30,
281
+ "num_input_tokens_seen": 0,
282
+ "num_train_epochs": 30,
283
+ "save_steps": 500,
284
+ "stateful_callbacks": {
285
+ "TrainerControl": {
286
+ "args": {
287
+ "should_epoch_stop": false,
288
+ "should_evaluate": false,
289
+ "should_log": false,
290
+ "should_save": true,
291
+ "should_training_stop": false
292
+ },
293
+ "attributes": {}
294
+ }
295
+ },
296
+ "total_flos": 0.0,
297
+ "train_batch_size": 32,
298
+ "trial_name": null,
299
+ "trial_params": null
300
+ }
checkpoint-22/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65cb4f201c5ee4acd9ba4182f64fbc2110c157cd08a8321a9d97a0f435fd9b49
+ size 5688
checkpoint-22/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "_name_or_path": "BAAI/bge-base-en-v1.5",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "LABEL_0"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "LABEL_0": 0
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.47.1",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.3.1",
+ "transformers": "4.47.1",
+ "pytorch": "2.5.1+cu121"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
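`"similarity_fn_name": "cosine"` tells sentence-transformers to score embedding pairs with cosine similarity. A minimal sketch of that function on plain Python lists (the library itself operates on tensors via `util.cos_sim`):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of L2 norms.
    Because this model's 2_Normalize module makes every embedding
    unit-length, the denominator is 1 and this reduces to a dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```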
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d4db737f56aaea90796b5a8d219de0eee958295a575c611f6b417ad340151da
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
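`modules.json` defines the inference pipeline: the BERT Transformer produces per-token embeddings, the `1_Pooling` module (configured with `pooling_mode_cls_token: true`) keeps only the first `[CLS]` token vector, and `2_Normalize` scales it to unit L2 norm. A minimal sketch of the two post-Transformer steps, using plain Python lists in place of real 768-dimensional tensors:

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Mimic this repo's 1_Pooling + 2_Normalize modules:
    CLS pooling (take the first token's vector, not a mean over tokens),
    then L2-normalize so downstream cosine scores are plain dot products."""
    cls_vec = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vec))
    return [x / norm for x in cls_vec]
```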
runs/Jan02_09-18-42_1afe4c3f2790/events.out.tfevents.1735809557.1afe4c3f2790.457.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95cfabd32461143d8d12e6cfab9dd0b0c8cce76ea989a645d70b9e24063d0d75
+ size 88
runs/Jan02_09-20-23_1afe4c3f2790/events.out.tfevents.1735809626.1afe4c3f2790.1755.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:827c2b964e706b816ded4b47a59995e51aa478b13f09784597eecb941fcb9d8f
+ size 88
runs/Jan02_09-20-23_1afe4c3f2790/events.out.tfevents.1735809855.1afe4c3f2790.1755.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbc824104ec977a47d56b27b65902f497f68be305a9d0108f5b6a5bb136af304
+ size 6151
runs/Jan02_09-25-33_1afe4c3f2790/events.out.tfevents.1735809939.1afe4c3f2790.1755.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b23f65ca656bdfcfd50c5c2f151d4976998373459907d692851025bd9bf92c03
+ size 88
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9496f9b248af5ef52e94cb16dc40df89fabece2b037adeefc18c12b1e961e451
+ size 5688
vocab.txt ADDED
The diff for this file is too large to render. See raw diff