---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:25743
- loss:MultipleNegativesRankingLoss
base_model: am-azadi/UAE-Large-V1_Fine_Tuned
widget:
- source_sentence: 'The good news: That was it with the vaccination terror Every compulsory
vaccination is now illegal from the outset. The Council of Europe (not to be confused
with the EU), to which all European states belong with the exception of Belarus,
Kosovo and the Vatican and which is the body responsible for the European Court
of Human Rights, passed a resolution on January 27th, 2021 in its resolution 2361/2021
that no one may be vaccinated against their will, under pressure. The 47 member
states are asked to point out before vaccination that vaccination is not compulsory
and that the non-vaccinated person must not be discriminated against. Discrimination
is expressly prohibited even if there are health risks or if someone does not
want to be vaccinated. Vaccine manufacturers are encouraged to publish all information
on the safety of the vaccines. With this resolution, the most important human
rights organization in Europe has now set standards and obligations, as well as
created guidelines under international law that are to be applied by the 47 member
states, including the EU as an organization. Discrimination in the workplace,
for example, or a ban on travel for the unvaccinated are thus legally excluded.
You can now invoke it in every court case, before every authority, every employer,
every tour operator, every home manager, etc.'
sentences:
- Mike Tyson prays in coffee shop which has “dogs and Muslims are not allowed” sign
- The Council of Europe has declared mandatory corona vaccination illegal
- CDC chief admits most Covid deaths were people with comorbidities
- source_sentence: ELPunctual The Punctual 24H ⠀ More Madrid intends close the Pandemic
Hospital Isabel Zendal if she wins the elections. 18:30 Mar 22 21 Twitter WebAppFirst
they complain that there were no beds and now they are willing to remove them...
Who votes for these crazy people?
sentences:
- Video shows Bolsonarista demonstration held in Rio de Janeiro on May 1, 2022
- US missionary woman undergoes FGM in Kenya
- More Madrid intends to close the Isabel Zendal Hospital if it wins the elections
- source_sentence: LAST DAM BUILT IN AUSTRALIA 1984 POPULATION OF AUSTRALIA 1984 15.5
MILLION POPULATION OF AUSTRALIA 2018 25 MILLION AND WE WONDER WHY WE HAVE NO WATERThink
about it...
sentences:
- The last dam in Australia was built in 1984
- Demonstrators storm the White House
- These ants contain the coronavirus in their body
- source_sentence: Quaranta QUARANTINE NCR TV ENHANCED COMMUNITY QUARANTINE EXTENDED
Until May 30, 2020 BULACAN -NEW ECIJA TIMPAMPANGA TARLAC - ZAMBALES CALABARZON
BENGUET PANGASINAN BATAAN AUNTIES ILOILO CEBU CEBU CITY - COME ON RAN FROM THE
CLOUD DAVAO CI ORIENTAL MIND ALBAY CATANDUANE AntiqueECQ extended until may 30
2020
sentences:
- Genuine news report about extension of COVID-19 lockdown in the Philippines
- China built 1,000 beds in 10 days. Brazilian army builds 2,000 in 48 hours.
- Holding your breath for 10 seconds allows you to identify contamination by the
new coronavirus
- source_sentence: 'Look what a show Pope Francis gave in yesterday''s homily / sermon!
It''s to be read and reread over and over again... This is the most spiritual
Pope since Peter. "You may have flaws, be anxious, and sometimes live irritated,
but don''t forget that your life is the greatest company in the world. Only you
can prevent it from going into decline. Many appreciate you, admire you and love
you. remember that being happy is not having a sky without storms, a road without
accidents, work without fatigue, relationships without disappointments. Being
happy is finding strength in forgiveness, hope in battles, security in the stage
of fear, love in discord. It''s not just appreciating the smile, but also reflecting
on sadness. It''s not just celebrating successes, but learning lessons from failures.
It''s not just feeling happy with applause, but being happy in anonymity. Being
happy is recognizing that it''s worth life is worth living, despite all the challenges,
misunderstandings, periods of crisis. Being happy is not a fatality of fate, but
an achievement for those who manage to travel within themselves. To be happy is
to stop feeling like a victim of problems and become the author of his own story
. It''s crossing deserts outside of yourself, but managing to find an oasis in
the depths of our soul. It is to thank God for each morning, for the miracle of
life. Being happy is not being afraid of your own feelings. It''s knowing how
to talk about yourself. It''s having the courage to hear a "no". It''s feeling
safe when receiving criticism, even if unfair. It''s kissing the children, pampering
the parents, living poetic moments with friends, even when they hurt us. To be
happy is to let the creature that lives in each of us live, free, joyful and simple.
It''s having maturity to be able to say: "I was wrong". It''s having the courage
to say, "I''m sorry". It''s having the sensitivity to say: "I need you". It''s
having the ability to say, "I love you". May your life become a garden of opportunities
to be happy... May your springtime be a lover of joy. May in your winters be a
lover of wisdom. And that when you make a mistake, start over from the beginning.
For only then will you be in love with life. You will discover that being happy
is not having a perfect life. But using tears to irrigate tolerance. Use losses
to train patience. Using mistakes to sculpt serenity. Using pain to cut pleasure.
Use obstacles to open intelligence windows. Never give up....Never give up on
the people who love you. Never give up happiness, for life is an incredible spectacle."
(Pope Francis).'
sentences:
- '"The message that Pope Francis gave in yesterday''s homily/sermon! It is to be
read and reread several times... What an admirable man!"'
- Denmark allows Muslim women to wear the niqab
- Trump author of speech disparaging Africans and Arabs
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on am-azadi/UAE-Large-V1_Fine_Tuned
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [am-azadi/UAE-Large-V1_Fine_Tuned](https://huggingface.co/am-azadi/UAE-Large-V1_Fine_Tuned). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [am-azadi/UAE-Large-V1_Fine_Tuned](https://huggingface.co/am-azadi/UAE-Large-V1_Fine_Tuned) <!-- at revision 8ea63aea71614563429bfd78c11b294cb6c1b3e5 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
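The `Pooling` module above uses `pooling_mode_cls_token: True`, meaning the sentence embedding is simply the hidden state at the first (`[CLS]`) token position rather than a mean over tokens. A minimal sketch of that pooling step, using a random tensor as a stand-in for the transformer's token embeddings:

```python
import torch

# Stand-in for BertModel output: (batch, seq_len, hidden) = (2, 16, 1024)
token_embeddings = torch.randn(2, 16, 1024)

# CLS pooling: keep only the embedding at position 0 for each sequence
sentence_embeddings = token_embeddings[:, 0]

print(sentence_embeddings.shape)  # torch.Size([2, 1024])
```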
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'Look what a show Pope Francis gave in yesterday\'s homily / sermon! It\'s to be read and reread over and over again... This is the most spiritual Pope since Peter. "You may have flaws, be anxious, and sometimes live irritated, but don\'t forget that your life is the greatest company in the world. Only you can prevent it from going into decline. Many appreciate you, admire you and love you. remember that being happy is not having a sky without storms, a road without accidents, work without fatigue, relationships without disappointments. Being happy is finding strength in forgiveness, hope in battles, security in the stage of fear, love in discord. It\'s not just appreciating the smile, but also reflecting on sadness. It\'s not just celebrating successes, but learning lessons from failures. It\'s not just feeling happy with applause, but being happy in anonymity. Being happy is recognizing that it\'s worth life is worth living, despite all the challenges, misunderstandings, periods of crisis. Being happy is not a fatality of fate, but an achievement for those who manage to travel within themselves. To be happy is to stop feeling like a victim of problems and become the author of his own story . It\'s crossing deserts outside of yourself, but managing to find an oasis in the depths of our soul. It is to thank God for each morning, for the miracle of life. Being happy is not being afraid of your own feelings. It\'s knowing how to talk about yourself. It\'s having the courage to hear a "no". It\'s feeling safe when receiving criticism, even if unfair. It\'s kissing the children, pampering the parents, living poetic moments with friends, even when they hurt us. To be happy is to let the creature that lives in each of us live, free, joyful and simple. It\'s having maturity to be able to say: "I was wrong". It\'s having the courage to say, "I\'m sorry". It\'s having the sensitivity to say: "I need you". It\'s having the ability to say, "I love you". 
May your life become a garden of opportunities to be happy... May your springtime be a lover of joy. May in your winters be a lover of wisdom. And that when you make a mistake, start over from the beginning. For only then will you be in love with life. You will discover that being happy is not having a perfect life. But using tears to irrigate tolerance. Use losses to train patience. Using mistakes to sculpt serenity. Using pain to cut pleasure. Use obstacles to open intelligence windows. Never give up....Never give up on the people who love you. Never give up happiness, for life is an incredible spectacle." (Pope Francis).',
'"The message that Pope Francis gave in yesterday\'s homily/sermon! It is to be read and reread several times... What an admirable man!"',
'Denmark allows Muslim women to wear the niqab',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
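For semantic search, the similarity matrix can be used to rank candidates against a query: the highest-scoring row entry is the best match. A self-contained sketch of that ranking step with random unit vectors standing in for `model.encode(...)` output (assumed shapes only; no model download required):

```python
import numpy as np

# Stand-ins for model.encode(...) output: three 1024-dim embeddings,
# L2-normalized so the dot product equals cosine similarity
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 1024))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

sims = emb @ emb.T            # pairwise cosine similarities, shape (3, 3)
best = sims.argmax(axis=1)    # index of the closest embedding per row
```

With real embeddings, `sims[i]` ranks every candidate sentence against sentence `i`; here each vector's best match is trivially itself.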
<!--
### Direct Usage (Transformers)
<details><summary>Click to see the direct usage in Transformers</summary>
</details>
-->
<!--
### Downstream Usage (Sentence Transformers)
You can finetune this model on your own dataset.
<details><summary>Click to expand</summary>
</details>
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 25,743 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 | label |
|:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------|
| type | string | string | float |
| details | <ul><li>min: 6 tokens</li><li>mean: 113.55 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 17.89 tokens</li><li>max: 126 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
* Samples:
| sentence_0 | sentence_1 | label |
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------|:-----------------|
| <code>de. has left the route with his the ultimate nazi prey dogs. With his patches with swastikas of Tito Adolfito Hitler and those little things. But don't tell them fucking Nazis, they'll be offended later 180 * Rov</code> | <code>Santiago Abascal posed next to a man wearing Nazi emblems at a biker rally in Valladolid</code> | <code>1.0</code> |
| <code>the info is ... a Danish police officer told a woman who was wearing a veil that the parliament had decided to approve the use of the niqab for Muslim women in Denmark. The situation is reversed in Indonesia, which (he said) has a Muslim majority population. Aya naon with the country ieu?</code> | <code>Denmark allows Muslim women to wear the niqab</code> | <code>1.0</code> |
| <code>In Kolwezi, a Congolese driver destroys Chinese trucks out of anger</code> | <code>A Congolese destroys the trucks of a Chinese company in the DRC</code> | <code>1.0</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
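MultipleNegativesRankingLoss treats each (anchor, positive) pair in a batch as a classification problem: anchor *i* should score highest against positive *i*, with every other positive in the batch acting as an in-batch negative. A minimal sketch of that computation with the parameters above (`scale=20.0`, cosine similarity), using random unit vectors in place of model outputs:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Stand-ins for encoded (anchor, positive) pairs; normalized so the
# dot product is cosine similarity
anchors = F.normalize(torch.randn(4, 1024), dim=1)
positives = F.normalize(torch.randn(4, 1024), dim=1)

scores = 20.0 * anchors @ positives.T  # scaled cosine similarities, (4, 4)
labels = torch.arange(4)               # anchor i pairs with positive i
loss = F.cross_entropy(scores, labels) # other positives = in-batch negatives
```

Note that with `per_device_train_batch_size: 2`, each anchor only sees one in-batch negative per step; this loss generally benefits from larger batches.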
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
</details>
### Training Logs
| Epoch | Step | Training Loss |
|:------:|:-----:|:-------------:|
| 0.0388 | 500 | 0.0173 |
| 0.0777 | 1000 | 0.0124 |
| 0.1165 | 1500 | 0.0127 |
| 0.1554 | 2000 | 0.0256 |
| 0.1942 | 2500 | 0.0123 |
| 0.2331 | 3000 | 0.0199 |
| 0.2719 | 3500 | 0.0079 |
| 0.3108 | 4000 | 0.0134 |
| 0.3496 | 4500 | 0.0127 |
| 0.3884 | 5000 | 0.026 |
| 0.4273 | 5500 | 0.0314 |
| 0.4661 | 6000 | 0.0267 |
| 0.5050 | 6500 | 0.0145 |
| 0.5438 | 7000 | 0.0093 |
| 0.5827 | 7500 | 0.007 |
| 0.6215 | 8000 | 0.0071 |
| 0.6603 | 8500 | 0.0116 |
| 0.6992 | 9000 | 0.0085 |
| 0.7380 | 9500 | 0.0157 |
| 0.7769 | 10000 | 0.0051 |
| 0.8157 | 10500 | 0.0101 |
| 0.8546 | 11000 | 0.0174 |
| 0.8934 | 11500 | 0.0116 |
| 0.9323 | 12000 | 0.0073 |
| 0.9711 | 12500 | 0.0146 |
### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->