---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:25743
- loss:MultipleNegativesRankingLoss
base_model: am-azadi/UAE-Large-V1_Fine_Tuned
widget:
- source_sentence: 'The good news: That was it with the vaccination terror Every compulsory
vaccination is now illegal from the outset. The Council of Europe (not to be confused
with the EU), to which all European states belong with the exception of Belarus,
Kosovo and the Vatican and which is the body responsible for the European Court
of Human Rights, passed a resolution on January 27th, 2021 in its resolution 2361/2021
that no one may be vaccinated against their will, under pressure. The 47 member
states are asked to point out before vaccination that vaccination is not compulsory
and that the non-vaccinated person must not be discriminated against. Discrimination
is expressly prohibited even if there are health risks or if someone does not
want to be vaccinated. Vaccine manufacturers are encouraged to publish all information
on the safety of the vaccines. With this resolution, the most important human
rights organization in Europe has now set standards and obligations, as well as
created guidelines under international law that are to be applied by the 47 member
states, including the EU as an organization. Discrimination in the workplace,
for example, or a ban on travel for the unvaccinated are thus legally excluded.
You can now invoke it in every court case, before every authority, every employer,
every tour operator, every home manager, etc.'
sentences:
- Mike Tyson prays in coffee shop which has “dogs and Muslims are not allowed” sign
- The Council of Europe has declared mandatory corona vaccination illegal
- CDC chief admits most Covid deaths were people with comorbidities
- source_sentence: ELPunctual The Punctual 24H More Madrid intends close the Pandemic
Hospital Isabel Zendal if she wins the elections. 18:30 Mar 22 21 Twitter WebAppFirst
they complain that there were no beds and now they are willing to remove them...
Who votes for these crazy people?
sentences:
- Video shows Bolsonarista demonstration held in Rio de Janeiro on May 1, 2022
- US missionary woman undergoes FGM in Kenya
- More Madrid intends to close the Isabel Zendal Hospital if it wins the elections
- source_sentence: LAST DAM BUILT IN AUSTRALIA 1984 POPULATION OF AUSTRALIA 1984 15.5
MILLION POPULATION OF AUSTRALIA 2018 25 MILLION AND WE WONDER WHY WE HAVE NO WATERThink
about it...
sentences:
- The last dam in Australia was built in 1984
- Demonstrators storm the White House
- These ants contain the coronavirus in their body
- source_sentence: Quaranta QUARANTINE NCR TV ENHANCED COMMUNITY QUARANTINE EXTENDED
Until May 30, 2020 BULACAN -NEW ECIJA TIMPAMPANGA TARLAC - ZAMBALES CALABARZON
BENGUET PANGASINAN BATAAN AUNTIES ILOILO CEBU CEBU CITY - COME ON RAN FROM THE
CLOUD DAVAO CI ORIENTAL MIND ALBAY CATANDUANE AntiqueECQ extended until may 30
2020
sentences:
- Genuine news report about extension of COVID-19 lockdown in the Philippines
- China built 1,000 beds in 10 days. Brazilian army builds 2,000 in 48 hours.
- Holding your breath for 10 seconds allows you to identify contamination by the
new coronavirus
- source_sentence: 'Look what a show Pope Francis gave in yesterday''s homily / sermon!
It''s to be read and reread over and over again... This is the most spiritual
Pope since Peter. "You may have flaws, be anxious, and sometimes live irritated,
but don''t forget that your life is the greatest company in the world. Only you
can prevent it from going into decline. Many appreciate you, admire you and love
you. remember that being happy is not having a sky without storms, a road without
accidents, work without fatigue, relationships without disappointments. Being
happy is finding strength in forgiveness, hope in battles, security in the stage
of fear, love in discord. It''s not just appreciating the smile, but also reflecting
on sadness. It''s not just celebrating successes, but learning lessons from failures.
It''s not just feeling happy with applause, but being happy in anonymity. Being
happy is recognizing that it''s worth life is worth living, despite all the challenges,
misunderstandings, periods of crisis. Being happy is not a fatality of fate, but
an achievement for those who manage to travel within themselves. To be happy is
to stop feeling like a victim of problems and become the author of his own story
. It''s crossing deserts outside of yourself, but managing to find an oasis in
the depths of our soul. It is to thank God for each morning, for the miracle of
life. Being happy is not being afraid of your own feelings. It''s knowing how
to talk about yourself. It''s having the courage to hear a "no". It''s feeling
safe when receiving criticism, even if unfair. It''s kissing the children, pampering
the parents, living poetic moments with friends, even when they hurt us. To be
happy is to let the creature that lives in each of us live, free, joyful and simple.
It''s having maturity to be able to say: "I was wrong". It''s having the courage
to say, "I''m sorry". It''s having the sensitivity to say: "I need you". It''s
having the ability to say, "I love you". May your life become a garden of opportunities
to be happy... May your springtime be a lover of joy. May in your winters be a
lover of wisdom. And that when you make a mistake, start over from the beginning.
For only then will you be in love with life. You will discover that being happy
is not having a perfect life. But using tears to irrigate tolerance. Use losses
to train patience. Using mistakes to sculpt serenity. Using pain to cut pleasure.
Use obstacles to open intelligence windows. Never give up....Never give up on
the people who love you. Never give up happiness, for life is an incredible spectacle."
(Pope Francis).'
sentences:
- '"The message that Pope Francis gave in yesterday''s homily/sermon! It is to be
read and reread several times... What an admirable man!"'
- Denmark allows Muslim women to wear the niqab
- Trump author of speech disparaging Africans and Arabs
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on am-azadi/UAE-Large-V1_Fine_Tuned
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [am-azadi/UAE-Large-V1_Fine_Tuned](https://huggingface.co/am-azadi/UAE-Large-V1_Fine_Tuned). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [am-azadi/UAE-Large-V1_Fine_Tuned](https://huggingface.co/am-azadi/UAE-Large-V1_Fine_Tuned) <!-- at revision 8ea63aea71614563429bfd78c11b294cb6c1b3e5 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
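The Pooling module above uses CLS-token pooling (`pooling_mode_cls_token: True`): the sentence embedding is the final hidden state of the first (`[CLS]`) token rather than an average over all tokens. A minimal NumPy sketch of that step (illustrative only, not the library's implementation):

```python
import numpy as np

def cls_pool(token_embeddings):
    """CLS pooling: the sentence vector is the first token's hidden state."""
    return token_embeddings[0]

# toy example: 5 token hidden states, each 1024-dimensional
hidden_states = np.random.rand(5, 1024)
sentence_embedding = cls_pool(hidden_states)
print(sentence_embedding.shape)  # (1024,)
```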
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub (replace the placeholder below with this model's repo id)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'Look what a show Pope Francis gave in yesterday\'s homily / sermon! It\'s to be read and reread over and over again... This is the most spiritual Pope since Peter. "You may have flaws, be anxious, and sometimes live irritated, but don\'t forget that your life is the greatest company in the world. Only you can prevent it from going into decline. Many appreciate you, admire you and love you. remember that being happy is not having a sky without storms, a road without accidents, work without fatigue, relationships without disappointments. Being happy is finding strength in forgiveness, hope in battles, security in the stage of fear, love in discord. It\'s not just appreciating the smile, but also reflecting on sadness. It\'s not just celebrating successes, but learning lessons from failures. It\'s not just feeling happy with applause, but being happy in anonymity. Being happy is recognizing that it\'s worth life is worth living, despite all the challenges, misunderstandings, periods of crisis. Being happy is not a fatality of fate, but an achievement for those who manage to travel within themselves. To be happy is to stop feeling like a victim of problems and become the author of his own story . It\'s crossing deserts outside of yourself, but managing to find an oasis in the depths of our soul. It is to thank God for each morning, for the miracle of life. Being happy is not being afraid of your own feelings. It\'s knowing how to talk about yourself. It\'s having the courage to hear a "no". It\'s feeling safe when receiving criticism, even if unfair. It\'s kissing the children, pampering the parents, living poetic moments with friends, even when they hurt us. To be happy is to let the creature that lives in each of us live, free, joyful and simple. It\'s having maturity to be able to say: "I was wrong". It\'s having the courage to say, "I\'m sorry". It\'s having the sensitivity to say: "I need you". It\'s having the ability to say, "I love you". 
May your life become a garden of opportunities to be happy... May your springtime be a lover of joy. May in your winters be a lover of wisdom. And that when you make a mistake, start over from the beginning. For only then will you be in love with life. You will discover that being happy is not having a perfect life. But using tears to irrigate tolerance. Use losses to train patience. Using mistakes to sculpt serenity. Using pain to cut pleasure. Use obstacles to open intelligence windows. Never give up....Never give up on the people who love you. Never give up happiness, for life is an incredible spectacle." (Pope Francis).',
'"The message that Pope Francis gave in yesterday\'s homily/sermon! It is to be read and reread several times... What an admirable man!"',
'Denmark allows Muslim women to wear the niqab',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
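As listed under Model Description, `model.similarity` uses cosine similarity by default: each embedding is length-normalized and compared via dot product, yielding a symmetric `(n, n)` score matrix. A small NumPy sketch of the computation, independent of the model (an illustration, not the library's internal code):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity, matching the shape model.similarity returns."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

# toy 2-dimensional embeddings for three "sentences"
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = cosine_similarity_matrix(emb)
print(sims.shape)  # (3, 3)
```

Each diagonal entry is 1.0 (a vector compared with itself), and off-diagonal entries fall in [-1, 1].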
<!--
### Direct Usage (Transformers)
<details><summary>Click to see the direct usage in Transformers</summary>
</details>
-->
<!--
### Downstream Usage (Sentence Transformers)
You can finetune this model on your own dataset.
<details><summary>Click to expand</summary>
</details>
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 25,743 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 | label |
|:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------|
| type | string | string | float |
| details | <ul><li>min: 6 tokens</li><li>mean: 113.55 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 17.89 tokens</li><li>max: 126 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
* Samples:
| sentence_0 | sentence_1 | label |
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------|:-----------------|
| <code>de. has left the route with his the ultimate nazi prey dogs. With his patches with swastikas of Tito Adolfito Hitler and those little things. But don't tell them fucking Nazis, they'll be offended later 180 * Rov</code> | <code>Santiago Abascal posed next to a man wearing Nazi emblems at a biker rally in Valladolid</code> | <code>1.0</code> |
| <code>the info is ... a Danish police officer told a woman who was wearing a veil that the parliament had decided to approve the use of the niqab for Muslim women in Denmark. The situation is reversed in Indonesia, which (he said) has a Muslim majority population. Aya naon with the country ieu?</code> | <code>Denmark allows Muslim women to wear the niqab</code> | <code>1.0</code> |
| <code>In Kolwezi, a Congolese driver destroys Chinese trucks out of anger</code> | <code>A Congolese destroys the trucks of a Chinese company in the DRC</code> | <code>1.0</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
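With `MultipleNegativesRankingLoss`, each `(sentence_0, sentence_1)` pair is a positive, and every other `sentence_1` in the same batch serves as an in-batch negative: the loss is cross-entropy over scaled cosine similarities, with the matching pair as the target class. A minimal NumPy sketch of this objective (an illustration of the idea, not the library's implementation):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """MultipleNegativesRankingLoss sketch: for row i, positives[i] is the
    target and all other positives in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    # row-wise cross-entropy with the diagonal (matching pair) as the target
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# perfectly aligned pairs -> near-zero loss
aligned = np.eye(3)
print(mnr_loss(aligned, aligned))
```

The `scale` of 20.0 sharpens the softmax, so small differences in cosine similarity translate into large differences in probability.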
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
</details>
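The non-default hyperparameters above can be assembled with the sentence-transformers Trainer API. The following is a configuration sketch only (not executed here): `train_dataset` is a placeholder for a pair dataset with the columns described above, and the base-model id is taken from this card.

```python
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("am-azadi/UAE-Large-V1_Fine_Tuned")
loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    multi_dataset_batch_sampler="round_robin",
)

# train_dataset is assumed: (sentence_0, sentence_1) pairs, 25,743 rows
trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()
```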
### Training Logs
| Epoch | Step | Training Loss |
|:------:|:-----:|:-------------:|
| 0.0388 | 500 | 0.0173 |
| 0.0777 | 1000 | 0.0124 |
| 0.1165 | 1500 | 0.0127 |
| 0.1554 | 2000 | 0.0256 |
| 0.1942 | 2500 | 0.0123 |
| 0.2331 | 3000 | 0.0199 |
| 0.2719 | 3500 | 0.0079 |
| 0.3108 | 4000 | 0.0134 |
| 0.3496 | 4500 | 0.0127 |
| 0.3884 | 5000 | 0.026 |
| 0.4273 | 5500 | 0.0314 |
| 0.4661 | 6000 | 0.0267 |
| 0.5050 | 6500 | 0.0145 |
| 0.5438 | 7000 | 0.0093 |
| 0.5827 | 7500 | 0.007 |
| 0.6215 | 8000 | 0.0071 |
| 0.6603 | 8500 | 0.0116 |
| 0.6992 | 9000 | 0.0085 |
| 0.7380 | 9500 | 0.0157 |
| 0.7769 | 10000 | 0.0051 |
| 0.8157 | 10500 | 0.0101 |
| 0.8546 | 11000 | 0.0174 |
| 0.8934 | 11500 | 0.0116 |
| 0.9323 | 12000 | 0.0073 |
| 0.9711 | 12500 | 0.0146 |
### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->