metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:69500
- loss:Infonce
base_model: Snowflake/snowflake-arctic-embed-l-v2.0
widget:
- source_sentence: What aspect of human relationship to nature is omitted from the text
sentences:
- >-
There are a few good ones, though. Here are the best WWE apps and WWE
games for Android! The first five are the best games...
Go Android Apps (blog)
The Best Themes for Android Free Download: Hi friend we are again back
with our new top ten best free themes for android list. This article is
especially dedicated for those persons who want to make their
smartphone...
Paragon Software has created an app for Android that allows your device
to natively read partitions in file systems that Android normally can't
handle, such as Microsoft's NTFS, allowing immediate and easy use of...
While the Sentio Desktop app can be used on its own, it was primarily
meant to complement Sentio's Superbook, a crowdfunded laptop shell for
Android smartphones and tablets that's just entering production after...
... phone then GBWhatsapp is the app for you. GBWhatsapp is basically
similar to Whatsapp+ in terms of features. The newest available version
right now is GBWhatsapp 6.40 APK for Android devices.
- >-
A true entertainer. date city state venue 11/23/2012 West Palm Beach FL
Kravis Center 11/24/2012 Sarasota FL Van Wezel Performing Arts Hall
11/25/2012 Clearwater FL Capitol Theatre 11/29/2012 Durham NC Durham
Performing Arts Center 12/1/2012 Atlantic City NJ Trump Taj Mahal
12/2/2012 Staten Island NY St. George Theatre 12/4/2012 Bethlehem PA
Musikfest Cafe 12/5/2012 Verona NY Turning Stone Casino 12/6/2012
Stamford CT Palace Theatre Stamford 12/8/2012 Shippensburg PA Luhrs
Center 12/9/2012 Boston MA Wilbur Theatre 12/11/2012 Greensburg PA The
Palace Theatre 12/12/2012 Easton MD Avalon Theatre 12/15/2012 Saint
Charles IL Arcada Theater 12/16/2012 Milwaukee WI Potawatomi Bingo
Casino 12/18/2012 Beaver Creek CO Vilar Performing Arts Center
12/20/2012 Chandler AZ Ovations Live!
- >-
The reader will gain a better understanding of the direction nature and
culture is heading today by learning how connections were made in the
past. It omits that which Raymond Williams called "a working landscape"
-- the most intimate human relationship to nature which is people who
live and work on it.
- source_sentence: >-
Why is it recommended to contact a wedding agency or consultant before
making a decision
sentences:
- >-
Perhaps owing to this humiliation I resigned as Chief Winery Warlord,
and took a position elsewhere. Following my resignation, we rebooked our
date with axe throwing destiny, and converted the night from a team
building exercise to a majestic send off in honour of my 10ish glorious
years at Coffin Ridge. We arrived in our most impeccable vestments.
- >-
Therefore, those private companies increased their own rate of cash burn
since the financial markets were willing to fund money-losing
enterprises without hesitation. Out of the 100 largest North
American-based technology companies, 16 have lost money over the past
year.
- >-
Yet , it is best to contact a wedding agency or consultant before you
make your concluding decision. This will make certain you are dealing
with a respectable company.
- source_sentence: >-
What is the Electronic Music Education and Preservation Project (EMEAPP)
and what are its functions
sentences:
- >-
The Electronic Music Education and Preservation Project (EMEAPP) is the
steward of a privately held world-class curated collection of rare
vintage electronic instruments and stage-used gear. This includes
effects units, amps, organs, synthesizers, electro-mechanical
instruments, guitars, prototypes, vintage audio/video media and analog
studio gear. In addition, EMEAPP itself is cultivating its own humble
collection. It is our charge to cultivate and reap excellent knowledge
from these unique resources and return it to our members and the world.
We do this as a learning center, through research projects, creative
endeavors, media programming and tours, enlightening many people along
the way. There is so much to be harvested from history; EMEAPP has a key
to the vault. EMEAPP is a private museum, a critical learning center and
a multi-media production studio nicely packed into a brick-and-mortar
facility outside of Philadelphia, Pennsylvania. EMEAPP is a 501(c)(3)
non-profit organization.
- You got a problem? Yo, she'll splode it.
- >-
I love sex; I think sex is completely absurdly demonized in our culture.
But in the end, however much sex you want to have, with however many
people in how many ways, to be loved and to love is what human beings
really want.
- source_sentence: What year did the Duchess die and where did it happen
sentences:
- >-
League One
League table
Results summary
Results by matchday
Matches
On 21 June 2018, the League One fixtures for the forthcoming season were
announced. FA Cup
The first round draw was made live on BBC by Dennis Wise and Dion Dublin
on 22 October.
- >-
The Duchess was widowed in 2007 and died in London in 2011. Issue
The Duke and Duchess of Buccleuch and Queensberry had four children:
Richard Scott, 10th Duke of Buccleuch (b. 1954), married Lady Elizabeth
Kerr, daughter of the Marquess of Lothian, and has issue two sons and
two daughters. Lord John (born 9 August 1957), married Berrin Torolsan,
and lives in Istanbul, Turkey. Lady Charlotte-Anne (born 9 January
1966), married Count Bernard de Castellane in 1991, and has issue two
sons and a daughter. Lord Damian (born 8 October 1969), married
Elizabeth Powis, and has issue. External links
Jane in her wedding dress
Movie clip of Jane's wedding
References
1929 births
2011 deaths
British duchesses by marriage
Jane
Scottish female models
British cookbook writers
Women cookbook writers
- >-
Is this common, do other people with epilepsy have dangerously low
appetites? So we left there and stopped and got her a bite to eat.
- source_sentence: Why is it important to keep moving over the summer
sentences:
- It's important to keep moving over the summer!
- >-
2008. CHENG HF, LEE YM, Chu CH, Leung WK & Mok TMY. - Journal Editor
(Hong Kong Medical Journal) 2008
- Editor-in-Chief (Hong Kong Dental Journal) 2007
- Editor-in-Chief (Hong Kong Dental Journal) 2006
- Deputy Editor (Hong Kong Dental Journal) 2004
- >-
Both demand collective action and shared resources. While one is
distinctly egalitarian and the other hierarchical in nature, both speak
of sublimating private goals for the achievement of larger, shared ones.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on Snowflake/snowflake-arctic-embed-l-v2.0
This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l-v2.0. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Snowflake/snowflake-arctic-embed-l-v2.0
- Maximum Sequence Length: 1024 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
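The Pooling and Normalize stages above can be illustrated with a minimal sketch in plain Python. It assumes the transformer has already produced one vector per token; the function name `cls_pool_and_normalize` and the toy 4-dimensional vectors are hypothetical stand-ins for the real 1024-dimensional outputs, and this is not the library's own implementation.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Keep the first (CLS) token vector of each sequence and L2-normalize it.

    token_embeddings: list of sequences, each a list of token vectors.
    """
    pooled = []
    for sequence in token_embeddings:
        cls_vector = sequence[0]  # pooling_mode_cls_token=True: only the first token is used
        norm = math.sqrt(sum(x * x for x in cls_vector))
        # Guard against a zero vector to avoid division by zero.
        pooled.append([x / norm if norm > 0 else 0.0 for x in cls_vector])
    return pooled

# Toy input: 2 sequences, 3 tokens each, 4 dimensions instead of 1024.
toy = [
    [[3.0, 4.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 0.0]],
    [[0.0, 0.0, 5.0, 12.0], [9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 0.0]],
]
sentence_embeddings = cls_pool_and_normalize(toy)
print(sentence_embeddings[0])  # [0.6, 0.8, 0.0, 0.0]
```

Because every output vector has unit length, the dot product of two embeddings equals their cosine similarity.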
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Jrinky/snowflake")
# Run inference
sentences = [
'Why is it important to keep moving over the summer',
"It's important to keep moving over the summer!",
'2008. CHENG HF, LEE YM, Chu CH, Leung WK & Mok TMY. - Journal Editor (Hong Kong Medical Journal) 2008\n- Editor-in-Chief (Hong Kong Dental Journal) 2007\n- Editor-in-Chief (Hong Kong Dental Journal) 2006\n- Deputy Editor (Hong Kong Dental Journal) 2004',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
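For semantic search, one query embedding is typically compared against many document embeddings and the documents are sorted by score. The sketch below shows that ranking step on toy unit-length vectors so it runs without downloading the model; `rank_by_similarity` is a hypothetical helper, and in practice the vectors would come from `model.encode(...)`.

```python
def rank_by_similarity(query_embedding, document_embeddings):
    """Return (index, score) pairs sorted by descending dot product.

    For unit-length embeddings the dot product equals cosine similarity.
    """
    scores = [
        sum(q * d for q, d in zip(query_embedding, doc))
        for doc in document_embeddings
    ]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy unit-length vectors standing in for model.encode(...) output.
query = [1.0, 0.0, 0.0]
documents = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.8, 0.6, 0.0],  # fairly close to the query
    [1.0, 0.0, 0.0],  # same direction as the query
]
ranking = rank_by_similarity(query, documents)
print([index for index, _ in ranking])  # [2, 1, 0]
```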
Training Details
Training Dataset
Unnamed Dataset
- Size: 69,500 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 6 tokens, mean: 17.47 tokens, max: 44 tokens | min: 3 tokens, mean: 113.33 tokens, max: 1024 tokens |
- Samples:

| anchor | positive |
|---|---|
| What might have been unnecessary if better emergency plans had been implemented | If better emergency plans had been in place, maybe chemical dipersants wouldn't be needed. And on and on. |
| What was the year of publication for the 3rd Edition of 'Regular Polytopes' by H.S.M. Coxeter | Coxeter, Regular Polytopes, 3rd Edition, Dover New York, 1973 Kaleidoscopes: Selected Writings of H.S.M. Coxeter, edited by F. Arthur Sherk, Peter McMullen, Anthony C. Thompson, Asia Ivic Weiss, Wiley-Interscience Publication, 1995, (Paper 22) H.S.M. |
| Who is the author of the GURPS Shapeshifters supplement | GURPS Shapeshifters () is a supplement by Robert M. Schroeck for the GURPS role-playing game system, third edition. |
- Loss: selfloss.Infonce with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
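The card does not show the code of selfloss.Infonce. Assuming it follows the standard in-batch-negatives InfoNCE form from the Henderson et al. reference cited at the end of this card, the loss with scale 20.0 and cosine similarity can be sketched as below. The function names and toy vectors are hypothetical, and the sketch uses plain Python rather than PyTorch to stay dependency-free.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over in-batch candidates.

    positives[i] is the target for anchors[i]; every other positive in the
    batch acts as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive points the same way as its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives assigned to the wrong anchors
print(info_nce_loss(anchors, aligned) < info_nce_loss(anchors, swapped))  # True
```

The scale acts as an inverse temperature: larger values sharpen the softmax, so a small gap in cosine similarity turns into a large gap in loss.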
Evaluation Dataset
Unnamed Dataset
- Size: 17,376 evaluation samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 6 tokens, mean: 16.87 tokens, max: 45 tokens | min: 6 tokens, mean: 115.36 tokens, max: 1024 tokens |
- Samples:

| anchor | positive |
|---|---|
| What impressive achievements did the Warriors accomplish during their last season in Division III | The Warriors were among the most lethal offensive teams in Division III this past year, posting a team batting average of .344 and averaging nearly seven runs per game, smacking 29 home runs, and collecting nearly 600 total bases. They shared the Little East Conference regular-season championship and later knocked off the top seed in the NCAA regional tournament (Montclair State) en route to their winningest season in 14 years. |
| How many bars had nectar and capped honey on them | Eight of the bars had nectar and capped honey on them. There are eighteen bars with brood in some form on them and a mix of workers and drones. |
| What idea is being requested regarding the 'triangle' | Next up...the "triangle". Please, seriously, if anyone could float me an idea, I would really appreciate it. |
- Loss: selfloss.Infonce with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 3
- learning_rate: 5e-06
- num_train_epochs: 5
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
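To make the interaction of learning_rate 5e-06 and warmup_ratio 0.1 concrete, here is a minimal sketch of a linear warmup followed by linear decay, which is what the default linear scheduler type does. The function name and `TOTAL_STEPS` are hypothetical; the card does not state the total number of optimizer steps.

```python
import math

def linear_schedule_lr(step, total_steps, peak_lr=5e-6, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from peak_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 1000  # hypothetical value for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

With these illustrative numbers the rate climbs to its peak at step 100, is back to half the peak at step 550, and reaches zero at the final step.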
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 3
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-06
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: True
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss |
---|---|---|---|
0.0777 | 150 | 0.0257 | 0.0134 |
0.1554 | 300 | 0.0136 | 0.0082 |
0.2332 | 450 | 0.0079 | 0.0062 |
0.3109 | 600 | 0.0065 | 0.0051 |
0.3886 | 750 | 0.0059 | 0.0045 |
0.4663 | 900 | 0.0057 | 0.0040 |
0.5440 | 1050 | 0.0064 | 0.0037 |
0.6218 | 1200 | 0.005 | 0.0034 |
0.6995 | 1350 | 0.0052 | 0.0034 |
0.7772 | 1500 | 0.0041 | 0.0032 |
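The Epoch and Step columns in the log table allow a rough estimate of how many optimizer steps make up one epoch, and from that how many training samples are consumed per step. The short calculation below uses only numbers stated in this card; the variable names are illustrative, and the result is an estimate because the logged epoch values are rounded.

```python
# (epoch, step) pairs copied from the first and last rows of the log table.
log_rows = [(0.0777, 150), (0.7772, 1500)]
TRAIN_SAMPLES = 69_500  # training set size stated in the card

# Each row gives an independent estimate of steps per epoch; average them.
steps_per_epoch = sum(step / epoch for epoch, step in log_rows) / len(log_rows)
samples_per_step = TRAIN_SAMPLES / steps_per_epoch

print(round(steps_per_epoch))   # 1930
print(round(samples_per_step))  # 36
```

The estimate of about 36 samples per step is larger than the per-device batch size of 3. The card does not explain the difference; training on several devices in parallel would be one possible reason.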
Framework Versions
- Python: 3.12.3
- Sentence Transformers: 3.2.0
- Transformers: 4.44.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.3.0
- Datasets: 2.19.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Infonce
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}