SentenceTransformer

This is a sentence-transformers model trained on the parquet dataset (about 27.7 million anchor/positive pairs; see Training Details). It maps sentences and paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Model Size: 42.5M parameters (F32 safetensors)
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 512 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • parquet
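
The sequence length and embedding dimensionality can be confirmed once the model is loaded; a minimal check using the same checkpoint as in the Usage section below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("pankajrajdeo/Bioformer-8L-UMLS-Pubmed-TCE-Epoch-1")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 512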

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
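
The Pooling module above uses mean pooling (pooling_mode_mean_tokens: True): token embeddings from the BertModel are averaged over non-padding positions to give one 512-dimensional vector per input. The sketch below reproduces that step with the plain transformers API; it assumes the checkpoint also loads with AutoModel/AutoTokenizer, and the sentence-transformers API shown under Usage remains the recommended path.

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "pankajrajdeo/Bioformer-8L-UMLS-Pubmed-TCE-Epoch-1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert = AutoModel.from_pretrained(model_name)

def mean_pooling(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

encoded = tokenizer(["An example biomedical sentence."], padding=True,
                    truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    output = bert(**encoded)
embeddings = mean_pooling(output.last_hidden_state, encoded["attention_mask"])
print(embeddings.shape)  # torch.Size([1, 512])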

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pankajrajdeo/Bioformer-8L-UMLS-Pubmed-TCE-Epoch-1")
# Run inference
sentences = [
    'CLINICAL NOTE: CEREBELLAR COGNITIVE AFFECTIVE SYNDROME IMPROVEMENT WITH SELECTIVE INHIBITOR OF SEROTONIN RECAPTATION.',
    'Cerebellar cognitive affective syndrome (CCAS) is characterized by alterations at the cognitive level (dysexecutive syndrome, visuospatial deficit, language...), associated with affective / emotional changes. Its pathophysiology is not well known and there is currently no specific treatment. We describe a 64-year-old man with a rare condition of cognitive-behavioral disorder after an infarction in the left middle cerebral artery, dominated by executive dysfunctions, predominantly oral apraxia, interrupted divided attention, disturbed visuospatial organization and affective abnormalities with great apathy, and whose symptoms improved with a selective serotonin reuptake inhibitor (SSRI). In absence of cerebellar structural damage, a perfusion brain single photon emission computed tomography using 99mTc- hexamethyl-propylene-aminoxime (SPECT-HMPAO) showed left frontotemporal and parietoccipital hypoperfusion of known vascular etiology, and hypoperfusion in the right cerebellar hemisphere compatible with the phenomenon of crossed diaschisis. We hypothesize that cognitive and affective deficits are aggravated by the functional disruption of the reciprocal cerebellar interconnections with areas of cerebral association and paralimbic cortex, altering the contribution of the cerebellum to cognitive and affective processing and modulation. In the case described, both the clinical situation and the functional control images improved after treatment with SSRI, which increases the possibility that there is connectivity of some serotonergic transmission projections between cerebellum and contralateral association cortices, and that said connectivity dysfunctional is involved in the pathophysiology of CCAS.',
    'Precision radiotherapy is a critical and indispensable cancer treatment means in the modern clinical workflow with the goal of achieving "quality-up and cost-down" in patient care. The challenge of this therapy lies in developing computerized clinical-assistant solutions with precision, automation, and reproducibility built-in to deliver it at scale. In this work, we provide a comprehensive yet ongoing, incomplete survey of and discussions on the recent progress of utilizing advanced deep learning, semantic organ parsing, multimodal imaging fusion, neural architecture search and medical image analytical techniques to address four corner-stone problems or sub-problems required by all precision radiotherapy workflows, namely, organs at risk (OARs) segmentation, gross tumor volume (GTV) segmentation, metastasized lymph node (LN) detection, and clinical tumor volume (CTV) segmentation. Without loss of generality, we mainly focus on using esophageal and head-and-neck cancers as examples, but the methods can be extrapolated to other types of cancers. High-precision, automated and highly reproducible OAR/GTV/LN/CTV auto-delineation techniques have demonstrated their effectiveness in reducing the inter-practitioner variabilities and the time cost to permit rapid treatment planning and adaptive replanning for the benefit of patients. Through the presentation of the achievements and limitations of these techniques in this review, we hope to encourage more collective multidisciplinary precision radiotherapy workflows to transpire.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 512]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
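
Beyond pairwise similarity, the embeddings can drive semantic search over a corpus. The following is a minimal sketch using sentence_transformers.util.semantic_search; the corpus and query strings are placeholders, not part of the training data.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("pankajrajdeo/Bioformer-8L-UMLS-Pubmed-TCE-Epoch-1")

# Placeholder corpus; replace with your own document collection.
corpus = [
    "Cerebellar cognitive affective syndrome improved after SSRI treatment.",
    "Deep learning for organ-at-risk segmentation in radiotherapy planning.",
    "Machine learning guides polymer design for nucleic acid delivery.",
]
query = "selective serotonin reuptake inhibitors and cerebellar syndromes"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# For each query, returns a ranked list of {"corpus_id": ..., "score": ...} dicts.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")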

Training Details

Training Dataset

parquet

  • Dataset: parquet
  • Size: 27,719,606 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 9 tokens, mean: 38.85 tokens, max: 130 tokens
    • positive: string; min: 24 tokens, mean: 262.32 tokens, max: 512 tokens
  • Samples (each entry shows the anchor followed by its positive):
    ADDRESS OF COL. GARRICK MALLERY, U. S. ARMY. It may be conceded that after man had all his present faculties, he did not choose between the adoption of voice and gesture, and never with those faculties, was in a state where the one was used, to the absolute exclusion of the other. The epoch, however, to which our speculations relate is that in which he had not reached the present symmetric development of his intellect and of his bodily organs, and the inquiry is: Which mode of communication was earliest adopted to his single wants and informed intelligence? With the voice he could imitate distinictively but few sounds of nature, while with gesture he could exhibit actions, motions, positions, forms, dimensions, directions and distances, with their derivations and analogues. It would seem from this unequal division of capacity that oral speech remained rudimentary long after gesture had become an efficient mode of communication. With due allowance for all purely imitative sounds, and for the spontaneous action of vocal organs unde...
    How TO OBTAIN THE BRAIN OF THE CAT. How to obtain the Brain of the Cat, (Wilder).-Correction: Page 158, second column, line 7, "grains," should be "grams;" page 159, near middle of 2nd column, "successily," should be "successively;" page 161, the number of Flower's paper is 3.
    DOLBEAR ON THE NATURE AND CONSTITUTION OF MATTER. Mr. Dopp desires to make the following correction in his paper in the last issue: "In my article on page 200 of "Science", the expression and should have been and being the velocity of light.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
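
MultipleNegativesRankingLoss treats, for each anchor, its paired positive as the correct match and every other positive in the same batch as a negative, so the effective number of negatives grows with the batch size (128 here). A minimal sketch of how this loss is configured with the parameters above (the checkpoint name is only illustrative):

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("pankajrajdeo/Bioformer-8L-UMLS-Pubmed-TCE-Epoch-1")

# scale multiplies the cosine similarities before the cross-entropy over
# in-batch candidates; util.cos_sim matches "similarity_fct": "cos_sim" above.
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,
    similarity_fct=util.cos_sim,
)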
    

Evaluation Dataset

parquet

  • Dataset: parquet
  • Size: 27,719,606 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 8 tokens, mean: 23.18 tokens, max: 55 tokens
    • positive: string; min: 6 tokens, mean: 263.22 tokens, max: 512 tokens
  • Samples (each entry shows the anchor followed by its positive):
    Machine learning makes magnificent macromolecules for medicine. At the University of Minnesota, scientists explore the application of machine learning to screen a multiparametric library of polymers to investigate the relationship between polymer attributes, payload type, and biological outcomes to optimize polymeric vector development for delivery of nucleic acid payloads.
    2022 WUOF/SIU International Consultation on Urological Diseases: Genetics and Tumor Microenvironment of Renal Cell Carcinoma. Renal cell carcinoma is a diverse group of diseases that can be distinguished by distinct histopathologic and genomic features. In this comprehensive review, we highlight recent advancements in our understanding of the genetic and microenvironmental hallmarks of kidney cancer. We begin with clear cell renal cell carcinoma (ccRCC), the most common subtype of this disease. We review the chromosomal and genetic alterations that drive initiation and progression of ccRCC, which has recently been shown to follow multiple highly conserved evolutionary trajectories that in turn impact disease progression and prognosis. We also review the diverse genetic events that define the many recently recognized rare subtypes within non-clear cell RCC. Finally, we discuss our evolving understanding of the ccRCC microenvironment, which has been revolutionized by recent bulk and single-cell transcriptomic analyses, suggesting potential biomarkers for guiding systemic therapy in the management of advanced cc...
    Single-bone versus both-bone plating of unstable paediatric both-bone forearm fractures. A randomized controlled clinical trial. PURPOSE: This clinical trial compares the functional and radiological outcomes of single-bone fixation to both-bone fixation of unstable paediatric both-bone forearm fractures. METHODS: This individually randomized two-group parallel clinical trial was performed following the Consolidated Standards of Reporting Trials or the both-bone fixation group (control). Primary outcomes were forearm range of motion and fracture union, while secondary outcomes were forearm function (price criteria), radius re-angulation, wrist and elbow range of motion, and surgical time RESULTS: A total of 50 children were included. Out of these 50 children, 25 were randomized to either arm of the study. All children in either group received the treatment assigned by randomization. Fifty (100%) children were available for final follow-up at six months post-operatively. The mean age of single-bone and both-bone fixation groups was 11.48 ± 1.93 and 13 ± 1.75 years, respectively, with a statistically significant di...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • learning_rate: 2e-05
  • max_steps: 617193
  • log_level: info
  • fp16: True
  • dataloader_num_workers: 16
  • load_best_model_at_end: True
  • resume_from_checkpoint: True
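
Putting the pieces together, the non-default hyperparameters above map onto SentenceTransformerTrainingArguments as in the sketch below. The base checkpoint name and Parquet file paths are placeholders (they are not published in this card), so treat this as an outline of the setup rather than an exact reproduction script.

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Placeholder paths and base model; the actual files are not part of this card.
train_dataset = load_dataset("parquet", data_files="data/train/*.parquet", split="train")
eval_dataset = load_dataset("parquet", data_files="data/eval/*.parquet", split="train")

model = SentenceTransformer("bioformer-8L")  # hypothetical base checkpoint
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    per_device_train_batch_size=128,
    learning_rate=2e-5,
    max_steps=617_193,
    log_level="info",
    fp16=True,
    dataloader_num_workers=16,
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()  # the card also lists resume_from_checkpoint: True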

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: 617193
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: info
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 16
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: True
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0000 1 3.8953 -
0.0049 1000 0.7985 -
0.0097 2000 0.2636 -
0.0146 3000 0.2381 -
0.0194 4000 0.2166 -
0.0243 5000 0.1595 -
0.0292 6000 0.1696 -
0.0340 7000 0.1483 -
0.0389 8000 0.1377 -
0.0437 9000 0.1347 -
0.0486 10000 0.1524 -
0.0535 11000 0.1374 -
0.0583 12000 0.0999 -
0.0632 13000 0.1363 -
0.0680 14000 0.1424 -
0.0729 15000 0.0918 -
0.0778 16000 0.2012 -
0.0826 17000 0.0949 -
0.0875 18000 0.1418 -
0.0924 19000 0.0889 -
0.0972 20000 0.1648 -
0.1021 21000 0.097 -
0.1069 22000 0.1613 -
0.1118 23000 0.0852 -
0.1167 24000 0.102 -
0.1215 25000 0.0808 -
0.1264 26000 0.1621 -
0.1312 27000 0.0897 -
0.1361 28000 0.1687 -
0.1410 29000 0.0869 -
0.1458 30000 0.1747 -
0.1507 31000 0.0823 -
0.1555 32000 0.1312 -
0.1604 33000 0.1322 -
0.1653 34000 0.0924 -
0.1701 35000 0.1746 -
0.1750 36000 0.0852 -
0.1798 37000 0.0936 -
0.1847 38000 0.1498 -
0.1896 39000 0.094 -
0.1944 40000 0.1487 -
0.1993 41000 0.104 -
0.2041 42000 0.0879 -
0.2090 43000 0.1405 -
0.2139 44000 0.0893 -
0.2187 45000 0.099 -
0.2236 46000 0.1192 -
0.2285 47000 0.081 -
0.2333 48000 0.1004 -
0.2382 49000 0.1091 -
0.2430 50000 0.0911 -
0.2479 51000 0.0789 -
0.2528 52000 0.081 -
0.2576 53000 0.1084 -
0.2625 54000 0.097 -
0.2673 55000 0.0696 -
0.2722 56000 0.0804 -
0.2771 57000 0.0767 -
0.2819 58000 0.0794 -
0.2868 59000 0.0736 -
0.2916 60000 0.0909 -
0.2965 61000 0.0576 -
0.3014 62000 0.0642 -
0.3062 63000 0.0705 -
0.3111 64000 0.0835 -
0.3159 65000 0.0814 -
0.3208 66000 0.0664 -
0.3257 67000 0.0509 -
0.3305 68000 0.0669 -
0.3354 69000 0.0844 -
0.3402 70000 0.0952 -
0.3451 71000 0.1012 -
0.3500 72000 0.0431 -
0.3548 73000 0.0526 -
0.3597 74000 0.0643 -
0.3646 75000 0.0556 -
0.3694 76000 0.1135 -
0.3743 77000 0.0641 -
0.3791 78000 0.0784 -
0.3840 79000 0.0432 -
0.3889 80000 0.0693 -
0.3937 81000 0.0841 -
0.3986 82000 0.0518 -
0.4034 83000 0.0581 -
0.4083 84000 0.0749 -
0.4132 85000 0.0376 -
0.4180 86000 0.0485 -
0.4229 87000 0.0542 -
0.4277 88000 0.0821 -
0.4326 89000 0.068 -
0.4375 90000 0.054 -
0.4423 91000 0.042 -
0.4472 92000 0.0477 -
0.4520 93000 0.0556 -
0.4569 94000 0.0557 -
0.4618 95000 0.0954 -
0.4666 96000 0.037 -
0.4715 97000 0.0431 -
0.4763 98000 0.0394 -
0.4812 99000 0.0425 -
0.4861 100000 0.0347 -
0.4909 101000 0.0482 -
0.4958 102000 0.053 -
0.5007 103000 0.0289 -
0.5055 104000 0.0364 -
0.5104 105000 0.0332 -
0.5152 106000 0.0275 -
0.5201 107000 0.0626 -
0.5250 108000 0.0552 -
0.5298 109000 0.0365 -
0.5347 110000 0.045 -
0.5395 111000 0.0476 -
0.5444 112000 0.049 -
0.5493 113000 0.0334 -
0.5541 114000 0.0412 -
0.5590 115000 0.0547 -
0.5638 116000 0.0375 -
0.5687 117000 0.0776 -
0.5736 118000 0.0481 -
0.5784 119000 0.0403 -
0.5833 120000 0.0467 -
0.5881 121000 0.0347 -
0.5930 122000 0.0478 -
0.5979 123000 0.0455 -
0.6027 124000 0.0541 -
0.6076 125000 0.0483 -
0.6124 126000 0.0583 -
0.6173 127000 0.0499 -
0.6222 128000 0.0488 -
0.6270 129000 0.0403 -
0.6319 130000 0.0314 -
0.6368 131000 0.0253 -
0.6416 132000 0.0473 -
0.6465 133000 0.0433 -
0.6513 134000 0.039 -
0.6562 135000 0.0334 -
0.6611 136000 0.0427 -
0.6659 137000 0.0401 -
0.6708 138000 0.0465 -
0.6756 139000 0.0393 -
0.6805 140000 0.0481 -
0.6854 141000 0.0504 -
0.6902 142000 0.0381 -
0.6951 143000 0.0326 -
0.6999 144000 0.0314 -
0.7048 145000 0.0335 -
0.7097 146000 0.0273 -
0.7145 147000 0.034 -
0.7194 148000 0.043 -
0.7242 149000 0.0395 -
0.7291 150000 0.0335 -
0.7340 151000 0.0414 -
0.7388 152000 0.0423 -
0.7437 153000 0.0336 -
0.7485 154000 0.0424 -
0.7534 155000 0.0394 -
0.7583 156000 0.0408 -
0.7631 157000 0.0381 -
0.7680 158000 0.0348 -
0.7729 159000 0.0428 -
0.7777 160000 0.0426 -
0.7826 161000 0.0311 -
0.7874 162000 0.0276 -
0.7923 163000 0.0207 -
0.7972 164000 0.0264 -
0.8020 165000 0.0265 -
0.8069 166000 0.0214 -
0.8117 167000 0.0279 -
0.8166 168000 0.0265 -
0.8215 169000 0.0265 -
0.8263 170000 0.029 -
0.8312 171000 0.0281 -
0.8360 172000 0.0314 -
0.8409 173000 0.0316 -
0.8458 174000 0.0265 -
0.8506 175000 0.0305 -
0.8555 176000 0.0278 -
0.8603 177000 0.0314 -
0.8652 178000 0.0274 -
0.8701 179000 0.0232 -
0.8749 180000 0.0225 -
0.8798 181000 0.0262 -
0.8846 182000 0.0271 -
0.8895 183000 0.0253 -
0.8944 184000 0.0255 -
0.8992 185000 0.0259 -
0.9041 186000 0.0248 -
0.9089 187000 0.0223 -
0.9138 188000 0.0221 -
0.9187 189000 0.0236 -
0.9235 190000 0.0239 -
0.9284 191000 0.0359 -
0.9333 192000 0.0248 -
0.9381 193000 0.0195 -
0.9430 194000 0.0217 -
0.9478 195000 0.0209 -
0.9527 196000 0.0197 -
0.9576 197000 0.0199 -
0.9624 198000 0.021 -
0.9673 199000 0.0201 -
0.9721 200000 0.0203 -
0.9770 201000 0.018 -
0.9819 202000 0.02 -
0.9867 203000 0.0184 -
0.9916 204000 0.0178 -
0.9964 205000 0.0222 -
1.0000 205731 - 0.0007

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}