akin-em7

This is a sentence-transformers model finetuned from zerbaUst/cs-em6 on the json dataset. It maps sentences & paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: zerbaUst/cs-em6
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 896 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
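The Pooling module above averages token vectors (`pooling_mode_mean_tokens: True`) and `Normalize()` then rescales each sentence vector to unit length. The following minimal NumPy sketch illustrates that pipeline, assuming token embeddings of shape (batch, seq_len, 896) and a 0/1 attention mask. The function name `mean_pool_and_normalize` is hypothetical and is not part of the Sentence Transformers API. Because every output has unit length, cosine similarity reduces to a plain dot product.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average non-padding token vectors, then L2-normalize each sentence vector.

    token_embeddings: (batch, seq_len, dim) float array from the transformer.
    attention_mask:   (batch, seq_len) array, 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # sum of real-token vectors
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # real-token counts, avoids /0
    pooled = summed / counts                                          # mean pooling
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)                      # Normalize(): unit length


# Toy batch: 2 sentences, up to 4 tokens, 896-dim token vectors.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 896))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # trailing zeros mark padding tokens
emb = mean_pool_and_normalize(tokens, mask)

print(emb.shape)  # (2, 896)
cos = emb @ emb.T  # with unit-length rows, this matrix holds cosine similarities
```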

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zerbaUst/akin-em7")
# Run inference
sentences = [
    'ive lost my account due to my mum making my account and shes lost the details\n\nlost email address: [EMAIL]',
    "issue no1:the player is looking to recover their account. - if there is no relevant target account information, please redirect the player to the recovery form available on the help portal.- if you received a recovery form case and have found the player's account information, please follow the dedicated processes you can find in signavio (step 1a below)issue no2:the player is looking to recover their account, but, there is suspicion of account reselling/sharing.there are a few factors you can look out for to determine whether there is a suspicion of account reselling or sharing >- the log-in country in the p360 activity log, can be split into different continuous sections between the account owner and other accesses- the owner will be reaching out to us to recover the account at all times over many cases- the owner disables 2fa, and changes the email address and the display name on the account, just before the activity country changes- there are previous account recovery cases on the account already- the owner is usually able to provide all the information required for aov- you may see duplicate game activations from purchases made by the buyer/other people who access the account from different ip'splease also note:- if a player admits they are not the original owner, we should refuse the recovery request. "
    "this includes mentioning the email address on their account belongs to someone else.- if a player mentions that they share the account with other people, we should also refuse recovery in this situation, as this account sharing breaks our terms of service.- if a player admits that they sold their account, we also deny recovery.- if a player also directly admits to purchasing an account, you may deny further support as they have admitted not being the original owner of the account (please leave a private note on the account mentioning player admitting to purchasing account the account)- if they admit to selling this account, we deny recovery also as they broke our terms of service (please leave a private note on the account mentioning the player admitting to selling the account)----------------------------------------------------------case handling:1) determine which of the 2x issues noted above, your contact falls into issue 1 or issue 2.a) for issue 1, please follow this process > https://editor.signavio.com/p/hub/model/ad240bbab2d5496c8e9e9e285b253258b) for issue 2, if you suspect the account is being shared or resold, please make sure to check the salesforce notes. if you can see an existing note mentioning account reselling or sharing on the account (placed after 08/10/2020), deny account recovery and do not proceed with further steps.if you suspect the account being shared or resold and there are no salesforce notes please perform aov (they need to pass aov) and escalate to tier 2.if you are a tier 2 specialist please follow process:\xa0https://editor.signavio.com/p/hub/model/094692df25534e7389cf86aaa89f73f8note: only escalate cases where we suspect the original account owner has sold the account. "
    "no need to escalate cases where the account selling was initiated by a hacker.2) next check if the player can pass aov and recover the account by following the process: https://goto.ubisoft.org/jtguf3) if the player passes the aov process, and we suspect that the account is being resold or shared, please make sure the player is aware of all account security measures available to them, and let them know that we may not be able to recover the account again in future. please also make sure to provide the following account security faq: ubisoft.com/help/article/0000627644) place a salesforce note on the account, that we suspect is being resold or shared.please note the usual <[account] recovery> subject line still applies to these cases.---------------------------------------------------------additional information:as we are unable to prove beyond doubt that an account is being deliberately resold or shared, please make sure you do not accuse the player directly of account reselling or sharing. rather than focusing on the reselling/sharing assumption, we would like the focus to be on account security and the fact that our previous security instructions were not followed.",
    'issue: player is reporting a cheater/hacker in gamecase handling: please thank the user for reporting the player. and advise we cannot communicate the outcome of any such investigation.please make sure that the reported username field in the ticket is filled out with the username of the suspected cheater.please advise the player to report the cheater within the game.additional information:if the player mentions being ddos during his game, please use the following kb : [report a player] cheating / ddos - 000084534 - i would like to report a player ddosing the game',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 896)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
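Because the model was trained with MatryoshkaLoss, you can keep only the leading dimensions of each embedding to get smaller indexes, at some cost in accuracy (compare the dim_* columns in the evaluation table). The following NumPy sketch shows semantic search over precomputed embeddings. It truncates to the first `dim` values, re-normalizes, and ranks documents by dot product. Random vectors stand in for real `model.encode(...)` outputs, and `search` and `truncate_and_normalize` are hypothetical helpers. In Sentence Transformers 3.x, you can also request truncation at load time through the `truncate_dim` argument of `SentenceTransformer`.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and rescale rows to unit length."""
    cut = embeddings[:, :dim]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)


def search(query_emb: np.ndarray, doc_embs: np.ndarray, top_k: int = 3, dim: int = 896):
    """Hypothetical helper: return (doc_index, cosine_score) pairs, best first."""
    q = truncate_and_normalize(query_emb[None, :], dim)[0]
    d = truncate_and_normalize(doc_embs, dim)
    scores = d @ q                       # cosine similarity, since all rows are unit length
    order = np.argsort(-scores)[:top_k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


# Stand-ins for model.encode(...) outputs: 5 "KB articles" and one ticket.
rng = np.random.default_rng(42)
docs = rng.normal(size=(5, 896))
query = docs[2] + 0.1 * rng.normal(size=896)  # a noisy copy of document 2

full = search(query, docs, top_k=3, dim=896)
small = search(query, docs, top_k=3, dim=256)
print(full[0][0], small[0][0])  # 2 2
```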

Evaluation

Metrics

Information Retrieval

Metric dim_896 dim_768 dim_512 dim_256 dim_128 dim_64
cosine_accuracy@1 0.0158 0.0173 0.0152 0.0149 0.0122 0.0099
cosine_accuracy@3 0.0346 0.0349 0.0331 0.0328 0.029 0.0224
cosine_accuracy@5 0.0463 0.046 0.0448 0.0439 0.0388 0.0319
cosine_accuracy@10 0.066 0.0657 0.0624 0.0615 0.0582 0.0472
cosine_precision@1 0.0158 0.0173 0.0152 0.0149 0.0122 0.0099
cosine_precision@3 0.0115 0.0116 0.011 0.0109 0.0097 0.0075
cosine_precision@5 0.0093 0.0092 0.009 0.0088 0.0078 0.0064
cosine_precision@10 0.0066 0.0066 0.0062 0.0061 0.0058 0.0047
cosine_recall@1 0.0158 0.0173 0.0152 0.0149 0.0122 0.0099
cosine_recall@3 0.0346 0.0349 0.0331 0.0328 0.029 0.0224
cosine_recall@5 0.0463 0.046 0.0448 0.0439 0.0388 0.0319
cosine_recall@10 0.066 0.0657 0.0624 0.0615 0.0582 0.0472
cosine_ndcg@10 0.0376 0.0382 0.0361 0.0354 0.0321 0.0258
cosine_mrr@10 0.029 0.0299 0.0281 0.0274 0.0242 0.0192
cosine_map@100 0.0326 0.0334 0.0318 0.0309 0.0276 0.0226
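In this table, accuracy@k and recall@k are identical, and precision@k equals accuracy@k divided by k. This pattern is consistent with each query having exactly one relevant document. The card does not describe the evaluation set, so this is an inference. Under that assumption, every metric depends only on the rank of the single relevant document. The following sketch computes the per-query values and their average. The helper names `single_relevant_metrics` and `average` are hypothetical; they do not represent the library's `InformationRetrievalEvaluator` code.

```python
import math
from typing import Optional


def single_relevant_metrics(rank: Optional[int]) -> dict:
    """Per-query IR metrics when exactly one document is relevant.

    rank: 1-based position of the relevant document, or None if it was not retrieved.
    """
    r = rank if rank is not None else math.inf
    m = {}
    for k in (1, 3, 5, 10):
        hit = 1.0 if r <= k else 0.0
        m[f"accuracy@{k}"] = hit       # relevant doc appears in the top k
        m[f"recall@{k}"] = hit         # one relevant doc, so recall equals accuracy
        m[f"precision@{k}"] = hit / k  # at most 1 of the k results is relevant
    m["mrr@10"] = 1.0 / r if r <= 10 else 0.0
    m["ndcg@10"] = 1.0 / math.log2(r + 1) if r <= 10 else 0.0  # ideal DCG is 1/log2(2) = 1
    m["map@100"] = 1.0 / r if r <= 100 else 0.0
    return m


def average(ranks):
    """Mean of each metric over all queries, as reported in the table."""
    per_query = [single_relevant_metrics(r) for r in ranks]
    return {key: sum(q[key] for q in per_query) / len(per_query) for key in per_query[0]}


# Three toy queries: relevant doc ranked 1st, ranked 4th, and not retrieved.
scores = average([1, 4, None])
print(round(scores["mrr@10"], 4))  # 0.4167
```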

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 13,396 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    Column     type     min        mean            max
    anchor     string   4 tokens   64.85 tokens    512 tokens
    positive   string   0 tokens   398.47 tokens   512 tokens
  • Samples:
    • anchor: the email i use for this account got hacked and i can no longer access the email. so i can not login into ubisoft and i no longer feel comfortable with that email on my account.
      lost email address: [EMAIL]
      positive: issue no1:the player is looking to recover their account. - if there is no relevant target account information, please redirect the player to the recovery form available on the help portal.- if you received a recovery form case and have found the player's account information, please follow the dedicated processes you can find in signavio (step 1a below)issue no2:the player is looking to recover their account, but, there is suspicion of account reselling/sharing.there are a few factors you can look out for to determine whether there is a suspicion of account reselling or sharing >- the log-in country in the p360 activity log, can be split into different continuous sections between the account owner and other accesses- the owner will be reaching out to us to recover the account at all times over many cases- the owner disables 2fa, and changes the email address and the display name on the account, just before the activity country changes- there are previous account recovery cases on the a...
    • anchor: i am unable to change my email on my account as the old email i no longer have access to it. the old email is showing on my account as [EMAIL] but i need it to be changed to [EMAIL].
      positive: issue no1:the player is looking to recover their account. - if there is no relevant target account information, please redirect the player to the recovery form available on the help portal.- if you received a recovery form case and have found the player's account information, please follow the dedicated processes you can find in signavio (step 1a below)issue no2:the player is looking to recover their account, but, there is suspicion of account reselling/sharing.there are a few factors you can look out for to determine whether there is a suspicion of account reselling or sharing >- the log-in country in the p360 activity log, can be split into different continuous sections between the account owner and other accesses- the owner will be reaching out to us to recover the account at all times over many cases- the owner disables 2fa, and changes the email address and the display name on the account, just before the activity country changes- there are previous account recovery cases on the a...
    • anchor: It seems like there is no text provided for me to process. Please provide the support ticket text that you would like me to mask.
      positive: issuebased on the customer's description, we are unclear on what the issue might be and we need to ask them for more information, to help us find the correct kbcase handling1) check what information the player has provided us with in their first message, and check keywords in sowa to see if you can find a relevant kbif the player and issue is unclear, please ask for further information, such as >is this issue related to their ubisoft account / a purchase or subscription / missing content / a ban or player report / a game bug / a technical issue / feedback on a game or our servicescan the player further explain what the problem is?do they have any screenshots or videos they can provide us with? are they seeing any error messages on their side they can share with us?2) note: this is a placeholder subject line and the sl should always be updated on the case, once we receive enough info to confirm if we have a relevant kb we can use.if the issue turns out to be the player is reporting an u...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            896,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
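MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor. It scales the anchor-positive cosine similarities and applies cross-entropy, with the matching positive as the target. The library default scale is 20.0; the card does not list the value used here, so 20.0 is an assumption. MatryoshkaLoss repeats this computation on each prefix length in `matryoshka_dims` and sums the results using `matryoshka_weights`. The following NumPy sketch covers the forward pass only, with no gradients. The function names `mnrl_loss` and `matryoshka_loss` are hypothetical.

```python
import numpy as np


def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch softmax loss: anchor i should match positive i against all other positives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability for the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # cross-entropy with target i for row i


def matryoshka_loss(anchors, positives, dims=(896, 768, 512, 256, 128, 64), weights=None) -> float:
    """Weighted sum of the base loss computed on embeddings truncated to each dim."""
    weights = weights or [1.0] * len(dims)
    return sum(w * mnrl_loss(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 896))
aligned = anchors + 0.05 * rng.normal(size=(4, 896))  # positives close to their anchors
shuffled = aligned[[1, 2, 3, 0]]                      # positives paired with the wrong anchors

print(matryoshka_loss(anchors, aligned) < matryoshka_loss(anchors, shuffled))  # True
```

Summing across truncation lengths pushes the leading dimensions to carry useful signal on their own. This is why the dim_64 to dim_768 columns in the evaluation table remain usable.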
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • gradient_accumulation_steps: 32
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • fp16: True
  • tf32: False
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
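These non-default values determine the training length. This sketch assumes a single device; the card does not say how many were used, but this assumption reproduces the step counts and epoch fractions in the training logs. Under it, one optimizer step covers 2 × 32 = 64 pairs, and the learning rate ramps up linearly during warmup and then decays along a half-cosine curve. The sketch recomputes these quantities. The formulas mirror how the Hugging Face `Trainer` derives steps and warmup and how `get_cosine_schedule_with_warmup` shapes the rate. `lr_at` and `epoch_at` are hypothetical helper names.

```python
import math

# Values from the card (single device assumed).
train_samples = 13_396
per_device_batch = 2
grad_accum = 32
epochs = 4
warmup_ratio = 0.1
peak_lr = 2e-5

micro_batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 6698 (drop_last=False)
steps_per_epoch = micro_batches_per_epoch // grad_accum                # 209 optimizer steps
total_steps = steps_per_epoch * epochs                                 # 836
warmup_steps = math.ceil(total_steps * warmup_ratio)                   # 84
effective_batch = per_device_batch * grad_accum                        # 64 pairs per step


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def epoch_at(step: int) -> float:
    """Fractional epoch as logged: micro-batches consumed / micro-batches per epoch."""
    return step * grad_accum / micro_batches_per_epoch


print(steps_per_epoch, total_steps, warmup_steps)       # 209 836 84
print(round(epoch_at(10), 4), round(epoch_at(209), 4))  # 0.0478 0.9985
```

These results match the logged values, including the 0.9985 epoch at step 209 and the final step of 836.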

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 32
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
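`batch_sampler: no_duplicates` matters for this dataset because many anchors share the same positive. For example, the first two training samples listed above both map to the same account-recovery article. Without it, MultipleNegativesRankingLoss could treat a copy of an anchor's own positive as a negative. The following sketch illustrates the idea in simplified form; it is not the library's `NoDuplicatesBatchSampler` implementation. It shuffles the pairs, then greedily fills each batch with pairs whose texts are not already present, deferring any conflicting pairs to later batches. `no_duplicate_batches` is a hypothetical name.

```python
import random


def no_duplicate_batches(pairs, batch_size, seed=42):
    """Yield batches of (anchor, positive) pairs in which no text appears twice.

    Simplified greedy sketch: pairs that would introduce a duplicate are
    deferred and retried when building later batches.
    """
    remaining = list(pairs)
    random.Random(seed).shuffle(remaining)
    while remaining:
        batch, seen, deferred = [], set(), []
        for anchor, positive in remaining:
            if len(batch) < batch_size and anchor not in seen and positive not in seen:
                batch.append((anchor, positive))
                seen.update((anchor, positive))
            else:
                deferred.append((anchor, positive))
        yield batch  # never empty: the first remaining pair always fits
        remaining = deferred


pairs = [
    ("lost access to my email", "KB: account recovery"),
    ("cannot change my old email", "KB: account recovery"),
    ("someone is cheating in game", "KB: report a player"),
    ("player ddosing the match", "KB: report a player"),
]
batches = list(no_duplicate_batches(pairs, batch_size=2))
```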

Training Logs

Epoch Step Training Loss dim_896_cosine_ndcg@10 dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0478 10 0.7342 - - - - - -
0.0956 20 0.359 - - - - - -
0.1433 30 0.6206 - - - - - -
0.1911 40 0.3286 - - - - - -
0.2389 50 0.4635 - - - - - -
0.2867 60 0.4779 - - - - - -
0.3344 70 0.6539 - - - - - -
0.3822 80 0.5646 - - - - - -
0.4300 90 0.5571 - - - - - -
0.4778 100 0.4717 - - - - - -
0.5255 110 0.3666 - - - - - -
0.5733 120 0.692 - - - - - -
0.6211 130 0.6166 - - - - - -
0.6689 140 0.618 - - - - - -
0.7166 150 0.4731 - - - - - -
0.7644 160 0.5375 - - - - - -
0.8122 170 0.4384 - - - - - -
0.8600 180 0.4214 - - - - - -
0.9077 190 0.7847 - - - - - -
0.9555 200 0.7723 - - - - - -
0.9985 209 - 0.0381 0.0366 0.0382 0.0369 0.0343 0.0271
1.0033 210 0.5171 - - - - - -
1.0511 220 0.5229 - - - - - -
1.0988 230 0.3208 - - - - - -
1.1466 240 0.361 - - - - - -
1.1944 250 0.1921 - - - - - -
1.2422 260 0.2428 - - - - - -
1.2899 270 0.214 - - - - - -
1.3377 280 0.5747 - - - - - -
1.3855 290 0.4278 - - - - - -
1.4333 300 0.2921 - - - - - -
1.4810 310 0.3406 - - - - - -
1.5288 320 0.3055 - - - - - -
1.5766 330 0.4052 - - - - - -
1.6244 340 0.3753 - - - - - -
1.6721 350 0.2922 - - - - - -
1.7199 360 0.324 - - - - - -
1.7677 370 0.2779 - - - - - -
1.8155 380 0.3366 - - - - - -
1.8632 390 0.4493 - - - - - -
1.9110 400 0.3796 - - - - - -
1.9588 410 0.4291 - - - - - -
1.9970 418 - 0.0378 0.0387 0.0361 0.0346 0.0309 0.0257
2.0066 420 0.3842 - - - - - -
2.0543 430 0.4343 - - - - - -
2.1021 440 0.3238 - - - - - -
2.1499 450 0.2563 - - - - - -
2.1977 460 0.3092 - - - - - -
2.2454 470 0.2376 - - - - - -
2.2932 480 0.2644 - - - - - -
2.3410 490 0.5582 - - - - - -
2.3888 500 0.3216 - - - - - -
2.4365 510 0.2821 - - - - - -
2.4843 520 0.2969 - - - - - -
2.5321 530 0.2768 - - - - - -
2.5799 540 0.3804 - - - - - -
2.6277 550 0.3968 - - - - - -
2.6754 560 0.2676 - - - - - -
2.7232 570 0.3127 - - - - - -
2.7710 580 0.2596 - - - - - -
2.8188 590 0.3421 - - - - - -
2.8665 600 0.493 - - - - - -
2.9143 610 0.3426 - - - - - -
2.9621 620 0.4613 - - - - - -
2.9955 627 - 0.0363 0.0368 0.0358 0.0348 0.0319 0.0253
3.0099 630 0.3526 - - - - - -
3.0576 640 0.4347 - - - - - -
3.1054 650 0.3257 - - - - - -
3.1532 660 0.2329 - - - - - -
3.2010 670 0.3199 - - - - - -
3.2487 680 0.2374 - - - - - -
3.2965 690 0.2711 - - - - - -
3.3443 700 0.5732 - - - - - -
3.3921 710 0.293 - - - - - -
3.4398 720 0.2809 - - - - - -
3.4876 730 0.3323 - - - - - -
3.5354 740 0.2609 - - - - - -
3.5832 750 0.3763 - - - - - -
3.6309 760 0.3886 - - - - - -
3.6787 770 0.2631 - - - - - -
3.7265 780 0.3211 - - - - - -
3.7743 790 0.2488 - - - - - -
3.8220 800 0.3503 - - - - - -
3.8698 810 0.4986 - - - - - -
3.9176 820 0.3986 - - - - - -
3.9654 830 0.4216 - - - - - -
3.9940 836 - 0.0376 0.0382 0.0361 0.0354 0.0321 0.0258

Framework Versions

  • Python: 3.10.11
  • Sentence Transformers: 3.3.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.33.0
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 494M params (Safetensors, tensor type F32)