---
license: apache-2.0
datasets:
- seara/ru_go_emotions
language:
- ru
base_model:
- cointegrated/rubert-tiny2
pipeline_tag: text-classification
---
# RuBert-tiny2-EmotionsDetected
This model is a fine-tuned version of [RuBert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2), trained on the [ru-goemotions](https://github.com/searayeah/ru-goemotions) dataset, which covers 28 emotions:
```
0: admiration (восхищение)
1: amusement (веселье)
2: anger (злость)
3: annoyance (раздражение)
4: approval (одобрение)
5: caring (забота)
6: confusion (непонимание)
7: curiosity (любопытство)
8: desire (желание)
9: disappointment (разочарование)
10: disapproval (неодобрение)
11: disgust (отвращение)
12: embarrassment (смущение)
13: excitement (возбуждение)
14: fear (страх)
15: gratitude (признательность)
16: grief (горе)
17: joy (радость)
18: love (любовь)
19: nervousness (нервозность)
20: optimism (оптимизм)
21: pride (гордость)
22: realization (осознание)
23: relief (облегчение)
24: remorse (раскаяние)
25: sadness (грусть)
26: surprise (удивление)
27: neutral (нейтральность)
```
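The usage code further down zips its `emotion_columns` list with the model's output probabilities, so the index order in this table is what ties each output to an emotion. Below is a minimal pure-Python sketch of building index/name lookups and encoding an example that carries several emotions at once as a 0/1 vector. The names `EMOTIONS`, `id2label`, `label2id` and `to_multi_hot` are illustrative and not part of this repository. The two dicts have the same shape as the `id2label`/`label2id` fields of a transformers model config, but whether this checkpoint's config stores these exact names is not stated here.

```python
# Label names copied from the table above; list position = label id.
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring", "confusion",
    "curiosity", "desire", "disappointment", "disapproval", "disgust", "embarrassment",
    "excitement", "fear", "gratitude", "grief", "joy", "love", "nervousness", "optimism",
    "pride", "realization", "relief", "remorse", "sadness", "surprise", "neutral",
]

# Hypothetical lookup tables in the id2label / label2id shape used by transformers configs
id2label = {i: name for i, name in enumerate(EMOTIONS)}
label2id = {name: i for i, name in id2label.items()}


def to_multi_hot(names):
    """Encode a list of emotion names as a 0/1 vector with one slot per label id."""
    vector = [0] * len(EMOTIONS)
    for name in names:
        vector[label2id[name]] = 1
    return vector


print(id2label[17])         # joy
print(label2id["neutral"])  # 27
target = to_multi_hot(["joy", "excitement"])
print([i for i, v in enumerate(target) if v])  # [13, 17]
```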
So far, the model has been trained for 40 epochs in total with the following hyperparameters (`num_train_epochs=1` below is presumably the setting of each individual training run):
```
per_device_train_batch_size=16,
per_device_eval_batch_size=16,
num_train_epochs=1,
weight_decay=0.01,
learning_rate=1e-5,
save_total_limit=2,
load_best_model_at_end=True,
metric_for_best_model="f1",
greater_is_better=True
```
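`metric_for_best_model="f1"` together with `greater_is_better=True` and `load_best_model_at_end=True` tells the Hugging Face `Trainer` to keep the checkpoint with the highest `f1` value returned by the evaluation metrics function (the `Trainer` looks it up under the prefixed name `eval_f1`). The card does not show that function, nor whether F1 is micro- or macro-averaged, nor which decision threshold was used. The sketch below assumes micro-averaged F1 over sigmoid outputs with a 0.5 threshold; `compute_metrics` here is a hypothetical stand-in that unpacks `(logits, labels)` the way a `Trainer` `compute_metrics` callback can, and needs only NumPy.

```python
import numpy as np


def compute_metrics(eval_pred, threshold=0.5):
    """Hypothetical multi-label metrics: micro-averaged precision, recall and F1."""
    logits, labels = eval_pred
    # Independent sigmoid per emotion (assumed multi-label setup)
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    preds = (probs > threshold).astype(int)
    labels = np.asarray(labels).astype(int)
    # Count hits and misses over every (example, emotion) cell
    tp = int((preds & labels).sum())
    fp = int((preds & (1 - labels)).sum())
    fn = int(((1 - preds) & labels).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # The "f1" key is what metric_for_best_model="f1" refers to
    return {"f1": f1, "precision": precision, "recall": recall}


# Two toy examples with three labels each (made-up numbers)
toy_logits = [[2.0, -2.0, 1.0], [-1.0, 3.0, -3.0]]
toy_labels = [[1, 0, 0], [0, 1, 1]]
print(compute_metrics((toy_logits, toy_labels)))
```

In this toy case there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all come out to 2/3.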
## Test
A test set for checking the model's predictions was compiled with the help of a well-known AI. The results are shown below.
**Note: the test data has not been thoroughly checked by hand.**
<p align="center">
<img width="80%" src="img/TrueEmotion.png">
</p>
<p align="center">
<img width="80%" src="img/PredictedEmotion.png">
</p>
<p align="center">
<img width="80%" src="img/Comparison.png">
</p>
The charts suggest that the model distinguishes basic emotions but has trouble with more specific ones, such as grief (горе) and pride (гордость). It also predicts the neutral emotion more often than expected. This is most likely caused by inaccuracies in the dataset: in ordinary speech several emotions can be present at once, and the one people use most often is neutral.
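The card does not say how the comparison chart was produced. One simple way to quantify a statement such as "neutral is predicted more often than expected" is to compare, per label, how often it appears among the predictions versus among the reference labels. A minimal sketch using only the standard library; `frequency_gap` is a hypothetical helper, and the two short lists are made-up toy inputs, not results from this model:

```python
from collections import Counter


def frequency_gap(true_labels, predicted_labels):
    """Return {label: predicted_count - true_count}; positive means over-predicted."""
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    labels = set(true_counts) | set(pred_counts)
    # Counter returns 0 for labels it has never seen
    return {label: pred_counts[label] - true_counts[label] for label in sorted(labels)}


# Toy, made-up labels purely to show the mechanics
true = ["joy", "grief", "pride", "neutral"]
pred = ["joy", "neutral", "neutral", "neutral"]
print(frequency_gap(true, pred))
# {'grief': -1, 'joy': 0, 'neutral': 2, 'pride': -1}
```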
The model will be updated as improvements are made.
## Usage example
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('AniMAntZeZo/RuBert-tiny2-EmotionsDetected')
model = AutoModelForSequenceClassification.from_pretrained('AniMAntZeZo/RuBert-tiny2-EmotionsDetected')
model.to("cuda" if torch.cuda.is_available() else "cpu")

# Label names in the order of the model's output logits
emotion_columns = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring", "confusion", "curiosity", "desire",
    "disappointment", "disapproval", "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief", "joy",
    "love", "nervousness", "optimism", "pride", "realization", "relief", "remorse", "sadness", "surprise", "neutral"
]


def predict_emotions(
    text,
    model,
    tokenizer,
    emotion_columns,
    device="cuda" if torch.cuda.is_available() else "cpu",
    threshold=0.1
):
    """Return {"label (Russian translation)": probability} for every emotion above `threshold`, sorted by probability."""
    # Russian translations used to annotate the returned labels
    emotion_translations = {
        "admiration": "восхищение",
        "amusement": "веселье",
        "anger": "злость",
        "annoyance": "раздражение",
        "approval": "одобрение",
        "caring": "забота",
        "confusion": "непонимание",
        "curiosity": "любопытство",
        "desire": "желание",
        "disappointment": "разочарование",
        "disapproval": "неодобрение",
        "disgust": "отвращение",
        "embarrassment": "смущение",
        "excitement": "возбуждение",
        "fear": "страх",
        "gratitude": "признательность",
        "grief": "горе",
        "joy": "радость",
        "love": "любовь",
        "nervousness": "нервозность",
        "optimism": "оптимизм",
        "pride": "гордость",
        "realization": "осознание",
        "relief": "облегчение",
        "remorse": "раскаяние",
        "sadness": "грусть",
        "surprise": "удивление",
        "neutral": "нейтральность",
    }
    model.to(device)
    model.eval()
    # Tokenize and move the input tensors to the same device as the model
    inputs = tokenizer(text, return_tensors="pt", padding="max_length", truncation=True, max_length=128).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Multi-label setup: each emotion gets an independent sigmoid probability
    probabilities = torch.sigmoid(logits).squeeze().cpu().numpy()
    # Keep only emotions whose probability exceeds the threshold
    predictions = {
        f"{emotion} ({emotion_translations[emotion]})": prob
        for emotion, prob in zip(emotion_columns, probabilities) if prob > threshold
    }
    # Sort from most to least likely
    sorted_predictions = dict(sorted(predictions.items(), key=lambda item: item[1], reverse=True))
    return sorted_predictions
```
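`predict_emotions` applies a sigmoid to each logit independently, so the scores are per-emotion probabilities that need not sum to 1, and several emotions can pass the `threshold=0.1` cutoff at once (as in the outputs below). The sketch below reproduces only that decoding step with NumPy on made-up logits, so it runs without downloading the model; `decode_logits` is a hypothetical name, not part of this repository.

```python
import numpy as np


def decode_logits(logits, labels, threshold=0.1):
    """Sigmoid each logit independently, keep labels above `threshold`, sort by probability."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    kept = {label: float(p) for label, p in zip(labels, probs) if p > threshold}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


labels = ["sadness", "joy", "excitement"]
logits = [-4.0, 0.0, -1.0]  # made-up values, not real model output
result = decode_logits(logits, labels)
print(result)  # joy (0.5) first, then excitement (~0.269); sadness (~0.018) is dropped
print(sum(1.0 / (1.0 + np.exp(-np.array(logits)))))  # ~0.787: sigmoid scores need not sum to 1
```

A lower threshold surfaces weaker secondary emotions; a higher one returns fewer labels.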
[INPUT]
```
example_text = "Как же я рад!"
predictions = predict_emotions(example_text, model, tokenizer, emotion_columns)
print("Emotions:", predictions)
```
[OUTPUT]
```
Emotions: {'joy (радость)': 0.6736836, 'excitement (возбуждение)': 0.25723574}
```
[INPUT]
```
example_text = "Я обиделся!"
predictions = predict_emotions(example_text, model, tokenizer, emotion_columns)
print("Emotions:", predictions)
```
[OUTPUT]
```
Emotions: {'sadness (грусть)': 0.3111033, 'disappointment (разочарование)': 0.2943853, 'annoyance (раздражение)': 0.19748639, 'anger (злость)': 0.16338393}
```