---
license: apache-2.0
datasets:
- seara/ru_go_emotions
language:
- ru
base_model:
- cointegrated/rubert-tiny2
pipeline_tag: text-classification
---

# RuBert-tiny2-EmotionsDetected

This model was obtained by fine-tuning [RuBert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) on the [ru-goemotions](https://github.com/searayeah/ru-goemotions) dataset, which covers 28 emotion labels:
```
0: admiration (восхищение)  
1: amusement (веселье)  
2: anger (злость)  
3: annoyance (раздражение)  
4: approval (одобрение)  
5: caring (забота)  
6: confusion (непонимание)  
7: curiosity (любопытство)  
8: desire (желание)  
9: disappointment (разочарование)  
10: disapproval (неодобрение)  
11: disgust (отвращение)  
12: embarrassment (смущение)  
13: excitement (возбуждение)  
14: fear (страх)  
15: gratitude (признательность)  
16: grief (горе)  
17: joy (радость)  
18: love (любовь)  
19: nervousness (нервозность)  
20: optimism (оптимизм)  
21: pride (гордость)  
22: realization (осознание)  
23: relief (облегчение)  
24: remorse (раскаяние)  
25: sadness (грусть)  
26: surprise (удивление)  
27: neutral (нейтральность)  
```

At the moment, the model has been trained for 40 epochs with the following hyperparameters:
```
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.01,
    learning_rate=1e-5,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    greater_is_better=True
```

## Test
A test set was compiled with the help of a well-known AI to check the accuracy of the model's predictions. The results are shown below.  
**Note: the test data did not undergo thorough manual inspection.**

<p align="center">
  <img width="80%" src="img/TrueEmotion.png">
</p>
<p align="center">
  <img width="80%" src="img/PredictedEmotion.png">
</p>
<p align="center">
  <img width="80%" src="img/Comparison.png">
</p>


The plots above show that the model distinguishes basic emotions but struggles with more specific ones, such as grief (горе) and pride (гордость). It also predicts the neutral class more often than expected. This is most likely due to inaccuracies in the dataset: ordinary speech can express several emotions at once, and the neutral label is the one used most often.
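
One way to quantify this kind of over- or under-prediction is to compare how often each label appears in the true and predicted sets. The sketch below is illustrative only: `label_count_gap` is a hypothetical helper, and the two label lists are made-up toy values, not the test data behind the plots above.

```python
from collections import Counter


def label_count_gap(true_labels, predicted_labels):
    """Return predicted count minus true count for every label seen.

    A positive value means the label is over-predicted,
    a negative value means it is under-predicted.
    """
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    all_labels = sorted(set(true_counts) | set(pred_counts))
    return {label: pred_counts[label] - true_counts[label] for label in all_labels}


# Toy example (hypothetical values)
true_labels = ["joy", "grief", "pride", "anger", "neutral"]
predicted_labels = ["joy", "sadness", "neutral", "anger", "neutral"]

gap = label_count_gap(true_labels, predicted_labels)
for label, diff in gap.items():
    print(f"{label}: {diff:+d}")
```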

The model will be updated as improvements are made.

## Usage example
```
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('AniMAntZeZo/RuBert-tiny2-EmotionsDetected')
model = AutoModelForSequenceClassification.from_pretrained('AniMAntZeZo/RuBert-tiny2-EmotionsDetected')
model.to("cuda" if torch.cuda.is_available() else "cpu")


emotion_columns = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring", "confusion", "curiosity", "desire",
    "disappointment", "disapproval", "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief", "joy",
    "love", "nervousness", "optimism", "pride", "realization", "relief", "remorse", "sadness", "surprise", "neutral"
]

def predict_emotions(
    text, 
    model, 
    tokenizer, 
    emotion_columns, 
    device="cuda" if torch.cuda.is_available() else "cpu", 
    threshold=0.1
):

    emotion_translations = {
        "admiration": "восхищение",
        "amusement": "веселье",
        "anger": "злость",
        "annoyance": "раздражение",
        "approval": "одобрение",
        "caring": "забота",
        "confusion": "непонимание",
        "curiosity": "любопытство",
        "desire": "желание",
        "disappointment": "разочарование",
        "disapproval": "неодобрение",
        "disgust": "отвращение",
        "embarrassment": "смущение",
        "excitement": "возбуждение",
        "fear": "страх",
        "gratitude": "признательность",
        "grief": "горе",
        "joy": "радость",
        "love": "любовь",
        "nervousness": "нервозность",
        "optimism": "оптимизм",
        "pride": "гордость",
        "realization": "осознание",
        "relief": "облегчение",
        "remorse": "раскаяние",
        "sadness": "грусть",
        "surprise": "удивление",
        "neutral": "нейтральность",
    }

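    model.to(device)
    model.eval()  # switch off dropout for inference
    inputs = tokenizer(text, return_tensors="pt", padding="max_length", truncation=True, max_length=128).to(device)
    with torch.no_grad():  # no gradients are needed for prediction
        logits = model(**inputs).logits
    # Multi-label setup: one independent sigmoid per emotion, so scores need not sum to 1
    probabilities = torch.sigmoid(logits).squeeze().cpu().numpy()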

    predictions = {
        f"{emotion} ({emotion_translations[emotion]})": prob
        for emotion, prob in zip(emotion_columns, probabilities) if prob > threshold
    }

    sorted_predictions = dict(sorted(predictions.items(), key=lambda item: item[1], reverse=True))
    
    return sorted_predictions
```
[INPUT]
```
example_text = "Как же я рад!"
predictions = predict_emotions(example_text, model, tokenizer, emotion_columns)
print("Emotions:", predictions)
```
[OUTPUT]
```
Emotions: {'joy (радость)': 0.6736836, 'excitement (возбуждение)': 0.25723574}
```
[INPUT]
```
example_text = "Я обиделся!"
predictions = predict_emotions(example_text, model, tokenizer, emotion_columns)
print("Emotions:", predictions)
```
[OUTPUT]
```
Emotions: {'sadness (грусть)': 0.3111033, 'disappointment (разочарование)': 0.2943853, 'annoyance (раздражение)': 0.19748639, 'anger (злость)': 0.16338393}
```
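
Because each score comes from its own sigmoid, the `threshold` argument alone decides how many emotions are returned. The sketch below reuses the scores from the second output above and applies a stricter cut-off. `filter_by_threshold` is a hypothetical helper, and the value 0.25 is chosen only for illustration.

```python
def filter_by_threshold(predictions, threshold):
    """Keep only the labels whose score is strictly above the threshold."""
    return {label: score for label, score in predictions.items() if score > threshold}


# Scores copied from the "Я обиделся!" example output
predictions = {
    "sadness (грусть)": 0.3111033,
    "disappointment (разочарование)": 0.2943853,
    "annoyance (раздражение)": 0.19748639,
    "anger (злость)": 0.16338393,
}

strict = filter_by_threshold(predictions, threshold=0.25)
print(list(strict))
```

A lower threshold returns more candidate emotions at the cost of more false positives. A higher one keeps only the strongest signals.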