--- |
|
tags: |
|
- flair |
|
- text-classification |
|
language: |
|
- multilingual |
|
- en |
|
library_name: flair |
|
widget: |
|
- text: This is a gentle comment. |
|
license: mit |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
# Offensive language detection |
|
|
|
## Tasks |
|
|
|
The model combines three classifiers, one for each of the three subtasks of the OLID dataset [1]:
|
|
|
- Subtask A (offensive language identification): OFF (offensive), NOT (not offensive)

- Subtask B (categorization of offense type): TIN (targeted insult or threat), UNT (untargeted)

- Subtask C (offense target identification): IND (individual), GRP (group), OTH (other)
|
|
|
The model was trained with [Flair NLP](https://github.com/flairNLP/flair) as a multi-task model.
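
A minimal sketch of how such a multi-task setup can be built with a recent Flair version follows. The data paths, label types, backbone name, and hyperparameters are illustrative assumptions, not the exact configuration used to train this model.

```python
from flair.datasets import ClassificationCorpus
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.nn.multitask import make_multitask_model_and_corpus
from flair.trainers import ModelTrainer

# Shared transformer backbone for all three subtasks (assumed backbone;
# the card does not state which pre-trained model was used).
embeddings = TransformerDocumentEmbeddings("xlm-roberta-base", fine_tune=True)

# Hypothetical data folders holding OLID in Flair's FastText classification
# format ("__label__OFF <text>"), one folder and label type per subtask.
tasks = []
for name in ("subtask_a", "subtask_b", "subtask_c"):
    corpus = ClassificationCorpus(f"olid/{name}", label_type=name)
    classifier = TextClassifier(
        embeddings,
        label_type=name,
        label_dictionary=corpus.make_label_dictionary(label_type=name),
    )
    tasks.append((classifier, corpus))

# Bundle the per-task classifiers into one model with a joint corpus.
multitask_model, multicorpus = make_multitask_model_and_corpus(tasks)

trainer = ModelTrainer(multitask_model, multicorpus)
trainer.fine_tune("resources/offensive-multitask", max_epochs=5)
```

Because all three classifiers share one document embedding, every subtask fine-tunes the same backbone, which is the point of the multi-task setup.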
|
|
|
Training data: [Offensive Language Identification Dataset](https://sites.google.com/site/offensevalsharedtask/olid) (OLID) V1.0 [1] |
|
Test data: test set from [Semi-Supervised Dataset for Offensive Language Identification](https://sites.google.com/site/offensevalsharedtask/solid) (SOLID) [2] |
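
Assuming the trained model is available on the Hugging Face Hub or as a local Flair model file, it can be loaded and applied as in the sketch below (the identifier is a placeholder, not the published model name):

```python
from flair.data import Sentence
from flair.models import MultitaskModel

# Placeholder identifier: substitute the actual Hub ID or a local model path.
model = MultitaskModel.load("<hub-id-or-path>")

sentence = Sentence("This is a gentle comment.")
model.predict(sentence)

# Each of the three subtask classifiers attaches its own label.
for label in sentence.labels:
    print(label)
```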
|
|
|
## Citation |
|
|
|
When using this model, please cite: |
|
|
|
> Gregor Wiedemann, Seid Muhie Yimam, and Chris Biemann. 2020. UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1638–1644, Barcelona (online). International Committee for Computational Linguistics. |
|
|
|
|
|
## Evaluation scores |
|
|
|
Evaluation was conducted on the English test set of SemEval-2020 Task 12, so the results are directly comparable to those reported in [3].
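
The per-class reports below appear to follow Flair's built-in evaluation output. A hedged sketch of how such a report can be reproduced, again with placeholder paths and label types:

```python
from flair.datasets import ClassificationCorpus
from flair.models import MultitaskModel

model = MultitaskModel.load("<hub-id-or-path>")  # placeholder identifier

# SOLID test split converted to Flair's classification format (placeholder path).
corpus = ClassificationCorpus("solid/subtask_a", label_type="subtask_a")

result = model.evaluate(corpus.test, gold_label_type="subtask_a")
print(result.detailed_results)  # micro/macro F1, accuracy, per-class table
```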
|
|
|
### Task A |
|
|
|
```
Results:
- F-score (micro) 0.9256
- F-score (macro) 0.9131
- Accuracy 0.9256

By class:
              precision    recall  f1-score   support

         NOT     0.9922    0.9042    0.9461      2807
         OFF     0.7976    0.9815    0.8800      1080

    accuracy                         0.9256      3887
   macro avg     0.8949    0.9428    0.9131      3887
weighted avg     0.9381    0.9256    0.9278      3887
```
|
|
|
### Task B |
|
```
Results:
- F-score (micro) 0.7138
- F-score (macro) 0.6408
- Accuracy 0.7138

By class:
              precision    recall  f1-score   support

         TIN     0.6826    0.9741    0.8027       850
         UNT     0.8947    0.3269    0.4789       572

    accuracy                         0.7138      1422
   macro avg     0.7887    0.6505    0.6408      1422
weighted avg     0.7679    0.7138    0.6724      1422
```
|
|
|
### Task C |
|
```
Results:
- F-score (micro) 0.8318
- F-score (macro) 0.6978
- Accuracy 0.8318

By class:
              precision    recall  f1-score   support

         IND     0.8703    0.9483    0.9076       580
         GRP     0.7216    0.6684    0.6940       190
         OTH     0.7143    0.3750    0.4918        80

    accuracy                         0.8318       850
   macro avg     0.7687    0.6639    0.6978       850
weighted avg     0.8223    0.8318    0.8207       850
```
|
|
|
---- |
|
## References
|
|
|
[1] Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019. Predicting the Type and Target of Offensive Posts in Social Media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1415–1420, Minneapolis, Minnesota. Association for Computational Linguistics. |
|
|
|
[2] Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, and Preslav Nakov. 2021. SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 915–928, Online. Association for Computational Linguistics. |
|
|
|
[3] Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, and Çağrı Çöltekin. 2020. SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020). In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1425–1447, Barcelona (online). International Committee for Computational Linguistics. |