---
tags:
- flair
- text-classification
language:
- multilingual
- en
library_name: flair
widget:
- text: This is a gentle comment.
license: mit
pipeline_tag: text-classification
---

# Offensive language detection

## Tasks

The model combines three classifiers, one for each of the three tasks of the OLID dataset [1]:

- subtask A: OFF, NOT
- subtask B: TIN, UNT
- subtask C: IND, GRP, OTH

Trained with [Flair NLP](https://github.com/flairNLP/flair) as a multi-task model; a minimal usage sketch is given at the end of this card.

Training data: [Offensive Language Identification Dataset](https://sites.google.com/site/offensevalsharedtask/olid) (OLID) v1.0 [1]

Test data: test set of the [Semi-Supervised Dataset for Offensive Language Identification](https://sites.google.com/site/offensevalsharedtask/solid) (SOLID) [2]

## Citation

When using this model, please cite:

> Gregor Wiedemann, Seid Muhie Yimam, and Chris Biemann. 2020. UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1638–1644, Barcelona (online). International Committee for Computational Linguistics.

## Evaluation scores

Evaluation was conducted on the English test set of SemEval-2020 Task 12, so the results below are directly comparable to those reported in [3]. A sketch for reproducing per-class reports in this format is given at the end of this card.

### Task A

```
Results:
- F-score (micro): 0.9256
- F-score (macro): 0.9131
- Accuracy:        0.9256

By class:
              precision    recall  f1-score   support

         NOT     0.9922    0.9042    0.9461      2807
         OFF     0.7976    0.9815    0.8800      1080

    accuracy                         0.9256      3887
   macro avg     0.8949    0.9428    0.9131      3887
weighted avg     0.9381    0.9256    0.9278      3887
```

### Task B

```
Results:
- F-score (micro): 0.7138
- F-score (macro): 0.6408
- Accuracy:        0.7138

By class:
              precision    recall  f1-score   support

         TIN     0.6826    0.9741    0.8027       850
         UNT     0.8947    0.3269    0.4789       572

    accuracy                         0.7138      1422
   macro avg     0.7887    0.6505    0.6408      1422
weighted avg     0.7679    0.7138    0.6724      1422
```

### Task C

```
Results:
- F-score (micro): 0.8318
- F-score (macro): 0.6978
- Accuracy:        0.8318

By class:
              precision    recall  f1-score   support

         IND     0.8703    0.9483    0.9076       580
         GRP     0.7216    0.6684    0.6940       190
         OTH     0.7143    0.3750    0.4918        80

    accuracy                         0.8318       850
   macro avg     0.7687    0.6639    0.6978       850
weighted avg     0.8223    0.8318    0.8207       850
```

----

# References

[1] Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019. Predicting the Type and Target of Offensive Posts in Social Media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1415–1420, Minneapolis, Minnesota. Association for Computational Linguistics.

[2] Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, and Preslav Nakov. 2021. SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 915–928, Online. Association for Computational Linguistics.

[3] Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, and Çağrı Çöltekin. 2020. SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020). In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1425–1447, Barcelona (online). International Committee for Computational Linguistics.
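
## Usage

A minimal usage sketch, assuming the model loads through Flair's generic `Classifier.load` interface and attaches one predicted label per subtask to the sentence. The hub id below is a placeholder, since this card does not state the repository id; replace it with the actual one.

```python
# Minimal usage sketch, not an official example.
# "<this-model-hub-id>" is a placeholder for the model's actual
# Hugging Face repository id.
from flair.data import Sentence
from flair.nn import Classifier

# Load the multi-task model from the Hugging Face hub.
model = Classifier.load("<this-model-hub-id>")  # placeholder id

# Example text taken from the widget in this card's metadata.
sentence = Sentence("This is a gentle comment.")
model.predict(sentence)

# The multi-task model is assumed to add one label per subtask
# (A: OFF/NOT, B: TIN/UNT, C: IND/GRP/OTH) to the sentence.
for label in sentence.get_labels():
    print(label)
```

Depending on the Flair version, the model may also be loadable via `flair.models.MultitaskModel.load` with the same identifier.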
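
## Reproducing the evaluation reports

The per-class reports above follow the layout of scikit-learn's `classification_report`. As a toy sketch, a report in that format can be produced from gold labels and model predictions for one subtask; the label lists below are made-up placeholders, not the actual SOLID test data.

```python
# Sketch: produce a per-class report in the format used above.
# The gold/pred lists are toy placeholders; in practice they would hold the
# SOLID test labels and this model's predictions for one subtask.
from sklearn.metrics import classification_report

gold = ["NOT", "OFF", "NOT", "OFF"]  # placeholder gold labels (subtask A)
pred = ["NOT", "OFF", "OFF", "OFF"]  # placeholder model predictions

# digits=4 matches the four decimal places shown in the reports above.
print(classification_report(gold, pred, digits=4))
```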