---
license: apache-2.0
datasets:
- andrea-t94/TwitterSentiment140
language:
- en
metrics:
- perplexity
library_name: transformers
tags:
- distilroberta-base
- twitter
pipeline_tag: fill-mask
---
## Twitter-roBERTa-base fine-tuned using masked language modelling
This is a RoBERTa-base model fine-tuned (domain adaptation) on ~2M tweets from the Sentiment140 dataset (Go et al., 2009).
This is the first step of a two-step approach to fine-tuning for sentiment analysis (ULMFiT).
The model is suitable for English.
Main characteristics:
- pretrained model and tokenizer: distilroberta-base
- no cleaning/preprocessing applied to the data
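The masked-language-modelling objective used for this domain adaptation can be illustrated with a minimal sketch in plain Python (no transformers dependency; `mask_id` and `vocab_size` here are placeholder values, and the 80/10/10 corruption split follows the standard BERT/RoBERTa recipe, not a detail confirmed by this card):

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15, seed=0):
    """Sketch of RoBERTa-style MLM corruption: each position is selected
    with probability mlm_prob; a selected token is replaced by <mask> 80%
    of the time, by a random token 10% of the time, and kept as-is 10%
    of the time. Labels are the original ids at selected positions and
    -100 (ignored by the loss) everywhere else."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_prob:
            labels[i] = tok          # model must predict the original token
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = mask_id  # 80%: replace with <mask>
            elif roll < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # 10%: leave the token unchanged
    return inputs, labels
```

In practice, `DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)` from transformers applies this corruption per batch during training.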
Reference Paper: [ULMFiT](https://arxiv.org/abs/1801.06146).
Reference dataset: [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140?resource=download)
Git Repo: TBD
Labels: 0 -> Negative; 1 -> Positive