|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- andrea-t94/TwitterSentiment140 |
|
language: |
|
- en |
|
metrics: |
|
- perplexity |
|
library_name: transformers |
|
tags: |
|
- distilroberta-base
|
- twitter |
|
pipeline_tag: fill-mask |
|
--- |
|
|
|
## Twitter-DistilRoBERTa-base fine-tuned with masked language modelling
|
This is a DistilRoBERTa-base model fine-tuned (domain adaptation) on ~2M tweets from the Sentiment140 dataset (Go et al., 2009).
|
This is the first step of a two-step approach (ULMFiT) to fine-tuning for sentiment analysis.
|
This model is suitable for English. |
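
A minimal usage sketch via the `fill-mask` pipeline (the model id below is a placeholder; substitute this repository's actual id on the Hugging Face Hub):

```python
from transformers import pipeline

# Placeholder id — replace with this model's actual Hub repository id.
fill_mask = pipeline("fill-mask", model="andrea-t94/twitter-distilroberta-mlm")

# RoBERTa-family tokenizers use "<mask>" as the mask token.
print(fill_mask("I love this <mask>!"))
```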
|
|
|
Main characteristics:

- pretrained model and tokenizer: distilroberta-base

- no cleaning or preprocessing applied to the data (tweets are used raw; see the training sketch below)
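
A rough sketch of the masked-language-modelling step, assuming the standard `transformers` Trainer API with `DataCollatorForLanguageModeling`, a `text` column in the dataset, and illustrative hyperparameters (not necessarily the ones actually used):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

# Tweets are used raw — no cleaning or preprocessing.
# Assumes the dataset exposes the tweet body in a "text" column.
dataset = load_dataset("andrea-t94/TwitterSentiment140", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Dynamic masking: 15% of tokens are masked at each training step.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="twitter-distilroberta-mlm",
    per_device_train_batch_size=32,  # illustrative value
    num_train_epochs=1,              # illustrative value
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```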
|
|
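Perplexity (the metric listed above) falls out of the masked-LM cross-entropy loss as `exp(loss)`. Continuing the sketch, with `tokenized_eval` a hypothetical held-out split prepared the same way as the training data:

```python
import math

eval_results = trainer.evaluate(eval_dataset=tokenized_eval)
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
```
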
|
Reference Paper: [ULMFiT](https://arxiv.org/abs/1801.06146).
|
Reference dataset: [Sentiment140](https://www.kaggle.com/datasets/kazanova/sentiment140?resource=download) |
|
Git Repo: TBD |
|
Labels: 0 -> Negative; 1 -> Positive |
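
The second ULMFiT step (a separate model, not this one) would initialize a sequence classifier from the domain-adapted checkpoint and fine-tune it on these labels; a hypothetical sketch:

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical second step: load the checkpoint produced by the MLM
# sketch above and attach a 2-way classification head.
clf = AutoModelForSequenceClassification.from_pretrained(
    "twitter-distilroberta-mlm",
    num_labels=2,
    id2label={0: "Negative", 1: "Positive"},
    label2id={"Negative": 0, "Positive": 1},
)
```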