antypasd committed on
Commit 508b2b9 · 1 Parent(s): cac624f

Update README.md

Files changed (1)
  1. README.md +50 -28
README.md CHANGED
@@ -1,46 +1,68 @@
- ---
- tags:
- - generated_from_keras_callback
- model-index:
- - name: tweet-topic-latest-multi
-   results: []
- ---

- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->

- # tweet-topic-latest-multi

- This model is a fine-tuned version of [antypasd/tweet-topic-latest-multi](https://huggingface.co/antypasd/tweet-topic-latest-multi) on an unknown dataset.
- It achieves the following results on the evaluation set:

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - optimizer: None
- - training_precision: float32

- ### Training results

- ### Framework versions

- - Transformers 4.23.1
- - TensorFlow 2.10.0
- - Tokenizers 0.13.1
 
+ # tweet-topic-latest-multi

+ This is a RoBERTa-base model trained on 168.86M tweets up to the end of September 2022 and fine-tuned for multi-label topic classification on a corpus of 11,267 [tweets](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi).
+ The underlying Twitter RoBERTa-base model (before fine-tuning) is available [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-sep2022). This model is suitable for English text.

+ - Reference papers: [TimeLMs paper](https://arxiv.org/abs/2202.03829), [TweetTopic](https://arxiv.org/abs/2209.09824)
+ - Git repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms)
 
+ <b>Labels</b>:

+ | <span style="font-weight:normal">0: arts_&_culture</span> | <span style="font-weight:normal">5: fashion_&_style</span> | <span style="font-weight:normal">10: learning_&_educational</span> | <span style="font-weight:normal">15: science_&_technology</span> |
+ |-----------------------------|---------------------|----------------------------|--------------------------|
+ | 1: business_&_entrepreneurs | 6: film_tv_&_video | 11: music | 16: sports |
+ | 2: celebrity_&_pop_culture | 7: fitness_&_health | 12: news_&_social_concern | 17: travel_&_adventure |
+ | 3: diaries_&_daily_life | 8: food_&_dining | 13: other_hobbies | 18: youth_&_student_life |
+ | 4: family | 9: gaming | 14: relationships | |
 
+ ## Full classification example

+ ```python
+ from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
+ from transformers import AutoTokenizer
+ import numpy as np
+ from scipy.special import expit
+
+ MODEL = "cardiffnlp/tweet-topic-latest-multi"
+ tokenizer = AutoTokenizer.from_pretrained(MODEL)
+
+ # PyTorch
+ model = AutoModelForSequenceClassification.from_pretrained(MODEL)
+ class_mapping = model.config.id2label
+
+ text = "It is great to see athletes promoting awareness for climate change."
+ tokens = tokenizer(text, return_tensors='pt')
+ output = model(**tokens)
+
+ # Multi-label: apply an independent sigmoid (expit) to each logit, then threshold at 0.5
+ scores = output[0][0].detach().numpy()
+ scores = expit(scores)
+ predictions = (scores >= 0.5) * 1
+
+ # TensorFlow (uncomment to use instead of the PyTorch block above)
+ #tf_model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
+ #class_mapping = tf_model.config.id2label
+ #text = "It is great to see athletes promoting awareness for climate change."
+ #tokens = tokenizer(text, return_tensors='tf')
+ #output = tf_model(**tokens)
+ #scores = output[0][0].numpy()
+ #scores = expit(scores)
+ #predictions = (scores >= 0.5) * 1
+
+ # Map to classes
+ for i in range(len(predictions)):
+     if predictions[i]:
+         print(class_mapping[i])
+ ```
+ Output:

+ ```
+ fitness_&_health
+ news_&_social_concern
+ sports
+ ```
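The decoding step in the classification example can be isolated into a small, dependency-free sketch. Each of the 19 logits is passed through an independent sigmoid (the quantity `scipy.special.expit` computes) and kept if it reaches 0.5, so a tweet may receive zero, one, or several topics. The label list below is transcribed from the Labels table and assumes that order matches `model.config.id2label`; in real use, read the mapping from the model config. The helper name `decode_topics` and the logit values are hypothetical, made up for illustration rather than taken from the model.

```python
import math

# Topic labels transcribed from the Labels table (list index = class id).
# Assumption: this order matches model.config.id2label.
LABELS = [
    "arts_&_culture", "business_&_entrepreneurs", "celebrity_&_pop_culture",
    "diaries_&_daily_life", "family", "fashion_&_style", "film_tv_&_video",
    "fitness_&_health", "food_&_dining", "gaming", "learning_&_educational",
    "music", "news_&_social_concern", "other_hobbies", "relationships",
    "science_&_technology", "sports", "travel_&_adventure", "youth_&_student_life",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (same values as scipy.special.expit)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so this cannot overflow
    return z / (1.0 + z)


def decode_topics(logits, threshold=0.5):
    """Hypothetical helper: map one row of multi-label logits to topic names.

    Each logit is squashed independently (no softmax across classes), so any
    number of topics, including none, can pass the threshold.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up logits for illustration only: positive at class ids 7, 12 and 16.
example_logits = [-3.0] * len(LABELS)
for idx in (7, 12, 16):
    example_logits[idx] = 2.0

print(decode_topics(example_logits))
# ['fitness_&_health', 'news_&_social_concern', 'sports']
```

With the default threshold of 0.5, `sigmoid(z) >= 0.5` is mathematically equivalent to `z >= 0`, so thresholding the raw logits at 0 would give the same labels up to floating-point rounding; keeping the sigmoid makes it straightforward to try a stricter or looser threshold per use case.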