---
title: Submission Oriaz
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: docker
pinned: true
---

# Benchmark using different techniques

## Global Information

#### Intended Use

- **Primary intended uses**: Baseline comparison for climate disinformation classification models
- **Primary intended users**: Researchers and developers participating in the Frugal AI Challenge
- **Out-of-scope use cases**: Not intended for production use or real-world classification tasks

### Training Data

The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
- Size: ~6000 examples
- Split: 80% train, 20% test
- 8 categories of climate disinformation claims

#### Labels
0. No relevant claim detected
1. Global warming is not happening
2. Not caused by humans
3. Not bad or beneficial
4. Solutions harmful/unnecessary
5. Science is unreliable
6. Proponents are biased
7. Fossil fuels are needed
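
For reference, the eight categories above can be encoded as a simple id-to-label mapping (a minimal sketch; the strings are the descriptions listed here, not necessarily the dataset's canonical label names):

```python
# Mapping between label ids and the claim categories listed above.
# Note: the exact label strings stored in the dataset may differ;
# these are the human-readable descriptions from this card.
ID2LABEL = {
    0: "No relevant claim detected",
    1: "Global warming is not happening",
    2: "Not caused by humans",
    3: "Not bad or beneficial",
    4: "Solutions harmful/unnecessary",
    5: "Science is unreliable",
    6: "Proponents are biased",
    7: "Fossil fuels are needed",
}
# Inverse mapping, useful when converting model predictions back and forth.
LABEL2ID = {name: i for i, name in ID2LABEL.items()}
```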

### Environmental Impact

Environmental impact is tracked using CodeCarbon, measuring:
- Carbon emissions during inference
- Energy consumption during inference

This tracking helps establish a baseline for the environmental impact of model deployment and inference.
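
As a rough sanity check, the two tracked quantities are linked by the grid's carbon intensity. A minimal sketch, assuming a hypothetical intensity of 400 gCO2eq/kWh (the real value depends on the region CodeCarbon detects):

```python
def wh_to_gco2eq(energy_wh: float, intensity_g_per_kwh: float = 400.0) -> float:
    """Convert energy in Wh to emissions in gCO2eq for a given grid intensity.

    The default intensity of 400 gCO2eq/kWh is an illustrative assumption,
    not a value reported by CodeCarbon.
    """
    return energy_wh / 1000.0 * intensity_g_per_kwh

# Example: ~1.8 Wh of inference energy at 400 gCO2eq/kWh -> 0.72 gCO2eq,
# which is in the same ballpark as the figures reported below.
emissions = wh_to_gco2eq(1.8)
```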

### Ethical Considerations

- Dataset contains sensitive topics related to climate disinformation
- Environmental impact is tracked to promote awareness of AI's carbon footprint


## ML model for Climate Disinformation Classification

### Model Description

Find the best ML model to process vectorized quotes to detect climate change disinformation.

### Performance

#### Metrics (measured on an NVIDIA T4 small GPU)
- **Accuracy**: ~69-72%
- **Environmental Impact**: 
  - Emissions: ~0.7 gCO2eq
  - Energy consumption: ~1.8 Wh

#### Model Architecture

Classical ML models need numeric inputs, so the quotes must first be embedded. I used the *MTEB Leaderboard* on Hugging Face to find the embedding model with the best trade-off between performance and parameter count.

I then chose the "dunzhang/stella_en_400M_v5" model as the embedder. It has the 7th best performance score with only 400M parameters.

Once the quotes are embedded, the data is a 6,091 x 1,024 matrix (6,091 quotes, 1,024 embedding dimensions). I then apply a train-test split (70% / 30%).

Using the TPOT classifier search, I found that the best model on my data was logistic regression.
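
The pipeline above can be sketched as follows (a minimal sketch with random vectors standing in for the stella embeddings; the hyperparameters are scikit-learn defaults, not the ones TPOT selected):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for the real inputs: 6,091 quotes embedded into 1,024 dimensions
# by dunzhang/stella_en_400M_v5, each with one of the 8 disinformation labels.
X = rng.normal(size=(6091, 1024))
y = rng.integers(0, 8, size=6091)

# 70% / 30% train-test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
# Chance-level on random vectors; ~70% reported on the real embeddings.
accuracy = clf.score(X_test, y_test)
```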

Here is the resulting confusion matrix:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66169e1ce557753f30eab31b/tfAcfFu3Cnc9XJ00ixrWB.png)

### Limitations
- The embedding phase takes ~30 seconds for 1,800 quotes. It could be optimized, which would have a real impact on carbon emissions.
- It is hard to exceed 70% accuracy with "simple" ML models.
- Textual data carries nuances of interpretation that small models cannot capture.



## Bert model for Climate Disinformation Classification

### Model Description

A fine-tuned BERT model for climate disinformation classification.

### Performance

#### Metrics (measured on an NVIDIA T4 small GPU)
- **Accuracy**: ~90%
- **Environmental Impact**: 
  - Emissions: ~0.25 gCO2eq
  - Energy consumption: ~0.7 Wh

#### Model Architecture

Fine-tuning of the "bert-uncased" model with 70% train, 15% eval, and 15% test splits.
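
The 70/15/15 split can be sketched on shuffled indices (a minimal sketch; the dataset size of 6,091 quotes comes from the ML section above, and the helper name is mine):

```python
import random

def three_way_split(n: int, train_frac: float = 0.70, eval_frac: float = 0.15,
                    seed: int = 42):
    """Shuffle indices 0..n-1 and partition them into train/eval/test sets.

    Whatever remains after the train and eval fractions goes to test,
    so the three partitions always cover all n indices exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_eval = int(n * eval_frac)
    return idx[:n_train], idx[n_train:n_train + n_eval], idx[n_train + n_eval:]

# For the 6,091 quotes: 4,263 train / 913 eval / 915 test.
train_idx, eval_idx, test_idx = three_way_split(6091)
```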

### Limitations
- Not optimized yet; I still need to try running it on CPU.
- Small models have their limits: accuracy regularly lands between 70-80%, and it is hard to go higher just by changing parameters.

# Contact
*LinkedIn*: Mattéo GIRARDEAU
*Email*: [email protected]