julsCadenas committed · Commit 67ff4a0 · verified · 1 Parent(s): 86a16f1

Update README.md

Files changed (1):
  1. README.md +72 -95
README.md CHANGED
@@ -37,7 +37,7 @@ model-index:
  url: https://huggingface.co/julsCadenas/summarize-reddit
  ---

- # Reddit Summarization Model

  This project uses a fine-tuned model for summarizing Reddit posts and their comments. The model has been trained using a dataset of 100 Reddit posts, and the goal is to generate concise and meaningful summaries of the original posts and the associated comments.
 
@@ -50,110 +50,87 @@ This project uses a fine-tuned version of the BART model from Facebook for summa
  - **Original Model:** [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn)
  - **Fine-Tuned Model:** [julsCadenas/summarize-reddit](https://huggingface.co/julsCadenas/summarize-reddit)

- ## **Usage**
-
- You can use the model to summarize Reddit posts and comments using the following code:
- ```python
- from transformers import pipeline
-
- class Summarize:
-     def __init__(self):
-         self.summarizer = pipeline(
-             "summarization",
-             model="julsCadenas/summarize-reddit",
-             tokenizer="julsCadenas/summarize-reddit",
-         )
-
-     def summarize(self, text, prompt):
-         inputs = f"{prompt}: {text}"
-         input_tokens = self.summarizer.tokenizer.encode(inputs, truncation=False)
-         input_len = len(input_tokens)
-         max_length = min(input_len * 2, 1024)  # adjust for your use case
-         min_length = max(32, input_len // 4)   # adjust for your use case
-         summary = self.summarizer(
-             inputs,
-             max_length=max_length,
-             min_length=min_length,
-             length_penalty=2.0,
-             num_beams=4,
-         )
-         return summary[0]['summary_text']
-
-     def process_data(self, response, prompt):
-         # 'response' is the two-element JSON list Reddit returns for a post's comments page:
-         # element 0 holds the post, element 1 holds the comment tree.
-         post_content = response[0]['data']['children'][0]['data'].get('selftext', '')
-         comments = []
-         for comment in response[1]['data']['children']:
-             if 'body' in comment['data']:
-                 comments.append(comment['data']['body'])
-         comments_all = ' '.join(comments)
-
-         post_summary = self.summarize(post_content, prompt)
-         comments_summary = self.summarize(comments_all, prompt)
-
-         return {
-             "post_summary": post_summary,
-             "comments_summary": comments_summary
-         }
- ```
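-
- For a quick sanity check, `summarize` can be called directly on a plain string (a hypothetical invocation of the class above; the sample text is made up):
- ```python
- # Hypothetical quick check of the Summarize class defined above.
- s = Summarize()
- text = "I have been trying to learn Python for a month but keep bouncing between tutorials without making progress."
- print(s.summarize(text, prompt="Summarize this Reddit post"))
- ```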
-
- You can also use a small script to clean up the JSON output:
- ```python
- import json
-
- def fix_json(jsonfile, path):
-     # The output arrives double-encoded: the wrapper is a JSON string, and each
-     # summary field is itself a JSON-encoded string, so decode each level.
-     fixed_json = json.loads(jsonfile)
-
-     fixed_json['post_summary'] = json.loads(fixed_json['post_summary'])
-     fixed_json['comments_summary'] = json.loads(fixed_json['comments_summary'])
-
-     print(json.dumps(fixed_json, indent=4))
-
-     with open(path, 'w') as file:
-         json.dump(fixed_json, file, indent=4)
- ```
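-
- For instance (a hypothetical input; real output files may differ):
- ```python
- # Each summary field is itself JSON-encoded, hence the escaped inner quotes.
- raw = '{"post_summary": "\\"OP asks how to start learning Python.\\"", "comments_summary": "\\"Commenters recommend small projects.\\""}'
- fix_json(raw, 'summary.json')
- ```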
-
- ## **Model Evaluation**
-
- ### **ROUGE-1 SCORES:**
- - **Recall (r)** = 57.66%
- - **Precision (p)** = 43.41%
- - **F1-Score (f)** = 49.53%
-
- ### **ROUGE-2 SCORES:**
- - **Recall (r)** = 29.30%
- - **Precision (p)** = 20.72%
- - **F1-Score (f)** = 24.27%
-
- ### **ROUGE-L SCORES:**
- - **Recall (r)** = 56.20%
- - **Precision (p)** = 42.30%
- - **F1-Score (f)** = 48.28%

  <br>

- **ROUGE-1:** also known as unigram overlap, measures the overlap of unigrams (individual words) between the generated summary and the reference summary. It calculates the proportion of words in the generated summary that are also present in the reference summary. Example: reference text “The cat is on the rug” vs. generated text “The dog is on the rug”: five of the six reference words match, so ROUGE-1 recall = 5/6 ≈ 0.83 (a small sketch below reproduces this computation).
-
- **ROUGE-2:** also known as bigram overlap, measures the overlap of bigrams (pairs of consecutive words) between the generated summary and the reference summary. It calculates the proportion of bigrams in the generated summary that are also present in the reference summary.
-
- **ROUGE-L:** measures the similarity between the word sequences of the generated summary and the reference summary using their longest common subsequence. Unlike ROUGE-1 and ROUGE-2, which use a simple word-count approach, ROUGE-L uses a sequence-matching approach.
-
- **Source:** https://fabianofalcao.medium.com/metrics-for-evaluating-summarization-of-texts-performed-by-transformers-how-to-evaluate-the-b3ce68a309c3
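-
- To make the arithmetic concrete, a minimal sketch (illustrative only, not the project’s evaluation code) that reproduces the toy example above:
- ```python
- # Counts unigram overlap for the toy example; matches the 5/6 recall above.
- from collections import Counter
-
- reference = "the cat is on the rug".split()
- generated = "the dog is on the rug".split()
-
- overlap = sum((Counter(reference) & Counter(generated)).values())  # 5 shared unigrams
- recall = overlap / len(reference)
- print(f"ROUGE-1 recall = {overlap}/{len(reference)} = {recall:.2f}")
- ```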
-
- ### **CONCLUSIONS**
- **ROUGE-1 SCORE:** The ROUGE-1 score indicates that the model is fairly good at capturing individual words (unigrams) from the reference summaries. With a recall of 57.66%, the model captures more than half of the relevant words from the reference summaries, which suggests that it is effectively capturing key content. The precision of 43.41% indicates that a significant portion of the generated summary’s words also appear in the reference, but some additional, irrelevant words may be included. The F1-score of 49.53% shows that, overall, there is a fairly good balance between recall and precision, although there is still room to improve both.
-
- **ROUGE-2 SCORE:** The ROUGE-2 score, which focuses on bigram overlap, is considerably lower than ROUGE-1. The recall of 29.30% indicates that the model captures roughly 30% of the bigrams from the reference summaries, a moderate result suggesting that the model may not be fully preserving the structural relationships between words. The precision of 20.72% suggests that the generated summaries include bigrams that are not present in the reference summaries. The F1-score of 24.27% is relatively low, which may indicate that the model needs improvement in capturing bigram patterns. This is common in summarization tasks, as producing high-quality bigram overlap is challenging.
-
- **ROUGE-L SCORE:** The ROUGE-L score, which focuses on the longest common subsequence (LCS), shows that the model captures the overall structure of the reference summaries quite well. The recall of 56.20% suggests that a large portion of the key order-preserving sequences from the reference summaries appear in the generated summaries, indicating good coherence. The precision of 42.30% shows that the model does well in maintaining relevant sequences but could further reduce redundant or non-informative ones. The F1-score of 48.28% indicates solid performance in preserving the flow and structure of the original text.
-
- ### **SUMMARY**
- - The ROUGE-1 score is strong, indicating that the model captures individual words well, which is important for summarizing the key points of Reddit posts. This suggests the model is effectively identifying the core content of the original discussions.
-
- - The ROUGE-2 score is relatively low, suggesting that the model struggles to preserve the structure and sequence of words, which is crucial for coherent summaries where sentence flow and the connection between ideas matter.
-
- - The ROUGE-L score shows that the model effectively captures meaningful sequences and maintains coherence in the summaries, a positive outcome for keeping the overall message and flow intact.
-
- While the model performs well in certain areas (particularly ROUGE-1 and ROUGE-L), there is room for improvement, especially in the ROUGE-2 score. Improving bigram overlap could enhance the fluency and structure of the output, leading to more readable and coherent summaries of Reddit posts.

  url: https://huggingface.co/julsCadenas/summarize-reddit
  ---

+ # **Reddit Summarization Model**

  This project uses a fine-tuned model for summarizing Reddit posts and their comments. The model has been trained using a dataset of 100 Reddit posts, and the goal is to generate concise and meaningful summaries of the original posts and the associated comments.

  - **Original Model:** [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn)
  - **Fine-Tuned Model:** [julsCadenas/summarize-reddit](https://huggingface.co/julsCadenas/summarize-reddit)

+ ## Installation
+
+ To get started, install the required dependencies: create a virtual environment and install the packages listed in `requirements.txt`.
+
+ ### **Steps:**
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/your-username/reddit-summarizer.git
+ cd reddit-summarizer
+ ```
+ 2. Set up a virtual environment:
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate  # On Windows, use 'venv\Scripts\activate'
+ ```
+ 3. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 4. Set up your environment variables (if needed) by creating a `.env` file; a sketch follows this list. You can refer to the sample `.env.example` for the necessary variables.
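+
+ A minimal sketch of what the `.env` file might look like — the variable names here are hypothetical, so check `.env.example` for the real ones:
+ ```bash
+ # Hypothetical variables; see .env.example in the repo for the actual names.
+ REDDIT_CLIENT_ID=your_client_id
+ REDDIT_CLIENT_SECRET=your_client_secret
+ REDDIT_USER_AGENT=reddit-summarizer/0.1
+ ```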
+
+ ### **Usage**
+
+ 1. In `src/summarize.py`, the model should be initialized like this:
+ ```python
+ # src/summarize.py
+ from transformers import pipeline
+
+ self.summarizer = pipeline(
+     "summarization",
+     model="julsCadenas/summarize-reddit",
+     tokenizer="julsCadenas/summarize-reddit",
+ )
+ ```
+ 2. Add the *URL* of your preferred Reddit post in `main.py`.
+ 3. Run `src/main.py`. A sketch of how these steps fit together follows below.
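+
+ A minimal sketch of that flow (hypothetical — the actual logic lives in `src/main.py`, and the module name, URL, and prompt below are made up):
+ ```python
+ # Hypothetical glue code; the repository's src/main.py is the reference.
+ import requests
+ from summarize import Summarize
+
+ url = "https://www.reddit.com/r/learnpython/comments/abc123/example_post/"
+
+ # Appending '.json' makes Reddit return the post and its comments as JSON.
+ data = requests.get(url.rstrip('/') + '.json',
+                     headers={'User-Agent': 'reddit-summarizer/0.1'}).json()
+
+ summaries = Summarize().process_data(data, prompt="Summarize this Reddit post")
+ print(summaries["post_summary"])
+ print(summaries["comments_summary"])
+ ```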

  <br>

+ # **Model Evaluation**
+
+ For a detailed evaluation of the model, including additional analysis and visualizations, refer to the [evaluation notebook](https://github.com/julsCadenas/summarize-reddit/blob/master/notebooks/eval.ipynb).
+
+ ## **BERTScore**
+
+ The model’s performance was evaluated using **BERTScore** (Precision, Recall, and F1).
+
+ ### **Average BERTScores**
+ | Metric            | Value  |
+ |-------------------|--------|
+ | **Precision (p)** | 0.8704 |
+ | **Recall (r)**    | 0.8517 |
+ | **F1 Score (f)**  | 0.8609 |
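+
+ As a reference for how such numbers can be produced, here is a minimal sketch using the `bert_score` package — illustrative only, not necessarily the exact setup behind the table above (see the evaluation notebook for that):
+ ```python
+ # Illustrative only; install with: pip install bert-score
+ from bert_score import score
+
+ candidates = ["OP asks how to start learning Python."]        # model summaries
+ references = ["The author wants advice on learning Python."]  # reference summaries
+
+ P, R, F1 = score(candidates, references, lang="en")
+ print(f"P={P.mean().item():.4f} R={R.mean().item():.4f} F1={F1.mean().item():.4f}")
+ ```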
+
+ ### **Conclusion**
+ - **Precision** is strong but can be improved by reducing irrelevant tokens.
+ - **Recall** needs improvement to capture more relevant content.
+ - **F1 Score** indicates a solid overall performance.
+
+ ### **Improvements**
+ - Focus on improving **Recall**.
+ - Perform **error analysis** to identify missed content.
+ - **Fine-tune** the model further for better results.
+
+ ## **ROUGE**
+
+ The following table summarizes the ROUGE scores (Recall, Precision, and F1) for three metrics: ROUGE-1, ROUGE-2, and ROUGE-L. The values are mean scores across all summaries, expressed as percentages.
+
+ ### **Average ROUGE Scores**
+ | Metric            | ROUGE-1 | ROUGE-2 | ROUGE-L |
+ |-------------------|---------|---------|---------|
+ | **Recall (r)**    | 32.20   | 7.10    | 30.09   |
+ | **Precision (p)** | 22.03   | 4.90    | 20.50   |
+ | **F1 Score (f)**  | 25.00   | 5.51    | 23.30   |
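+
+ For reference, a minimal sketch of computing these metrics with the `rouge` package — illustrative, not necessarily the project’s exact evaluation code (see the notebook above):
+ ```python
+ # Illustrative only; install with: pip install rouge
+ from rouge import Rouge
+
+ hypotheses = ["OP asks how to start learning Python."]
+ references = ["The author wants advice on learning Python."]
+
+ scores = Rouge().get_scores(hypotheses, references, avg=True)
+ # e.g. {'rouge-1': {'r': ..., 'p': ..., 'f': ...}, 'rouge-2': {...}, 'rouge-l': {...}}
+ print(scores)
+ ```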
+
+ ### **Interpretation**
+ - **ROUGE-1**: Shows the highest recall and precision of the three metrics, indicating that the model is good at capturing single-word overlaps but could reduce irrelevant words.
+ - **ROUGE-2**: Exhibits much lower recall and precision, indicating that the model struggles with bigram relationships and context.
+ - **ROUGE-L**: Performs better than ROUGE-2 but still faces challenges with precision; it captures longer subsequences more effectively than bigrams.
+
+ ### **Conclusion**
+ - **ROUGE-1**: The model shows moderate performance but generates some irrelevant words (low precision).
+ - **ROUGE-2**: The model performs poorly, indicating difficulty in capturing bigram relationships.
+ - **ROUGE-L**: Slightly better than ROUGE-2, with some success in capturing longer sequences.
+
+ ### **Improvements**
+ - Focus on enhancing **bigram overlap** (ROUGE-2) and overall **context understanding**.
+ - Reduce **irrelevant content** for improved **precision**.
+ - Improve **sequence coherence** for better **ROUGE-L** scores.