---
license: mit
language:
- en
base_model:
- facebook/bart-large-cnn
pipeline_tag: summarization
model-index:
- name: summarize-reddit
  results:
  - task:
      type: summarization
    dataset:
      name: custom reddit posts
      type: custom
    metrics:
    - name: ROUGE-1
      type: ROUGE
      value:
        recall: 32.20
        precision: 22.03
        f1-score: 25.00
    - name: ROUGE-2
      type: ROUGE
      value:
        recall: 7.10
        precision: 4.90
        f1-score: 5.51
    - name: ROUGE-L
      type: ROUGE
      value:
        recall: 30.09
        precision: 20.50
        f1-score: 23.30
    - name: BERTScore
      type: BERTScore
      value:
        precision: 0.8704
        recall: 0.8517
        f1-score: 0.8609
    source:
      name: summarize-reddit
      url: https://huggingface.co/julsCadenas/summarize-reddit
---
# **Reddit Summarization Model**
This project uses a fine-tuned model for summarizing Reddit posts and their comments. The model was trained on a dataset of 100 Reddit posts, with the goal of generating concise, meaningful summaries of the original post and its associated comments.
You can access the source code and more information about this project on GitHub: [GitHub Repository Link](https://github.com/julsCadenas/summarize-reddit)
## Model on Hugging Face
This project uses a fine-tuned version of the BART model from Facebook for summarizing Reddit posts and their comments. The original model, facebook/bart-large-cnn, is a pre-trained sequence-to-sequence model optimized for summarization tasks. It was fine-tuned on a custom Reddit dataset for this project.
- **Original Model:** [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn)
- **Fine-Tuned Model:** [julsCadenas/summarize-reddit](https://huggingface.co/julsCadenas/summarize-reddit)
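The exact training setup is not described on this card. As a rough, assumption-laden sketch, fine-tuning `facebook/bart-large-cnn` on a small post/summary dataset with the `transformers` Seq2Seq utilities might look like the following (the toy dataset, column names, and hyperparameters are illustrative only, not the project's actual configuration):
```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy dataset for illustration; the real project used ~100 Reddit posts with reference summaries
dataset = Dataset.from_dict({
    "text": ["Full text of a Reddit post and its comments goes here."],
    "summary": ["A short reference summary goes here."],
})

def preprocess(batch):
    # Tokenize inputs and targets; lengths here are placeholder choices
    inputs = tokenizer(batch["text"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="summarize-reddit", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```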
## Installation
To get started, you need to install the required dependencies. You can do this by creating a virtual environment and installing the packages listed in `requirements.txt`.
### **Steps:**
1. Clone the repository:
   ```bash
   git clone https://github.com/julsCadenas/summarize-reddit.git
   cd summarize-reddit
   ```
2. Set up a virtual environment:
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows, use 'venv\Scripts\activate'
   ```
3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
4. Set up your environment variables (if needed) by creating a `.env` file. You can refer to the sample `.env.example` for the necessary variables.
### **Usage**
1. In `src/summarize.py`, the model should be initialized like this:
   ```python
   # src/summarize.py
   from transformers import pipeline

   self.summarizer = pipeline(
       "summarization",
       model="julsCadenas/summarize-reddit",
       tokenizer="julsCadenas/summarize-reddit",
   )
   ```
2. Add the *URL* of your preferred Reddit post in `main.py`.
3. Run `src/main.py`.
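If you just want to try the model outside the project scaffolding, a minimal sketch looks like this (the model and tokenizer names come from this card; the input text and generation parameters are placeholders):
```python
from transformers import pipeline

# Load the fine-tuned summarizer from the Hugging Face Hub
summarizer = pipeline(
    "summarization",
    model="julsCadenas/summarize-reddit",
    tokenizer="julsCadenas/summarize-reddit",
)

post_text = "Replace this with the text of a Reddit post and its comments."
summary = summarizer(post_text, max_length=150, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```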
### **Formatted JSON Output**
The model outputs its responses in JSON format, but the raw output may not be properly formatted. For instance, it could look like [this](https://github.com/julsCadenas/summarize-reddit/blob/master/data/test_output.json).
Notice that the output contains escaped quotes within the values, meaning the summaries are stored as JSON-encoded strings rather than nested objects. To make the data easier to consume, you can use the following function to clean and format the JSON:
```python
import json

def fix_json(raw_data, fixed_path):
    if not isinstance(raw_data, dict):
        raise ValueError(f"Expected a dictionary, but got: {type(raw_data)}")
    try:
        # The summaries arrive as JSON-encoded strings, so decode them into objects
        formatted_data = {
            "post_summary": json.loads(raw_data["post_summary"]),
            "comments_summary": json.loads(raw_data["comments_summary"])
        }
    except json.JSONDecodeError as e:
        print("Error decoding JSON:", e)
        return
    with open(fixed_path, "w") as file:
        json.dump(formatted_data, file, indent=4)
    print(f"Formatted JSON saved to {fixed_path}")
```
After running `fix_json()` to clean and format the data, the output will look like [this](https://github.com/julsCadenas/summarize-reddit/blob/master/data/formatted_output.json).
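For reference, a minimal way to call the helper above looks like this (the raw dictionary shown is invented for illustration, and the output path simply mirrors the linked example file):
```python
# Hypothetical raw model output: values are JSON-encoded strings
raw_data = {
    "post_summary": '{"summary": "The post asks for advice on learning Python."}',
    "comments_summary": '{"summary": "Commenters recommend starting with small projects."}'
}

fix_json(raw_data, "data/formatted_output.json")
```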
You can view the full notebook on formatting the output [here](https://github.com/julsCadenas/summarize-reddit/blob/master/notebooks/testing.ipynb).
<br>
# **Model Evaluation**
**For a detailed evaluation of the model, including additional analysis and visualizations, refer to the [evaluation notebook](https://github.com/julsCadenas/summarize-reddit/blob/master/notebooks/eval.ipynb).**
## **BERTScore**
The model’s performance was evaluated using **BERTScore** (Precision, Recall, and F1).
### **Average BERTScores**
| Metric | Value |
|---------------------|-----------|
| **Precision (p)**| 0.8704 |
| **Recall (r)** | 0.8517 |
| **F1 Score (f)** | 0.8609 |
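Scores like these can be reproduced with the `bert-score` package; the sketch below uses placeholder summary lists rather than the project's actual data:
```python
from bert_score import score

# Placeholder lists; in practice these hold the generated summaries and the references
candidates = ["Generated summary of a Reddit post."]
references = ["Reference summary of the same Reddit post."]

P, R, F1 = score(candidates, references, lang="en", verbose=True)
print(f"Precision: {P.mean().item():.4f}, "
      f"Recall: {R.mean().item():.4f}, "
      f"F1: {F1.mean().item():.4f}")
```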
### **Conclusion**
- **Precision** is strong but can be improved by reducing irrelevant tokens.
- **Recall** needs improvement to capture more relevant content.
- **F1 Score** indicates a solid overall performance.
### **Improvements**
- Focus on improving **Recall**.
- Perform **Error Analysis** to identify missed content.
- **Fine-tune** the model for better results.
## **ROUGE**
The following table summarizes the ROUGE scores (Recall, Precision, and F1) for three different metrics: ROUGE-1, ROUGE-2, and ROUGE-L. These values represent the mean scores across all summaries.
### **Average ROUGE Scores**
| Metric | ROUGE-1 | ROUGE-2 | ROUGE-L |
|--------------|-----------|-----------|-----------|
| **Recall (r)** | 32.20 | 7.10 | 30.09 |
| **Precision (p)** | 22.03 | 4.90 | 20.50 |
| **F1 Score (f)** | 25.00 | 5.51 | 23.30 |
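Per-summary ROUGE scores of this kind can be computed with Google's `rouge-score` package; the sketch below uses placeholder strings rather than the project's data (the library reports values in the 0-1 range, whereas the table above uses percentages):
```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "Reference summary of a Reddit post."
candidate = "Generated summary of the same Reddit post."

scores = scorer.score(reference, candidate)
for name, result in scores.items():
    # Each result exposes recall, precision, and fmeasure
    print(f"{name}: r={result.recall:.4f}, p={result.precision:.4f}, f={result.fmeasure:.4f}")
```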
### **Interpretation**
- **ROUGE-1**: Shows the highest recall and precision of the three metrics, indicating that the model captures single-word overlaps well but could reduce irrelevant words.
- **ROUGE-2**: Exhibits lower recall and precision, indicating the model struggles with bigram relationships and context.
- **ROUGE-L**: Performs better than ROUGE-2 but still faces challenges with precision. It captures longer subsequences more effectively than bigrams.
## **Conclusion**
- **ROUGE-1**: The model shows moderate performance but generates some irrelevant words (low precision).
- **ROUGE-2**: The model performs poorly, indicating difficulty in capturing bigram relationships.
- **ROUGE-L**: Slightly better than ROUGE-2, with some success in capturing longer sequences.
## **Improvements**
- Focus on enhancing **bigram overlap** (ROUGE-2) and overall **context understanding**.
- Reduce **irrelevant content** for improved **precision**.
- Improve **sequence coherence** for better **ROUGE-L** scores.
## **METEOR Score**
| Metric | Meteor Score |
|-------------|--------------|
| **Mean** | 0.2079 |
| **Min** | 0.0915 |
| **Max** | 0.3216 |
| **STD** | 0.0769 |
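As a point of reference, METEOR can be computed per summary with NLTK; a minimal sketch (placeholder text, not the project's data) is:
```python
import nltk
from nltk.translate.meteor_score import meteor_score

# METEOR relies on WordNet for synonym matching; omw-1.4 is needed on some NLTK versions
nltk.download("wordnet")
nltk.download("omw-1.4")

# Recent NLTK versions expect pre-tokenized input
reference = "Reference summary of a Reddit post.".split()
candidate = "Generated summary of the same Reddit post.".split()

print(meteor_score([reference], candidate))
```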
### **Interpretation**
- **Mean**: The average METEOR score indicates good performance in terms of word alignment and synonyms, but there is still room for improvement.
- **Min**: The lowest METEOR score suggests some summaries may not align well with the reference.
- **Max**: The highest METEOR score shows the model's potential for generating very well-aligned summaries.
- **STD**: The standard deviation indicates some variability in the model's performance across different summaries.
### **Conclusion**
- The model's **METEOR Score** shows a generally solid performance in generating summaries that align well with reference content but still has variability in certain cases.
### **Improvements**
- Focus on improving the **alignment** and **synonym usage** to achieve higher and more consistent **METEOR scores** across summaries.
## **TLDR**
### **Comparison & Final Evaluation**
- **BERTScore** suggests the model is good at generating relevant tokens (precision) but struggles with capturing all relevant content (recall).
- **ROUGE-1** is decent, but **ROUGE-2** and **ROUGE-L** show weak performance, particularly in terms of bigram relationships and sequence coherence.
- **METEOR** results show solid alignment, but there’s significant variability, especially with lower scores.
### **Conclusion**
- The model performs decently but lacks consistency, especially in **bigram overlap** (ROUGE-2) and capturing **longer sequences** (ROUGE-L). There’s room for improvement in **recall** and **precision** to make the summaries more relevant and coherent.
- Focus on improving **recall**, **bigram relationships**, and **precision** to achieve more consistent, high-quality summaries. |