Fine-Tuned BART-Large for Sentence Compression

Model Overview

This model is a fine-tuned version of facebook/bart-large trained on the sentence-transformers/sentence-compression dataset. It generates compressed versions of input sentences while preserving their fluency and meaning.


Training Details

Base Model: facebook/bart-large

Dataset: sentence-transformers/sentence-compression

Batch Size: 8

Epochs: 5

Learning Rate: 2e-5

Weight Decay: 0.01

Evaluation Metric for Best Model: SARI Penalized

Precision: FP16 (mixed-precision training for efficiency)
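
For reference, the hyperparameters above correspond roughly to the following Seq2SeqTrainingArguments configuration. This is a minimal sketch, not the exact training script: output_dir and the metric key "sari_penalized" are assumptions, and SARI Penalized itself would have to be supplied by a custom compute_metrics function passed to Seq2SeqTrainer.

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-large-sentence-compression",  # assumption: any output path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=True,                               # mixed-precision training
    evaluation_strategy="epoch",
    save_strategy="epoch",                   # must match evaluation_strategy for best-model loading
    load_best_model_at_end=True,
    metric_for_best_model="sari_penalized",  # assumption: key produced by a custom compute_metrics
    greater_is_better=True,
    predict_with_generate=True,              # generate full sequences during evaluation
)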


Evaluation Results

Validation Set Performance:

Metric            Score
SARI              89.68
SARI Penalized    88.42
ROUGE-1           93.05
ROUGE-2           88.47
ROUGE-L           92.98

Test Set Performance:

Metric            Score
SARI              89.76
SARI Penalized    88.32
ROUGE-1           93.14
ROUGE-2           88.65
ROUGE-L           93.07
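
For context, SARI is a text-simplification metric that scores the keep, add, and delete operations of the output against both the source and the references, while ROUGE measures n-gram overlap with the references. Plain SARI and ROUGE can be computed with the Hugging Face evaluate library as sketched below; this uses standard SARI, not the penalized variant reported above, and the example strings are illustrative only. Note that evaluate's ROUGE returns fractions in [0, 1], whereas the tables report 0-100 scores.

import evaluate

sari = evaluate.load("sari")
rouge = evaluate.load("rouge")

sources = ["The quick brown fox jumped over the extremely lazy dog."]
predictions = ["The fox jumped over the dog."]
references = [["The fox jumped over the dog."]]  # one or more references per source

# SARI compares the prediction against both the source and the references
print(sari.compute(sources=sources, predictions=predictions, references=references))
# ROUGE compares the prediction against the references only
print(rouge.compute(predictions=predictions, references=[r[0] for r in references]))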

Training Loss Curve

The loss curves during training are visualized in bart-large-sentence-compression_loss.eps, showing both training and evaluation loss over steps.


Usage

Load the Model

from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "shahin-as/bart-large-sentence-compression"

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)

def compress_sentence(sentence):
    # Tokenize the input, truncating to BART's 1024-token context window
    inputs = tokenizer(sentence, return_tensors="pt", max_length=1024, truncation=True)
    # Beam-search decoding; early_stopping ends a beam once it emits the end token
    summary_ids = model.generate(
        **inputs, max_length=50, num_beams=5, length_penalty=2.0, early_stopping=True
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Example usage
sentence = "Insert the sentence to be compressed here."
compressed_sentence = compress_sentence(sentence)
print("Original:", sentence)
print("Compressed:", compressed_sentence)