---
language: en
thumbnail: https://example.com/thumbnail.png
tags:
- paraphrasing
- T5
- text generation
- NLP
- transformers
license: mit
datasets:
- mteb/quora
metrics:
- accuracy
base_model:
- humarin/chatgpt_paraphraser_on_T5_base
library_name: transformers
---

# ChatGPT and T5 Base Paraphraser

This model is a fine-tuned version of the T5 transformer for paraphrasing questions, built on the `humarin/chatgpt_paraphraser_on_T5_base` checkpoint.

## Model Description

The `chat_gpt_and_t5_base_paraphraser` model generates paraphrased versions of input questions using a sequence-to-sequence approach. It builds on the T5 architecture and was fine-tuned on the Quora question pairs dataset (`mteb/quora`) to produce diverse, meaningful paraphrases.

## Intended Use

This model is intended for applications where paraphrasing of text is required, such as:

- Chatbots
- Question-answering systems
- Content generation
- Educational tools

## How to Use

To use the model, install the Hugging Face `transformers` library and follow these steps:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the model and tokenizer
model_name = "jaesani/chat_gpt_and_t5_base_paraphraser"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def paraphrase(question, max_length=128):
    # Prefix the input with the task instruction the model expects
    input_ids = tokenizer(
        f"paraphrase: {question}",
        return_tensors="pt",
        max_length=max_length,
        truncation=True,
    ).input_ids
    # Greedy decoding with a length cap
    outputs = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
paraphrased_text = paraphrase("What are the best places to see in New York?")
print(paraphrased_text)
```
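Greedy decoding returns a single paraphrase. `model.generate` also supports beam search; the sketch below (parameter values are illustrative, not tuned for this model) returns several candidate paraphrases, taking the already-loaded `model` and `tokenizer` as arguments:

```python
def paraphrase_diverse(model, tokenizer, question, n=3, max_length=128):
    """Return `n` candidate paraphrases via beam search (illustrative settings)."""
    input_ids = tokenizer(
        f"paraphrase: {question}",
        return_tensors="pt",
        max_length=max_length,
        truncation=True,
    ).input_ids
    outputs = model.generate(
        input_ids,
        max_length=max_length,
        num_beams=max(n, 3),       # widen the search beyond greedy decoding
        num_return_sequences=n,    # keep n decoded hypotheses
        no_repeat_ngram_size=2,    # discourage verbatim repetition
    )
    return [tokenizer.decode(out, skip_special_tokens=True) for out in outputs]
```

Sampling-based settings (`do_sample=True` with `top_p` or `temperature`) are another option when more variety is wanted than beam search provides.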

## Training Data
The model was fine-tuned using the Quora Question-Answer Dataset, which consists of pairs of questions that may or may not be paraphrases of each other.
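As a sketch of how such pairs can be cast into the seq2seq format the usage example assumes (the `paraphrase:` prefix matches the inference code above; the pair shown here is illustrative, not taken from the dataset):

```python
def to_seq2seq_example(question, paraphrase):
    """Turn a question pair into a (source, target) training example."""
    return (f"paraphrase: {question}", paraphrase)

pairs = [
    ("How do I learn Python?", "What is the best way to learn Python?"),
]
examples = [to_seq2seq_example(q, p) for q, p in pairs]
# examples[0][0] == "paraphrase: How do I learn Python?"
```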

## Evaluation
The model's performance can be evaluated on the diversity and coherence of the paraphrases it generates. Useful metrics include BLEU scores and human judgments of semantic similarity.
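BLEU scores n-gram overlap between a candidate and a reference. A minimal sketch of clipped n-gram precision, the core of BLEU (omitting the brevity penalty and smoothing that full implementations such as sacreBLEU or NLTK provide), looks like:

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate string against one reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand:
        return 0.0
    # Each candidate n-gram counts at most as often as it appears in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())
```

For paraphrasing specifically, a very high BLEU against the input can indicate the model is copying rather than rephrasing, so overlap metrics are best read alongside human evaluation.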

## Limitations
- The model may produce paraphrases that are not contextually relevant.
- It may struggle with highly technical or domain-specific language.
- Generated paraphrases may be similar for closely related input questions.

## License
This model is licensed under the MIT License.