giuid committed on
Commit a97e3a4 · verified · 1 Parent(s): d6c0ffa

Create README.md

Files changed (1)
  1. README.md +54 -0
README.md ADDED
---
language: en
datasets:
- efra
license: apache-2.0
tags:
- summarization
- flan-t5
- legal
- food
model_type: t5
pipeline_tag: text2text-generation
---

# Flan-T5 Large Fine-Tuned on the EFRA Dataset

This is a fine-tuned version of [Flan-T5 XL](https://huggingface.co/google/flan-t5-xl) on the **EFRA dataset** for summarizing legal documents related to food regulations and policies.

## Model Description

Flan-T5 is a sequence-to-sequence model trained for text-to-text tasks. This fine-tuned version is specifically optimized for summarizing legal text in the domain of food legislation, regulatory requirements, and compliance documents.

### Fine-Tuning Details
- **Base Model**: [google/flan-t5-large](https://huggingface.co/google/flan-t5-large)
- **Dataset**: EFRA (a curated dataset of legal documents in the food domain)
- **Objective**: Summarization of legal documents
- **Framework**: Hugging Face Transformers (a minimal training sketch follows below)
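
The exact training recipe is not published in this card, so the snippet below is only a minimal sketch of how a comparable fine-tune could be set up with `Seq2SeqTrainer`. The local file name `efra_train.json`, the `document`/`summary` field names, the `summarize:` prompt prefix, and all hyperparameters are illustrative assumptions, not the values actually used.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Hypothetical local copy of the EFRA data with "document" and "summary" fields.
dataset = load_dataset("json", data_files="efra_train.json")["train"]

base_model = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model)

def preprocess(batch):
    # Tokenize source documents (truncated to 512 tokens) and target summaries.
    model_inputs = tokenizer(
        ["summarize: " + doc for doc in batch["document"]],
        max_length=512,
        truncation=True,
    )
    labels = tokenizer(text_target=batch["summary"], max_length=150, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="flan_t5_efra_summarization",
        learning_rate=3e-5,                 # illustrative hyperparameters only
        per_device_train_batch_size=4,
        num_train_epochs=3,
        predict_with_generate=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```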

## Applications

This model is suitable for:
- Summarizing legal texts in the food domain
- Extracting key information from lengthy regulatory documents
- Assisting legal professionals and food companies in understanding compliance requirements

## Example Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("giuid/flan_t5_xl_summarization_v2")
tokenizer = AutoTokenizer.from_pretrained("giuid/flan_t5_xl_summarization_v2")

# Input text
input_text = "Your lengthy legal document text here..."

# Tokenize (inputs longer than 512 tokens are truncated) and generate a summary
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(inputs.input_ids, max_length=150, num_beams=5, early_stopping=True)

# Decode and print the summary
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
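
The example above truncates inputs at 512 tokens, so only the beginning of a very long regulatory document is actually summarized. One common workaround, sketched below on the assumption that plain fixed-size chunking is acceptable for your documents, is to reuse the `model` and `tokenizer` loaded above to summarize the text chunk by chunk and then join (or further condense) the partial summaries; the chunk size and generation settings are illustrative.

```python
def summarize_long_document(text, chunk_tokens=480, max_summary_tokens=150):
    # Split the tokenized document into fixed-size chunks below the 512-token limit,
    # summarize each chunk, and join the partial summaries.
    token_ids = tokenizer(text, truncation=False).input_ids
    chunk_summaries = []
    for start in range(0, len(token_ids), chunk_tokens):
        chunk_text = tokenizer.decode(
            token_ids[start:start + chunk_tokens], skip_special_tokens=True
        )
        inputs = tokenizer(chunk_text, return_tensors="pt", max_length=512, truncation=True)
        outputs = model.generate(
            inputs.input_ids,
            max_length=max_summary_tokens,
            num_beams=5,
            early_stopping=True,
        )
        chunk_summaries.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    # For very long documents, the joined partial summaries could be passed
    # through the model once more for a final, shorter summary.
    return " ".join(chunk_summaries)

print(summarize_long_document(input_text))
```

Chunk boundaries that respect sections or paragraphs of the source document will usually give better summaries than the fixed-size splits used in this sketch.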