Update README.md
README.md CHANGED
@@ -8,11 +8,26 @@ library_name: peft
 pipeline_tag: text2text-generation
 ---
 
-This is a
-
+This is a version of `flan-t5-xl` fine-tuned on the [KELM Corpus](https://github.com/google-research-datasets/KELM-corpus) to convert input sentences into triplets of the form `subject-relation-object` for knowledge graph generation.
 
 The model uses custom tokens to delimit triplets:
 ```
 special_tokens = ['<triplet>', '</triplet>', '<relation>', '<object>']
 tokenizer.add_tokens(special_tokens)
-```
+```
+
+You can use it like this:
+```
+import torch
+
+# Assumes `model` and `tokenizer` are already loaded, with the special tokens added.
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = model.to(device)
+model.eval()
+
+new_input = "Hugging Face, Inc. is an American company that develops tools for building applications using machine learning."
+inputs = tokenizer(new_input, return_tensors="pt").to(device)
+
+with torch.no_grad():
+    outputs = model.generate(input_ids=inputs["input_ids"])
+# Keep special tokens so the triplet delimiters stay visible in the decoded text.
+print(tokenizer.batch_decode(outputs, skip_special_tokens=False)[0])
+```
+Output: `<pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>`
+
+This model still isn't perfect and may make mistakes! I'm working on fine-tuning it for longer and on a more diverse set of data.
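
Since the output is meant for knowledge graph generation, the decoded string has to be turned back into structured triplets at some point. The README does not say how, so the following is only a minimal sketch. The helper `parse_triplets` and the regular expression are hypothetical, inferred from the single example output above. It assumes every triplet follows the `<triplet> subject <relation> relation <object> object </triplet>` layout and that multiple triplets, if the model emits them, simply repeat that pattern.

```python
import re
from typing import List, Tuple

# Pattern inferred from the one example output in the README (an assumption,
# not a documented grammar): <triplet> S <relation> R <object> O </triplet>
TRIPLET_RE = re.compile(
    r"<triplet>\s*(.*?)\s*<relation>\s*(.*?)\s*<object>\s*(.*?)\s*</triplet>",
    re.DOTALL,
)


def parse_triplets(decoded: str) -> List[Tuple[str, str, str]]:
    """Return (subject, relation, object) tuples found in a decoded string.

    Hypothetical helper: text outside <triplet>...</triplet> spans, such as
    <pad> and </s>, is ignored, and malformed spans are skipped.
    """
    return [
        tuple(part.strip() for part in match.groups())
        for match in TRIPLET_RE.finditer(decoded)
    ]


example = "<pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>"
print(parse_triplets(example))
```

Decoding with `skip_special_tokens=False`, as in the usage snippet, is what keeps the delimiters available for this kind of parsing. The resulting tuples could then be fed to whatever graph structure is in use as `(source, edge label, target)`.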