bew committed on
Commit 486de62 · 1 Parent(s): 1711062

Update README.md

Files changed (1): README.md +18 -3
README.md CHANGED
````diff
@@ -8,11 +8,26 @@ library_name: peft
 pipeline_tag: text2text-generation
 ---
 
-This is a model trained on the [KELM Corpus](https://github.com/google-research-datasets/KELM-corpus) to take in sentences and output triplets of the form `subject-relation-object` to be used for knowledge graph generation.
-
+This is a version of `flan-t5-xl` fine-tuned on the [KELM Corpus](https://github.com/google-research-datasets/KELM-corpus) to take in sentences and output triplets of the form `subject-relation-object` to be used for knowledge graph generation.
 
 The model uses custom tokens to delimit triplets:
 ```
 special_tokens = ['<triplet>', '</triplet>', '<relation>', '<object>']
 tokenizer.add_tokens(special_tokens)
-```
+```
+
+You can use it like this:
+```
+model = model.to(device)
+model.eval()
+
+new_input = "Hugging Face, Inc. is an American company that develops tools for building applications using machine learning.",
+inputs = tokenizer(new_input, return_tensors="pt")
+
+with torch.no_grad():
+    outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"))
+    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=False)[0])
+```
+Output: `<pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>`
+
+This model still isn't perfect, and may make mistakes! I'm working on fine-tuning it for longer and on a more diverse set of data.
````
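The decoded generation is a flat string, so using it for knowledge graph construction needs a small parsing step. Below is a minimal Python sketch of that step; it is not part of the model card. `parse_triplets`, `format_triplets`, and `TRIPLET_PATTERN` are hypothetical names, and the sketch assumes every triplet follows the `<triplet> subject <relation> relation <object> object </triplet>` layout seen in the example output, with T5's `<pad>` and `</s>` tokens still present because the snippet decodes with `skip_special_tokens=False`.

```python
import re
from typing import List, Tuple

# Assumed layout (inferred from the example output, not from a spec):
#   <triplet> SUBJECT <relation> RELATION <object> OBJECT </triplet>
# Lazy groups plus \s* strip the spaces the tokenizer puts around each field.
TRIPLET_PATTERN = re.compile(
    r"<triplet>\s*(.*?)\s*<relation>\s*(.*?)\s*<object>\s*(.*?)\s*</triplet>",
    re.DOTALL,
)


def parse_triplets(decoded: str) -> List[Tuple[str, str, str]]:
    """Extract (subject, relation, object) tuples from one decoded generation.

    Tokens such as <pad> and </s> are ignored because the pattern only
    matches text between <triplet> and </triplet>.
    """
    # With three capture groups, findall returns a list of 3-tuples.
    return TRIPLET_PATTERN.findall(decoded)


def format_triplets(triplets: List[Tuple[str, str, str]]) -> str:
    """Linearize tuples back into the same delimited string layout."""
    return " ".join(
        f"<triplet> {s} <relation> {r} <object> {o} </triplet>"
        for s, r, o in triplets
    )


if __name__ == "__main__":
    example = (
        "<pad><triplet> Hugging Face <relation> instance of "
        "<object> Business </triplet></s>"
    )
    print(parse_triplets(example))
```

Applied to the example output, `parse_triplets` returns `[("Hugging Face", "instance of", "Business")]`. A regular expression keeps the sketch dependency-free. It will not correctly split entity names that themselves contain one of the delimiter tokens.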