Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ Falcon-7b-chat-oasst1 is a chatbot-like model for dialogue generation. It was bu
 This model was fine-tuned in 8-bit using 🤗 [peft](https://github.com/huggingface/peft) adapters, [transformers](https://github.com/huggingface/transformers), and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
 - The training relied on a recent method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)): instead of fine-tuning the entire model, you only fine-tune lightweight adapters and load them properly inside the base model.
 - Training took approximately 6 hours and was executed on a workstation with a single NVIDIA A100-SXM 40GB GPU (via Google Colab).
-- See attached [Notebook](https://huggingface.co/
+- See attached [Notebook](https://huggingface.co/dfurman/falcon-7b-chat-oasst1/blob/main/finetune_falcon7b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.
 
 ## Model Summary
 
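For context on the adapter setup this hunk describes, here is a minimal sketch of how an 8-bit base model is typically wrapped with LoRA adapters via peft and bitsandbytes. It is an illustration only: the base model id, rank, alpha, and `target_modules` values are assumptions, and the attached notebook holds the hyperparameters actually used.

```python
# Minimal sketch: wrap an 8-bit Falcon base model with LoRA adapters (peft + bitsandbytes).
# The LoRA hyperparameters below (r, lora_alpha, target_modules) are illustrative assumptions;
# see the attached notebook for the values actually used in training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "tiiuae/falcon-7b"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,       # quantize weights with bitsandbytes
    device_map="auto",
    trust_remote_code=True,  # Falcon ships custom modeling code
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon attention projection (assumed)
)

# Only the adapter weights are trainable; the 8-bit base model stays frozen.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```

Because only the small adapter matrices receive gradients and optimizer state, the memory footprint stays far below full fine-tuning of the 7B weights, which is consistent with training on a single 40GB A100.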
@@ -92,11 +92,6 @@ We recommend users of this model to develop guardrails and to take appropriate p
 import torch
 from peft import PeftModel, PeftConfig
 from transformers import AutoModelForCausalLM, AutoTokenizer
-
-# Login to HF
-from huggingface_hub import notebook_login
-
-notebook_login() # use personal HF token for access to intellio-nlp
 ```
 
 ### GPU Inference in 8-bit
@@ -105,7 +100,7 @@ This requires a GPU with at least 12GB memory.
 
 ```python
 # load the model
-peft_model_id = "
+peft_model_id = "dfurman/falcon-7b-chat-oasst1"
 config = PeftConfig.from_pretrained(peft_model_id)
 
 model = AutoModelForCausalLM.from_pretrained(
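The `from_pretrained(` call is cut off at the end of this hunk. Below is a hedged sketch of the complete 8-bit inference setup; the keyword arguments reflect common peft/bitsandbytes usage rather than being copied from the README.

```python
# Sketch: load the base model in 8-bit and attach the LoRA adapter for inference.
# Argument values are assumptions based on typical peft/bitsandbytes usage.
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "dfurman/falcon-7b-chat-oasst1"
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the fine-tuned LoRA adapter on top of the frozen 8-bit base model.
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()
```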
@@ -153,7 +148,7 @@ print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))
 
 ## Reproducibility
 
-- See attached [Notebook](https://huggingface.co/
+- See attached [Notebook](https://huggingface.co/dfurman/falcon-7b-chat-oasst1/blob/main/finetune_falcon7b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.
 
 ### CUDA Info
 
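The "CUDA Info" subsection referenced here usually records the runtime environment. A minimal way to print equivalent details is sketched below; the exact fields reported in the README may differ.

```python
# Sketch: print the CUDA / GPU details a "CUDA Info" section typically records.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```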