---
inference: false
license: other
---

<!-- header start -->
<div style="width: 100%;">
<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<div style="display: flex; justify-content: space-between; width: 100%;">
<div style="display: flex; flex-direction: column; align-items: flex-start;">
<p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
</div>
<div style="display: flex; flex-direction: column; align-items: flex-end;">
<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
</div>
</div>
<!-- header end -->

20
+ # NousResearch's Redmond Hermes Coder GPTQ
21
+
22
+ These files are GPTQ 4bit model files for [NousResearch's Redmond Hermes Coder](https://huggingface.co/NousResearch/Redmond-Hermes-Coder).
23
+
24
+ It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
25
+
26
+ ## Repositories available
27
+
28
+ * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/Redmond-Hermes-Coder-GPTQ)
29
+ * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/Redmond-Hermes-Coder-GGML)
30
+ * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/NousResearch/Redmond-Hermes-Coder)
31
+
32
+ ## Prompt template: Alpaca
33
+
34
+ ```
35
+ Below is an instruction that describes a task. Write a response that appropriately completes the request.
36
+
37
+ ### Instruction: PROMPT
38
+
39
+ ### Response:
40
+
41
+ ```
42
+
## How to easily download and use this model in text-generation-webui

Please make sure you're using the latest version of text-generation-webui.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `TheBloke/Redmond-Hermes-Coder-GPTQ`.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. In the **Model** dropdown, choose the model you just downloaded: `Redmond-Hermes-Coder-GPTQ`.
7. The model will automatically load, and is now ready for use!
8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
    * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!

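Alternatively, if you prefer to fetch the files with a script rather than through the UI, the repository can be downloaded with `huggingface_hub`. A minimal sketch (the `local_dir` path is just an example):

```python
from huggingface_hub import snapshot_download

# Sketch: download all files from the GPTQ repo to a local directory.
# The local_dir value is an example; point it wherever you keep models.
snapshot_download(
    repo_id="TheBloke/Redmond-Hermes-Coder-GPTQ",
    local_dir="models/Redmond-Hermes-Coder-GPTQ"
)
```
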
## How to use this GPTQ model from Python code

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

`GITHUB_ACTIONS=true pip install auto-gptq`

Then try the following example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Redmond-Hermes-Coder-GPTQ"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
# Fill the Alpaca template with the actual prompt
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: {prompt}

### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
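To print tokens to the terminal as they are generated, transformers' `TextStreamer` can be combined with the same `model`, `tokenizer` and `input_ids` as in the example above. A sketch, assuming a reasonably recent transformers version:

```python
from transformers import TextStreamer

# Continues the example above: prints tokens as they are generated,
# instead of waiting for the full completion.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(inputs=input_ids, streamer=streamer, temperature=0.7, max_new_tokens=512)
```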
## Provided files

**gptq_model-4bit-128g.safetensors**

This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with the Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

Note that [ExLlama](https://github.com/turboderp/exllama) supports only Llama-based 4-bit GPTQ models, where it provides around a 2x speedup over AutoGPTQ and GPTQ-for-LLaMa; as this model uses a StarCoder base, it cannot be loaded with ExLlama.

It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act), to increase compatibility and improve inference speed. These parameters are recorded in `quantize_config.json`; a sketch of the equivalent config follows this list.

* `gptq_model-4bit-128g.safetensors`
  * Works with AutoGPTQ in CUDA or Triton modes.
  * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
  * Works with text-generation-webui, including one-click-installers.
  * Parameters: Groupsize = 128. Act Order / desc_act = False.

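For reference, here is a minimal sketch of the AutoGPTQ config corresponding to the parameters above; it mirrors what `quantize_config.json` records, and you do not need to build it yourself when loading the model:

```python
from auto_gptq import BaseQuantizeConfig

# Sketch of the quantisation parameters used for this file,
# as recorded in quantize_config.json.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # groupsize 128, for better inference accuracy
    desc_act=False   # act-order disabled, for compatibility and speed
)
```
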
<!-- footer start -->
## Discord

For further support, and discussions on these models and AI in general, join us at:

[TheBloke AI's Discord server](https://discord.gg/theblokeai)

## Thanks, and how to contribute

Thanks to the [chirper.ai](https://chirper.ai) team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donators will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

* Patreon: https://patreon.com/TheBlokeAI
* Ko-Fi: https://ko-fi.com/TheBlokeAI

**Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

**Patreon special mentions**: zynix, ya boyyy, Trenton Dambrowitz, Imad Khwaja, Alps Aficionado, chris gileta, John Detwiler, Willem Michiel, RoA, Mano Prime, Rainer Wilmers, Fred von Graf, Matthew Berman, Ghost, Nathan LeClaire, Iucharbius, Ai Maven, Illia Dulskyi, Joseph William Delisle, Space Cruiser, Lone Striker, Karl Bernard, Eugene Pentland, Greatston Gnanesh, Jonathan Leane, Randy H, Pierre Kircher, Willian Hasse, Stephen Murray, Alex, terasurfer, Edmond Seymore, Oscar Rangel, Luke Pendergrass, Asp the Wyvern, Junyu Yang, David Flickinger, Luke, Spiking Neurons AB, subjectnull, Pyrater, Nikolai Manek, senxiiz, Ajan Kanaga, Johann-Peter Hartmann, Artur Olbinski, Kevin Schuppel, Derek Yates, Kalila, K, Talal Aujan, Khalefa Al-Ahmad, Gabriel Puliatti, John Villwock, WelcomeToTheClub, Daniel P. Andersen, Preetika Verma, Deep Realms, Fen Risland, trip7s trip, webtim, Sean Connelly, Michael Levine, Chris McCloskey, biorpg, vamX, Viktor Bowallius, Cory Kujawski.

Thank you to all my generous patrons and donators!

<!-- footer end -->

# Original model card: NousResearch's Redmond Hermes Coder

# Model Card: Redmond-Hermes-Coder 15B

## Model Description

Redmond-Hermes-Coder 15B is a state-of-the-art language model fine-tuned on over 300,000 instructions. This model was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors.

This model was trained with a WizardCoder base, which itself uses a StarCoder base model.

The model is truly great at code, but it does come with a tradeoff: while far better at code than the original Nous-Hermes built on Llama, it is worse than WizardCoder at pure code benchmarks, like HumanEval.

It comes in at 39% on HumanEval, with WizardCoder at 57%. This is a preliminary experiment, and we are exploring improvements now.

However, it does seem better than WizardCoder at a variety of non-code tasks, including writing.

## Model Training

The model was trained almost entirely on synthetic GPT-4 outputs. This includes data from diverse sources such as GPTeacher (the general, roleplay v1&2, and code instruct datasets), Nous Instruct & PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions.

Additional data inputs came from Camel-AI's Biology/Physics/Chemistry and Math Datasets, Airoboros' (v1) GPT-4 Dataset, and more from CodeAlpaca. The total volume of data encompassed over 300,000 instructions.

## Collaborators

The model fine-tuning and the datasets were a collaboration of efforts and resources from members of Nous Research, including Teknium, Karan4D, Huemin Art, and Redmond AI's generous compute grants.

Huge shoutout and acknowledgement is deserved for all the dataset creators who generously share their datasets openly.

Among the contributors of datasets, GPTeacher was made available by Teknium, Wizard LM by nlpxucan, and the Nous Research Instruct Dataset was provided by Karan4D and HueminArt.
The GPT4-LLM and Unnatural Instructions datasets were provided by Microsoft, the Airoboros dataset by jondurbin, the Camel-AI datasets by Camel-AI, and the CodeAlpaca dataset by Sahil 2801.
If anyone was left out, please open a thread in the community tab.

## Prompt Format

The model follows the Alpaca prompt format:
```
### Instruction:

### Response:
```

or

```
### Instruction:

### Input:

### Response:
```

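A small hypothetical helper (the function name is ours, not part of the model release) that assembles a prompt in either variant:

```python
# Hypothetical helper, not part of the model release: build an Alpaca-style
# prompt in either of the two formats shown above.
def build_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        return f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

print(build_prompt("Write a Python function that reverses a string."))
```
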
## Resources for Applied Use Cases

For an example of a back-and-forth chatbot using huggingface transformers and discord, check out: https://github.com/teknium1/alpaca-discord
For an example of a roleplaying discord bot, check out: https://github.com/teknium1/alpaca-roleplay-discordbot

## Future Plans

The model is currently being uploaded in FP16 format, and there are plans to convert the model to GGML and GPTQ 4bit quantizations. The team is also working on a full benchmark, similar to what was done for GPT4-x-Vicuna. We will also try to open discussions about getting the model included in GPT4All.

## Benchmark Results

```
HumanEval: 39%
|                      Task                      |Version|       Metric        |Value |   |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|arc_challenge                                   |      0|acc                  |0.2858|±  |0.0132|
|                                                |       |acc_norm             |0.3148|±  |0.0136|
|arc_easy                                        |      0|acc                  |0.5349|±  |0.0102|
|                                                |       |acc_norm             |0.5097|±  |0.0103|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5158|±  |0.0364|
|bigbench_date_understanding                     |      0|multiple_choice_grade|0.5230|±  |0.0260|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3295|±  |0.0293|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1003|±  |0.0159|
|                                                |       |exact_str_match      |0.0000|±  |0.0000|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2260|±  |0.0187|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1957|±  |0.0150|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.3733|±  |0.0280|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.3200|±  |0.0209|
|bigbench_navigate                               |      0|multiple_choice_grade|0.4830|±  |0.0158|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.4150|±  |0.0110|
|bigbench_ruin_names                             |      0|multiple_choice_grade|0.2143|±  |0.0194|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2926|±  |0.0144|
|bigbench_snarks                                 |      0|multiple_choice_grade|0.5249|±  |0.0372|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.4817|±  |0.0159|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2700|±  |0.0140|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1864|±  |0.0110|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1349|±  |0.0082|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.3733|±  |0.0280|
|boolq                                           |      1|acc                  |0.5498|±  |0.0087|
|hellaswag                                       |      0|acc                  |0.3814|±  |0.0048|
|                                                |       |acc_norm             |0.4677|±  |0.0050|
|openbookqa                                      |      0|acc                  |0.1960|±  |0.0178|
|                                                |       |acc_norm             |0.3100|±  |0.0207|
|piqa                                            |      0|acc                  |0.6600|±  |0.0111|
|                                                |       |acc_norm             |0.6610|±  |0.0110|
|winogrande                                      |      0|acc                  |0.5343|±  |0.0140|
```

## Model Usage

The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.

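A minimal sketch of loading the fp16 model with transformers (assumes the `accelerate` package is installed for `device_map="auto"`, and enough VRAM for a 15B model in fp16, roughly 30 GB):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: load the unquantised fp16 model. device_map="auto" requires
# the accelerate package and spreads layers across available devices.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Redmond-Hermes-Coder")
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Redmond-Hermes-Coder",
    torch_dtype=torch.float16,
    device_map="auto",
)
```
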
Compute provided by our project sponsor Redmond AI, thank you!!