---
inference: false
license: other
---

<!-- header start -->
<div style="width: 100%;">
<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<div style="display: flex; justify-content: space-between; width: 100%;">
<div style="display: flex; flex-direction: column; align-items: flex-start;">
<p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
</div>
<div style="display: flex; flex-direction: column; align-items: flex-end;">
<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
</div>
</div>
<!-- header end -->

20
+ # NousResearch's Redmond Hermes Coder GPTQ
21
+
22
+ These files are GPTQ 4bit model files for [NousResearch's Redmond Hermes Coder](https://huggingface.co/NousResearch/Redmond-Hermes-Coder).
23
+
24
+ It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
25
+
26
+ ## Repositories available
27
+
28
+ * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/Redmond-Hermes-Coder-GPTQ)
29
+ * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/Redmond-Hermes-Coder-GGML)
30
+ * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/NousResearch/Redmond-Hermes-Coder)
31
+
32
+ ## Prompt template: Alpaca
33
+
34
+ ```
35
+ Below is an instruction that describes a task. Write a response that appropriately completes the request.
36
+
37
+ ### Instruction: PROMPT
38
+
39
+ ### Response:
40
+
41
+ ```
42
+
## How to easily download and use this model in text-generation-webui

Please make sure you're using the latest version of text-generation-webui.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `TheBloke/Redmond-Hermes-Coder-GPTQ`.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. In the **Model** dropdown, choose the model you just downloaded: `Redmond-Hermes-Coder-GPTQ`.
7. The model will automatically load, and is now ready for use!
8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
    * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!

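Alternatively, if you prefer to fetch the files with a script rather than through the UI, the repository can be downloaded with `huggingface_hub`. A minimal sketch (the `local_dir` path is just an example):

```python
from huggingface_hub import snapshot_download

# Sketch: download all files from the GPTQ repo to a local directory.
# The local_dir value is an example; point it wherever you keep models.
snapshot_download(
    repo_id="TheBloke/Redmond-Hermes-Coder-GPTQ",
    local_dir="models/Redmond-Hermes-Coder-GPTQ"
)
```
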
## How to use this GPTQ model from Python code

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

`GITHUB_ACTIONS=true pip install auto-gptq`

Then try the following example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Redmond-Hermes-Coder-GPTQ"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
# Fill the Alpaca template with the actual prompt
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: {prompt}

### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
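To print tokens to the terminal as they are generated, transformers' `TextStreamer` can be combined with the same `model`, `tokenizer` and `input_ids` as in the example above. A sketch, assuming a reasonably recent transformers version:

```python
from transformers import TextStreamer

# Continues the example above: prints tokens as they are generated,
# instead of waiting for the full completion.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(inputs=input_ids, streamer=streamer, temperature=0.7, max_new_tokens=512)
```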
## Provided files

**gptq_model-4bit-128g.safetensors**

This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with the Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

Note that [ExLlama](https://github.com/turboderp/exllama) supports only Llama-based 4-bit GPTQ models, where it provides around a 2x speedup over AutoGPTQ and GPTQ-for-LLaMa; as this model uses a StarCoder base, it cannot be loaded with ExLlama.

It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act), to increase compatibility and improve inference speed. These parameters are recorded in `quantize_config.json`; a sketch of the equivalent config follows this list.

* `gptq_model-4bit-128g.safetensors`
  * Works with AutoGPTQ in CUDA or Triton modes.
  * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
  * Works with text-generation-webui, including one-click-installers.
  * Parameters: Groupsize = 128. Act Order / desc_act = False.

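For reference, here is a minimal sketch of the AutoGPTQ config corresponding to the parameters above; it mirrors what `quantize_config.json` records, and you do not need to build it yourself when loading the model:

```python
from auto_gptq import BaseQuantizeConfig

# Sketch of the quantisation parameters used for this file,
# as recorded in quantize_config.json.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # groupsize 128, for better inference accuracy
    desc_act=False   # act-order disabled, for compatibility and speed
)
```
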
<!-- footer start -->
## Discord

For further support, and discussions on these models and AI in general, join us at:

[TheBloke AI's Discord server](https://discord.gg/theblokeai)

## Thanks, and how to contribute

Thanks to the [chirper.ai](https://chirper.ai) team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donators will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

* Patreon: https://patreon.com/TheBlokeAI
* Ko-Fi: https://ko-fi.com/TheBlokeAI

**Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

**Patreon special mentions**: zynix, ya boyyy, Trenton Dambrowitz, Imad Khwaja, Alps Aficionado, chris gileta, John Detwiler, Willem Michiel, RoA, Mano Prime, Rainer Wilmers, Fred von Graf, Matthew Berman, Ghost, Nathan LeClaire, Iucharbius, Ai Maven, Illia Dulskyi, Joseph William Delisle, Space Cruiser, Lone Striker, Karl Bernard, Eugene Pentland, Greatston Gnanesh, Jonathan Leane, Randy H, Pierre Kircher, Willian Hasse, Stephen Murray, Alex, terasurfer, Edmond Seymore, Oscar Rangel, Luke Pendergrass, Asp the Wyvern, Junyu Yang, David Flickinger, Luke, Spiking Neurons AB, subjectnull, Pyrater, Nikolai Manek, senxiiz, Ajan Kanaga, Johann-Peter Hartmann, Artur Olbinski, Kevin Schuppel, Derek Yates, Kalila, K, Talal Aujan, Khalefa Al-Ahmad, Gabriel Puliatti, John Villwock, WelcomeToTheClub, Daniel P. Andersen, Preetika Verma, Deep Realms, Fen Risland, trip7s trip, webtim, Sean Connelly, Michael Levine, Chris McCloskey, biorpg, vamX, Viktor Bowallius, Cory Kujawski.

Thank you to all my generous patrons and donators!

<!-- footer end -->

# Original model card: NousResearch's Redmond Hermes Coder

# Model Card: Redmond-Hermes-Coder 15B

## Model Description

Redmond-Hermes-Coder 15B is a state-of-the-art language model fine-tuned on over 300,000 instructions. This model was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors.

This model was trained with a WizardCoder base, which itself uses a StarCoder base model.

The model is truly great at code, but it does come with a tradeoff: while far better at code than the original Nous-Hermes built on Llama, it is worse than WizardCoder at pure code benchmarks, like HumanEval.

It comes in at 39% on HumanEval, with WizardCoder at 57%. This is a preliminary experiment, and we are exploring improvements now.

However, it does seem better than WizardCoder at a variety of non-code tasks, including writing.

## Model Training

The model was trained almost entirely on synthetic GPT-4 outputs. This includes data from diverse sources such as GPTeacher (the general, roleplay v1&2, and code instruct datasets), Nous Instruct & PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions.

Additional data inputs came from Camel-AI's Biology/Physics/Chemistry and Math Datasets, Airoboros' (v1) GPT-4 Dataset, and more from CodeAlpaca. The total volume of data encompassed over 300,000 instructions.

## Collaborators

The model fine-tuning and the datasets were a collaboration of efforts and resources from members of Nous Research, including Teknium, Karan4D, Huemin Art, and Redmond AI's generous compute grants.

Huge shoutout and acknowledgement is deserved for all the dataset creators who generously share their datasets openly.

Among the contributors of datasets, GPTeacher was made available by Teknium, Wizard LM by nlpxucan, and the Nous Research Instruct Dataset was provided by Karan4D and HueminArt.
The GPT4-LLM and Unnatural Instructions datasets were provided by Microsoft, the Airoboros dataset by jondurbin, the Camel-AI datasets by Camel-AI, and the CodeAlpaca dataset by Sahil 2801.
If anyone was left out, please open a thread in the community tab.

## Prompt Format

The model follows the Alpaca prompt format:
```
### Instruction:

### Response:
```

or

```
### Instruction:

### Input:

### Response:
```

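A small hypothetical helper (the function name is ours, not part of the model release) that assembles a prompt in either variant:

```python
# Hypothetical helper, not part of the model release: build an Alpaca-style
# prompt in either of the two formats shown above.
def build_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        return f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

print(build_prompt("Write a Python function that reverses a string."))
```
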
## Resources for Applied Use Cases

For an example of a back-and-forth chatbot using huggingface transformers and discord, check out: https://github.com/teknium1/alpaca-discord
For an example of a roleplaying discord bot, check out: https://github.com/teknium1/alpaca-roleplay-discordbot

## Future Plans

The model is currently being uploaded in FP16 format, and there are plans to convert the model to GGML and GPTQ 4bit quantizations. The team is also working on a full benchmark, similar to what was done for GPT4-x-Vicuna. We will also try to open discussions about getting the model included in GPT4All.

## Benchmark Results

```
HumanEval: 39%
|                      Task                      |Version|       Metric        |Value |   |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|arc_challenge                                   |      0|acc                  |0.2858|±  |0.0132|
|                                                |       |acc_norm             |0.3148|±  |0.0136|
|arc_easy                                        |      0|acc                  |0.5349|±  |0.0102|
|                                                |       |acc_norm             |0.5097|±  |0.0103|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5158|±  |0.0364|
|bigbench_date_understanding                     |      0|multiple_choice_grade|0.5230|±  |0.0260|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3295|±  |0.0293|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1003|±  |0.0159|
|                                                |       |exact_str_match      |0.0000|±  |0.0000|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2260|±  |0.0187|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1957|±  |0.0150|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.3733|±  |0.0280|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.3200|±  |0.0209|
|bigbench_navigate                               |      0|multiple_choice_grade|0.4830|±  |0.0158|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.4150|±  |0.0110|
|bigbench_ruin_names                             |      0|multiple_choice_grade|0.2143|±  |0.0194|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2926|±  |0.0144|
|bigbench_snarks                                 |      0|multiple_choice_grade|0.5249|±  |0.0372|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.4817|±  |0.0159|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2700|±  |0.0140|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1864|±  |0.0110|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1349|±  |0.0082|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.3733|±  |0.0280|
|boolq                                           |      1|acc                  |0.5498|±  |0.0087|
|hellaswag                                       |      0|acc                  |0.3814|±  |0.0048|
|                                                |       |acc_norm             |0.4677|±  |0.0050|
|openbookqa                                      |      0|acc                  |0.1960|±  |0.0178|
|                                                |       |acc_norm             |0.3100|±  |0.0207|
|piqa                                            |      0|acc                  |0.6600|±  |0.0111|
|                                                |       |acc_norm             |0.6610|±  |0.0110|
|winogrande                                      |      0|acc                  |0.5343|±  |0.0140|
```

## Model Usage

The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.

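A minimal sketch of loading the fp16 model with transformers (assumes the `accelerate` package is installed for `device_map="auto"`, and enough VRAM for a 15B model in fp16, roughly 30 GB):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: load the unquantised fp16 model. device_map="auto" requires
# the accelerate package and spreads layers across available devices.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Redmond-Hermes-Coder")
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Redmond-Hermes-Coder",
    torch_dtype=torch.float16,
    device_map="auto",
)
```
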
Compute provided by our project sponsor Redmond AI, thank you!!