alejandrovil/llama3-AWQ


alejandrovil committed on May 23, 2024

Commit 806948a (verified) · 1 parent: c1357c7

Update README.md

Files changed (1)

README.md +99 -120

@@ -1,120 +1,99 @@
----
-library_name: transformers
-tags:
-- 4-bit
-- AWQ
-- text-generation
-- autotrain_compatible
-- endpoints_compatible
-- Llama-3
-- instruct
-- finetune
-- chatml
-- DPO
-- RLHF
-- gpt4
-- synthetic data
-- distillation
-- function calling
-- json mode
-- axolotl
-model-index:
-- name: Hermes-2-Pro-Llama-3-8B
-  results: []
-license: apache-2.0
-language:
-- en
-datasets:
-- teknium/OpenHermes-2.5
-widget:
-- example_title: Hermes 2 Pro
-  messages:
-  - role: system
-    content: You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.
-  - role: user
-    content: Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.
-pipeline_tag: text-generation
-inference: false
-quantized_by: Suparious
----
-# NousResearch/Hermes-2-Pro-Llama-3-8B AWQ
-- Model creator: [NousResearch](https://huggingface.co/NousResearch)
-- Original model: [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ggO2sBDJ8Bhc6w-zwTx5j.png)
-## Model Description
-Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.
-This new version of Hermes maintains its excellent general task and conversation capabilities - but also excels at Function Calling, JSON Structured Outputs, and has improved on several other metrics as well, scoring a 90% on our function calling evaluation built in partnership with Fireworks.AI, and an 84% on our structured JSON Output evaluation.
-Hermes Pro takes advantage of a special system prompt and multi-turn function calling structure with a new chatml role in order to make function calling reliable and easy to parse. Learn more about prompting below.
-This version of Hermes 2 Pro adds several tokens to assist with agentic capabilities in parsing while streaming tokens - `<tools>`, `<tool_call>`, `<tool_response>` and their closing tags are single tokens now.
-This work was a collaboration between Nous Research, @interstellarninja, and Fireworks.AI
-Learn more about the function calling system for this model on our github repo here: https://github.com/NousResearch/Hermes-Function-Calling
-## How to use
-### Install the necessary packages
-```bash
-pip install --upgrade autoawq autoawq-kernels
-```
-### Example Python code
-```python
-from awq import AutoAWQForCausalLM
-from transformers import AutoTokenizer, TextStreamer
-model_path = "solidrust/Hermes-2-Pro-Llama-3-8B-AWQ"
-system_message = "You are Hermes-2-Pro-Llama-3-8B, incarnated as a powerful AI. You were created by NousResearch."
-# Load model
-model = AutoAWQForCausalLM.from_quantized(model_path,
-                                          fuse_layers=True)
-tokenizer = AutoTokenizer.from_pretrained(model_path,
-                                          trust_remote_code=True)
-streamer = TextStreamer(tokenizer,
-                        skip_prompt=True,
-                        skip_special_tokens=True)
-# Convert prompt to tokens
-prompt_template = """\
-<|im_start|>system
-{system_message}<|im_end|>
-<|im_start|>user
-{prompt}<|im_end|>
-<|im_start|>assistant"""
-prompt = "You're standing on the surface of the Earth. "\
-        "You walk one mile south, one mile west and one mile north. "\
-        "You end up exactly where you started. Where are you?"
-tokens = tokenizer(prompt_template.format(system_message=system_message,prompt=prompt),
-                  return_tensors='pt').input_ids.cuda()
-# Generate output
-generation_output = model.generate(tokens,
-                                  streamer=streamer,
-                                  max_new_tokens=512)
-```
-### About AWQ
-AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality compared to the most commonly used GPTQ settings.
-AWQ models are currently supported on Linux and Windows, with NVidia GPUs only. macOS users: please use GGUF models instead.
-It is supported by:
-- [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
-- [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later for support for all model types.
-- [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
-- [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
-- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code

+---
+library_name: transformers
+tags:
+- 4-bit
+- AWQ
+- text-generation
+- autotrain_compatible
+- endpoints_compatible
+- Llama-3
+- instruct
+- finetune
+- chatml
+- DPO
+- RLHF
+- gpt4
+- synthetic data
+- distillation
+- function calling
+- json mode
+- axolotl
+model-index:
+- name: Hermes-2-Pro-Llama-3-8B
+  results: []
+license: apache-2.0
+language:
+- en
+datasets:
+- teknium/OpenHermes-2.5
+widget:
+- example_title: Hermes 2 Pro
+  messages:
+  - role: system
+    content: You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.
+  - role: user
+    content: Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.
+pipeline_tag: text-generation
+inference: false
+quantized_by: Suparious
+---
+# NousResearch/Hermes-2-Pro-Llama-3-8B AWQ
+- Original model: [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
+```bash
+pip install --upgrade autoawq autoawq-kernels
+```
+### Example Python code
+```python
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer, TextStreamer
+model_path = "solidrust/Hermes-2-Pro-Llama-3-8B-AWQ"
+system_message = "You are Hermes-2-Pro-Llama-3-8B, incarnated as a powerful AI. You were created by NousResearch."
+# Load model
+model = AutoAWQForCausalLM.from_quantized(model_path,
+                                          fuse_layers=True)
+tokenizer = AutoTokenizer.from_pretrained(model_path,
+                                          trust_remote_code=True)
+streamer = TextStreamer(tokenizer,
+                        skip_prompt=True,
+                        skip_special_tokens=True)
+# Convert prompt to tokens
+prompt_template = """\
+<|im_start|>system
+{system_message}<|im_end|>
+<|im_start|>user
+{prompt}<|im_end|>
+<|im_start|>assistant"""
+prompt = "You're standing on the surface of the Earth. "\
+        "You walk one mile south, one mile west and one mile north. "\
+        "You end up exactly where you started. Where are you?"
+tokens = tokenizer(prompt_template.format(system_message=system_message,prompt=prompt),
+                  return_tensors='pt').input_ids.cuda()
+# Generate output
+generation_output = model.generate(tokens,
+                                  streamer=streamer,
+                                  max_new_tokens=512)
+```
+### About AWQ
+AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality compared to the most commonly used GPTQ settings.
+AWQ models are currently supported on Linux and Windows, with NVidia GPUs only. macOS users: please use GGUF models instead.
+It is supported by:
+- [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
+- [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later for support for all model types.
+- [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
+- [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
+- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
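
The `prompt_template` in the README above hard-codes a single system/user turn in ChatML. A minimal sketch of a multi-turn variant follows. `build_chatml_prompt` is a hypothetical helper for illustration, not part of AutoAWQ or Transformers, and the newline it emits after the final `assistant` tag is an assumption based on the usual ChatML layout (the README's template omits it).

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML string.

    Hypothetical helper: mirrors the <|im_start|>/<|im_end|> layout used by
    prompt_template in the README, generalized to any number of turns.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>role\ncontent<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        # Assumption: a newline follows the role name, as in the other turns.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Hermes-2-Pro-Llama-3-8B."},
    {"role": "user", "content": "Where can you walk one mile south, west and north and end up where you started?"},
]
prompt_text = build_chatml_prompt(messages)
print(prompt_text)
```

The resulting string could be passed to `tokenizer(...)` in place of `prompt_template.format(...)` from the README example; to continue a conversation, append the model's reply as an `assistant` message and the next `user` message before rendering again.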