doberst committed · verified
Commit 449ac8b · Parent: bce4718

Update README.md

Files changed (1): README.md (+7 −10)
README.md CHANGED

@@ -1,14 +1,14 @@
 ---
 license: apache-2.0
 inference: false
-tags: [green, llmware-rag, p1, ov]
+tags: [green, llmware-rag, p1, onnx]
 ---
 
-# bling-tiny-llama-ov
+# bling-tiny-llama-onnx
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-**bling-tiny-llama-ov** is an OpenVino int4 quantized version of BLING Tiny-Llama 1B, providing a very fast, very small inference implementation, optimized for AI PCs using Intel GPU, CPU and NPU.
+**bling-tiny-llama-onnx** is an ONNX int4 quantized version of BLING Tiny-Llama 1B, providing a very fast, very small inference implementation, optimized for AI PCs using Intel GPU, CPU and NPU.
 
 [**bling-tiny-llama**](https://huggingface.co/llmware/bling-tiny-llama-v0) is a fact-based question-answering model, optimized for complex business documents.
 
@@ -16,23 +16,20 @@ Get started right away
 
 1. Install dependencies
 
-```
+```bash
 pip3 install llmware
-pip3 install openvino
-pip3 install openvino_genai
+pip3 install onnxruntime_genai
 ```
 
 2. Hello World
 
-```
+```python
 from llmware.models import ModelCatalog
-model = ModelCatalog().load_model("bling-tiny-llama-ov")
+model = ModelCatalog().load_model("bling-tiny-llama-onnx")
 response = model.inference("The stock price is $45.\nWhat is the stock price?")
 print("response: ", response)
 ```
 
-Get started right away with [OpenVino](https://github.com/openvinotoolkit/openvino)
-
 Looking for AI PC solutions and demos, contact us at [llmware](https://www.llmware.ai)
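The Hello World snippet in the updated README can be extended into a small batch Q&A helper. The following is a sketch, not part of the commit: the prompt layout (context passage, newline, question) mirrors the README's example, but the `build_prompts`, `answer_all`, and `load_bling_onnx` helper names are hypothetical, and the assumption that `model.inference(...)` returns a dict containing an `"llm_response"` key is mine, not stated in the README.

```python
# Sketch: asking several fact-based questions against one passage with
# bling-tiny-llama-onnx. Prompt layout (context + "\n" + question) follows
# the README's Hello World; the "llm_response" key is an assumption about
# llmware's return value.

def build_prompts(context: str, questions: list) -> list:
    """Pair one context passage with each question, README-style."""
    return [f"{context}\n{q}" for q in questions]

def answer_all(model, prompts: list) -> list:
    """Run each prompt through an already-loaded llmware model."""
    return [model.inference(p)["llm_response"] for p in prompts]

def load_bling_onnx():
    """Hypothetical helper: load the model as shown in the README."""
    from llmware.models import ModelCatalog  # requires: pip3 install llmware
    return ModelCatalog().load_model("bling-tiny-llama-onnx")

context = "The stock price is $45. The company is headquartered in Austin."
questions = ["What is the stock price?", "Where is the company headquartered?"]
prompts = build_prompts(context, questions)
# answers = answer_all(load_bling_onnx(), prompts)  # uncomment with llmware installed
```

Note that `load_bling_onnx()` is only invoked in the commented-out line, so the snippet does not require `llmware` to be installed just to construct the prompts.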