license: apache-2.0
inference: false
tags:
- green
- llmware-rag
- p1
- ov
bling-tiny-llama-ov
bling-tiny-llama-ov is an OpenVINO int4-quantized version of BLING Tiny-Llama 1B, providing a fast, small-footprint inference implementation optimized for AI PCs using Intel GPU, CPU, and NPU.
bling-tiny-llama is a fact-based question-answering model, optimized for complex business documents.
Get started right away
- Install dependencies
pip3 install llmware
pip3 install openvino
pip3 install openvino_genai
- Hello World
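from llmware.models import ModelCatalog
# Load the OpenVINO int4 model by its catalog name
model = ModelCatalog().load_model("bling-tiny-llama-ov")
# Prompt layout: the source passage first, then the question on a new line
response = model.inference("The stock price is $45.\nWhat is the stock price?")
print("response: ", response)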
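The Hello World prompt puts the source passage first and the question after a newline. A minimal sketch of that pattern as a reusable helper might look like the following. The names `build_prompt` and `ask` are hypothetical, introduced here for illustration only, and are not part of llmware; the only llmware calls used are the two shown in the example above.

```python
def build_prompt(context: str, question: str) -> str:
    """Join a source passage and a question as: passage, newline, question.

    Hypothetical helper that mirrors the layout of the Hello World prompt.
    """
    return f"{context.strip()}\n{question.strip()}"


def ask(context: str, question: str, model_name: str = "bling-tiny-llama-ov"):
    """Load the model by name and run one question over one passage.

    Hypothetical wrapper. The import is done lazily so that build_prompt
    can be used and tested without llmware installed.
    """
    from llmware.models import ModelCatalog  # requires: pip3 install llmware

    model = ModelCatalog().load_model(model_name)
    # Returns whatever model.inference returns, unmodified
    return model.inference(build_prompt(context, question))


if __name__ == "__main__":
    passage = "The stock price is $45."
    # Show the exact prompt string that would be sent to the model
    print(repr(build_prompt(passage, "What is the stock price?")))
```

Keeping prompt construction separate from the model call makes it possible to inspect the exact string sent to the model before running inference.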
Get started right away with OpenVINO
Looking for AI PC solutions and demos? Contact us at llmware.
Model Description
- Developed by: llmware
- Model type: tinyllama
- Parameters: 1.1 billion
- Model Parent: llmware/bling-tiny-llama-v0
- Language(s) (NLP): English
- License: Apache 2.0
- Uses: Fact-based question-answering
- RAG Benchmark Accuracy Score: 86.5
- Quantization: int4
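To give a rough sense of what int4 quantization means for a 1.1 billion parameter model, the sketch below estimates storage for the weights alone as parameters × bits ÷ 8. This is a back-of-the-envelope calculation, not a measurement of this model: it ignores quantization scales and zero-points, any layers kept at higher precision, and runtime memory such as activations and the KV cache, so the actual file size and memory use will differ. The fp16 and int8 rows are assumptions added only for comparison, and `estimate_weight_bytes` is a hypothetical helper name.

```python
def estimate_weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone: params * bits / 8."""
    return num_params * bits_per_weight / 8


PARAMS = 1.1e9  # "Parameters: 1.1 billion" from the model description

# fp16 and int8 are illustrative baselines; only int4 is stated in the card
for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gb = estimate_weight_bytes(PARAMS, bits) / 1e9
    print(f"{label}: ~{gb:.2f} GB")
```

Under these assumptions the int4 weights come to roughly 0.55 GB, about a quarter of the fp16 estimate.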