RaushanTurganbay (HF staff) committed on

Commit dae4b39 · verified · 1 Parent(s): 5835f74

Update README.md

Add transformers code snippet

Files changed (1): README.md (+41 −3)

README.md CHANGED
@@ -46,10 +46,48 @@ English
  The model is intended to be used in enterprise applications that involve processing visual and text data. In particular, the model is well-suited for a range of visual document understanding tasks, such as analyzing tables and charts, performing optical character recognition (OCR), and answering questions based on document content. Additionally, its capabilities extend to general image understanding, enabling it to be applied to a broader range of business applications. For tasks that exclusively involve text-based input, we suggest using our Granite large language models, which are optimized for text-only processing and offer superior performance compared to this model.
 
 
- **Generation:**
- This is a simple example of how to use the granite-vision-3.1-2b-preview model.
-
- Install the following libraries:
+ ## Generation:
+
+ The Granite Vision model is supported natively in `transformers>=4.48`. Below is a simple example of how to use the `granite-vision-3.1-2b-preview` model.
+
+ ### Usage with `transformers`
+
+ ```python
+ from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
+
+ model_path = "ibm-granite/granite-vision-3.1-2b-preview"
+ processor = LlavaNextProcessor.from_pretrained(model_path)
+ model = LlavaNextForConditionalGeneration.from_pretrained(model_path, device_map="cuda:0")
+
+ # prepare image and text prompt, using the appropriate prompt template
+ url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
+
+ conversation = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "url": url},
+             {"type": "text", "text": "What is shown in this image?"},
+         ],
+     },
+ ]
+ inputs = processor.apply_chat_template(
+     conversation,
+     add_generation_prompt=True,
+     tokenize=True,
+     return_dict=True,
+     return_tensors="pt"
+ ).to("cuda:0")
+
+ # autoregressively complete the prompt
+ output = model.generate(**inputs, max_new_tokens=100)
+ print(processor.decode(output[0], skip_special_tokens=True))
+ ```
+
+ ### Usage with vLLM
+
+ The model can also be loaded with `vLLM`. First make sure to install the following libraries:
 
 ```shell
 pip install torch torchvision torchaudio
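
The hunk shown above ends at the install step, so the vLLM call itself is not visible here. For illustration only (this sketch is not part of the commit), single-image inference might look like the following, assuming vLLM's `LLM`/`SamplingParams` API and its `multi_modal_data` prompt format, with the chat template rendered by the `transformers` processor:

```python
# Hedged sketch, not from the commit: single-image generation with vLLM.
import requests
from PIL import Image
from transformers import AutoProcessor
from vllm import LLM, SamplingParams

model_path = "ibm-granite/granite-vision-3.1-2b-preview"

llm = LLM(model=model_path)
processor = AutoProcessor.from_pretrained(model_path)

# Same radar-chart test image as in the transformers example above.
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)

# Render the prompt string with the model's chat template; the bare
# {"type": "image"} entry marks where the image placeholder goes.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=100),
)
print(outputs[0].outputs[0].text)
```

Compared with the transformers path, vLLM handles batching and KV-cache management itself, so only the rendered prompt string and the raw PIL image are passed in.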