---
library_name: transformers
tags: []
---

# SoM-LLaVA Model Card

LLaVA-v1.5 mix-trained with SoM-style data (QA + listing).

The model can understand tag-style visual prompts on an image (e.g., "What is the object tagged with id 9?"), and it also achieves improved performance on MLLM benchmarks (POPE, MME, SEED-Bench, MM-Vet, and LLaVA-Wild), even when the input test images carry no tags.
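
For example, a tag-grounded query can be issued with the same chat template used in the Getting Started script below. This is a minimal sketch rather than the official SoM pipeline: `som_tagged_image.jpg` is a placeholder for an image that already has numbered Set-of-Mark tags overlaid on it.

```python
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_path = "zzxslp/som-llava-v1.5-13b-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)

# Placeholder file: an image with numbered SoM tags already drawn on it.
image = Image.open("som_tagged_image.jpg")

# Ask about a specific tagged region by its numeric id.
prompt = "USER: <image>\nWhat is the object tagged with id 9? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generate_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
```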

**For more information about SoM-LLaVA, check our [GitHub page](https://github.com/zzxslp/SoM-LLaVA) and [paper](https://arxiv.org/abs/2404.16375)!**

## Getting Started

If you would like to load our model with Hugging Face Transformers, here is an example script:

```python
from PIL import Image
import requests
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_path = "zzxslp/som-llava-v1.5-13b-hf"

model = LlavaForConditionalGeneration.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)

prompt = "USER: <image>\nWhat's the content of the image? ASSISTANT:"
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt")

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=20)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)
```
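
The 13B checkpoint is large, so in practice you will likely want to run it in half precision on a GPU. Below is a minimal sketch of the same pipeline, assuming a CUDA device with sufficient memory is available:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_path = "zzxslp/som-llava-v1.5-13b-hf"

# Load the weights in float16 on the GPU to reduce memory use and latency.
model = LlavaForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True
).to("cuda")
processor = AutoProcessor.from_pretrained(model_path)

prompt = "USER: <image>\nWhat's the content of the image? ASSISTANT:"
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Cast floating-point inputs to float16 and move them to the GPU as well.
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)

generate_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
```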

Our original model weights are available at [SoM-LLaVA-v1.5-13B](https://huggingface.co/zzxslp/som-llava-v1.5-13b), to be used with the [official LLaVA repo](https://github.com/haotian-liu/LLaVA).

## Citation

If you find our data or model useful for your research and applications, please cite our paper:

```
@article{yan2024list,
  title={List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs},
  author={Yan, An and Yang, Zhengyuan and Wu, Junda and Zhu, Wanrong and Yang, Jianwei and Li, Linjie and Lin, Kevin and Wang, Jianfeng and McAuley, Julian and Gao, Jianfeng and others},
  journal={arXiv preprint arXiv:2404.16375},
  year={2024}
}
```
|