Zuyan/llava-CoS-13B

	<p align="center" width="100%">
	<img src="https://ice.frostsky.com/2024/03/17/214a3af4a34a26be0a04e551e16b9364.webp" width="40%" height="80%">
	</p>

	# Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

	### Model details:

	Chain-of-Spot encourages Large Vision-Language Models to identify the region of interest (ROI) in the image, conditioned on the question, and to reason in an interactive manner, thereby improving their visual understanding.
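The interactive flow described above can be pictured as a two-step loop: first ask where to look, then answer using both the full image and the cropped region. The following is only a minimal sketch under stated assumptions, not the released model's API. The names `crop_to_roi`, `chain_of_spot_answer`, `locate_roi`, and `answer` are hypothetical, the normalized `(x0, y0, x1, y1)` box format is assumed, and the image is a toy nested list rather than a real tensor.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]                    # toy grayscale image: rows of pixel values
BBox = Tuple[float, float, float, float]   # assumed format: normalized (x0, y0, x1, y1)


def crop_to_roi(image: Image, bbox: BBox) -> Image:
    """Crop a row-major image to a normalized bounding box (at least 1x1 pixels)."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = bbox
    left = min(int(x0 * width), width - 1)
    top = min(int(y0 * height), height - 1)
    right = min(max(int(x1 * width), left + 1), width)
    bottom = min(max(int(y1 * height), top + 1), height)
    return [row[left:right] for row in image[top:bottom]]


def chain_of_spot_answer(
    image: Image,
    question: str,
    locate_roi: Callable[[Image, str], BBox],    # hypothetical: model call returning a box
    answer: Callable[[List[Image], str], str],   # hypothetical: model call returning text
) -> str:
    """Two-step sketch: find the question-conditioned ROI, then answer with it."""
    # Step 1: ask which region matters for this particular question.
    bbox = locate_roi(image, question)
    # Step 2: zoom in on that region.
    roi = crop_to_roi(image, bbox)
    # Step 3: answer with both the global view and the zoomed-in view.
    return answer([image, roi], question)


# Stand-in callables so the sketch runs without any model weights.
def fake_locate_roi(image: Image, question: str) -> BBox:
    return (0.5, 0.5, 1.0, 1.0)  # pretend the answer lies in the bottom-right quadrant


def fake_answer(views: List[Image], question: str) -> str:
    roi = views[1]
    return f"brightest ROI pixel: {max(max(row) for row in roi)}"


toy_image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
result = chain_of_spot_answer(toy_image, "What is in the corner?", fake_locate_roi, fake_answer)
```

In a real setup, the two callables would wrap calls to a vision-language model; how this checkpoint actually encodes the ROI and the follow-up prompt is documented in the project repository, not here.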

	### Where to send questions or comments about the model: https://github.com/dongyh20/Chain-of-Spot

	### Paper or resources for more information: https://sites.google.com/view/chain-of-spot/