---
license: mit
datasets:
- vidore/colpali_train_set
base_model:
- Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: visual-document-retrieval
library_name: transformers
tags:
- vidore
---
## Model Details | |
### Model Description | |
ColQwen is a model based on a novel architecture and training strategy that uses Vision Language Models (VLMs) to efficiently index documents from their visual features.
It extends Qwen2-VL-7B to generate ColBERT-style multi-vector representations of text and images.
It was introduced in the paper ColPali: Efficient Document Retrieval with Vision Language Models and first released in this repository.
This version was trained on 8×A800 GPUs with a per-device batch size of 32 (global batch size 256) for 3 epochs.
- **Developed by:** IEIT Systems
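At retrieval time, ColBERT-style multi-vector representations are compared with a late-interaction (MaxSim) score: each query token embedding is matched against its most similar page-patch embedding, and the maxima are summed. A minimal NumPy sketch of this scoring step (assuming L2-normalized embeddings; shapes and names are illustrative, not the library's API):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one document.

    query_vecs: (num_query_tokens, dim), rows L2-normalized
    doc_vecs:   (num_doc_patches, dim), rows L2-normalized
    """
    # Cosine similarity of every query token against every document patch.
    sims = query_vecs @ doc_vecs.T  # shape: (num_query_tokens, num_doc_patches)
    # For each query token, keep only its best-matching patch, then sum.
    return float(sims.max(axis=1).sum())

# Toy example with 2-dimensional unit vectors.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.6, 0.8]])
score = maxsim_score(query, doc)  # 1.0 + 0.8 = 1.8
```

Ranking a corpus then reduces to computing this score between the query's embeddings and each indexed page's embeddings and sorting the results.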