---
license: mit
datasets:
- vidore/colpali_train_set
base_model:
- Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: visual-document-retrieval
library_name: transformers
tags:
- vidore
---
## Model Details | |
### Model Description | |
ColQwen is a model based on a novel architecture and training strategy that uses Vision Language Models (VLMs) to efficiently index documents from their visual features.
It extends Qwen2-VL-7B to generate ColBERT-style multi-vector representations of text and images.
It was introduced in the paper ColPali: Efficient Document Retrieval with Vision Language Models and first released in this repository.
This version was trained on 8×A800 GPUs with a per-device batch size of 32 (global batch size 256) for 3 epochs.
- **Developed by:** IEIT Systems
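At retrieval time, ColBERT-style multi-vector representations are compared with a late-interaction (MaxSim) score: each query token embedding is matched against its most similar page-patch embedding, and the maxima are summed. A minimal NumPy sketch of this scoring step (assuming L2-normalized embeddings; shapes and names are illustrative, not the library's API):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one document.

    query_vecs: (num_query_tokens, dim), rows L2-normalized
    doc_vecs:   (num_doc_patches, dim), rows L2-normalized
    """
    # Cosine similarity of every query token against every document patch.
    sims = query_vecs @ doc_vecs.T  # shape: (num_query_tokens, num_doc_patches)
    # For each query token, keep only its best-matching patch, then sum.
    return float(sims.max(axis=1).sum())

# Toy example with 2-dimensional unit vectors.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.6, 0.8]])
score = maxsim_score(query, doc)  # 1.0 + 0.8 = 1.8
```

Ranking a corpus then reduces to computing this score between the query's embeddings and each indexed page's embeddings and sorting the results.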