|
--- |
|
license: mit |
|
pipeline_tag: text-classification |
|
inference: false |
|
--- |
|
|
|
# Official ICC model [ACL 2024 Findings] |
|
|
|
The official checkpoint of the ICC model, introduced in [ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation](https://arxiv.org/abs/2403.01306).
|
|
|
[Project Page](https://moranyanuka.github.io/icc/) |
|
|
|
## Usage |
|
|
|
The ICC model quantifies the concreteness of image captions. Its intended use is selecting the best captions in a noisy multimodal dataset: run the model over the captions and filter out samples with low scores (see the filtering sketch below).

It works best in conjunction with CLIP-based filtering.
|
|
|
|
|
### Running the model |
|
|
|
<details> |
|
<summary> Click to expand </summary> |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the ICC tokenizer and checkpoint
tokenizer = AutoTokenizer.from_pretrained("moranyanuka/icc")
model = AutoModelForSequenceClassification.from_pretrained("moranyanuka/icc").to("cuda")

captions = ["a great method of quantifying concreteness", "a man with a white shirt"]

# Pad and truncate so captions of different lengths can be batched together
text_ids = tokenizer(captions, padding=True, truncation=True, return_tensors="pt").to("cuda")

# Higher scores indicate more concrete captions
with torch.inference_mode():
    icc_scores = model(**text_ids)["logits"]

# tensor([[0.0339], [1.0068]])
```
|
</details> |
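### Filtering a dataset by ICC score

To use the scores for dataset curation, threshold them and keep only the most concrete captions. Below is a minimal sketch of this filtering step; the threshold value (`0.5`) and the toy caption list are illustrative assumptions, not part of the official pipeline.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("moranyanuka/icc")
model = AutoModelForSequenceClassification.from_pretrained("moranyanuka/icc").to(device)

# Toy caption pool standing in for a noisy multimodal dataset
captions = [
    "a man with a white shirt",
    "image of the day",
    "a dog catching a frisbee on the beach",
]

# Hypothetical threshold; tune it on your own data
THRESHOLD = 0.5

text_ids = tokenizer(captions, padding=True, truncation=True, return_tensors="pt").to(device)
with torch.inference_mode():
    scores = model(**text_ids)["logits"].squeeze(-1)

# Keep only captions whose ICC score clears the threshold
kept = [c for c, s in zip(captions, scores.tolist()) if s > THRESHOLD]
print(kept)
```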
|
|
|
|
|
|
|
## Citation

```bibtex
|
@misc{yanuka2024icc, |
|
title={ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation}, |
|
author={Moran Yanuka and Morris Alper and Hadar Averbuch-Elor and Raja Giryes}, |
|
year={2024}, |
|
eprint={2403.01306}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |