|
--- |
|
license: mit |
|
pipeline_tag: text-classification |
|
inference: false |
|
--- |
|
|
|
# Official ICC model [ACL 2024 Findings] |
|
|
|
The official checkpoint of the ICC model, introduced in [ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation](https://arxiv.org/abs/2403.01306).
|
|
|
[Project Page](https://moranyanuka.github.io/icc/) |
|
|
|
## Usage |
|
|
|
The ICC model quantifies the concreteness of image captions. Its intended use is selecting the best captions in a noisy multimodal dataset: run the model over the captions and filter out samples with low scores (see the filtering sketch below).

It works best in conjunction with CLIP-based filtering.
|
|
|
|
|
### Running the model |
|
|
|
<details> |
|
<summary> Click to expand </summary> |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the ICC tokenizer and checkpoint
tokenizer = AutoTokenizer.from_pretrained("moranyanuka/icc")
model = AutoModelForSequenceClassification.from_pretrained("moranyanuka/icc").to("cuda")

captions = ["a great method of quantifying concreteness", "a man with a white shirt"]

# Pad and truncate so captions of different lengths can be batched together
text_ids = tokenizer(captions, padding=True, truncation=True, return_tensors="pt").to("cuda")

# Higher scores indicate more concrete captions
with torch.inference_mode():
    icc_scores = model(**text_ids)["logits"]

# tensor([[0.0339], [1.0068]])
```
|
</details> |
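### Filtering a dataset by ICC score

To use the scores for dataset curation, threshold them and keep only the most concrete captions. Below is a minimal sketch of this filtering step; the threshold value (`0.5`) and the toy caption list are illustrative assumptions, not part of the official pipeline.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("moranyanuka/icc")
model = AutoModelForSequenceClassification.from_pretrained("moranyanuka/icc").to(device)

# Toy caption pool standing in for a noisy multimodal dataset
captions = [
    "a man with a white shirt",
    "image of the day",
    "a dog catching a frisbee on the beach",
]

# Hypothetical threshold; tune it on your own data
THRESHOLD = 0.5

text_ids = tokenizer(captions, padding=True, truncation=True, return_tensors="pt").to(device)
with torch.inference_mode():
    scores = model(**text_ids)["logits"].squeeze(-1)

# Keep only captions whose ICC score clears the threshold
kept = [c for c, s in zip(captions, scores.tolist()) if s > THRESHOLD]
print(kept)
```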
|
|
|
|
|
|
|
## Citation

```bibtex
|
@misc{yanuka2024icc, |
|
title={ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation}, |
|
author={Moran Yanuka and Morris Alper and Hadar Averbuch-Elor and Raja Giryes}, |
|
year={2024}, |
|
eprint={2403.01306}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |