hanxiao committed verified commit 8ee2c90 (parent: abf8f24)

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -18,7 +18,7 @@ Jina CLIP: your CLIP model is also your text retriever!
 
 `jina-clip-v1` is a state-of-the-art English **multimodal (text-image) embedding model**.
 
-Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en), excel in text-to-text retrieval but fall short in cross-modal tasks. In contrast, models like [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) effectively align image and text embeddings but are not optimized for text-to-text retrieval due to their training methodologies and context limitations.
+Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en), excel in text-to-text retrieval but are incapable of cross-modal tasks. Models like [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) effectively align image and text embeddings but are not optimized for text-to-text retrieval due to their training methodologies and context limitations.
 
 `jina-clip-v1` bridges this gap by offering robust performance in both domains. Its text component matches the retrieval efficiency of `jina-embeddings-v2-base-en`, while its overall architecture sets a new benchmark for cross-modal retrieval. This dual capability makes it an excellent tool for multimodal retrieval-augmented generation (M-RAG) applications, enabling seamless text-to-text and text-to-image searches within a single model.
 
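
To make the dual capability described in the changed README concrete, below is a minimal usage sketch. It assumes the model is loaded via `transformers` with `trust_remote_code=True` and that the remote-code interface exposes `encode_text` and `encode_image`; the example query, document strings, and image URL are hypothetical and not part of this commit.

```python
# Minimal sketch: text-to-text and text-to-image retrieval with one model.
# Assumption: jina-clip-v1's remote code provides encode_text / encode_image.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

query = ["a photo of a cat sleeping on a sofa"]
docs = ["A kitten naps on the couch.", "Quarterly sales rose by 12%."]
images = ["https://example.com/cat.jpg"]  # hypothetical image URL

q_emb = np.asarray(model.encode_text(query))    # query embedding
d_emb = np.asarray(model.encode_text(docs))     # text-to-text retrieval targets
i_emb = np.asarray(model.encode_image(images))  # text-to-image retrieval targets

def cosine(a, b):
    # Cosine similarity between two batches of embeddings.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

print("text-to-text similarities:", cosine(q_emb, d_emb))
print("text-to-image similarities:", cosine(q_emb, i_emb))
```

Because both modalities share one embedding space, the same query vector can be scored against text passages and images, which is what makes the single-model M-RAG setup described above possible.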