michael-guenther
committed on
Update README.md
README.md
CHANGED
@@ -3187,7 +3187,24 @@ embeddings = F.normalize(embeddings, p=2, dim=1)
|
|
3187 |
</p>
|
3188 |
</details>
|
3189 |
|
3190 |
-
You can use Jina Embedding models directly from transformers package
3191 |
```python
|
3192 |
!pip install transformers
|
3193 |
from transformers import AutoModel
|
@@ -3208,6 +3225,28 @@ embeddings = model.encode(
|
|
3208 |
)
|
3209 |
```
|
3210 |
3211 |
## Alternatives to Using Transformers Package
|
3212 |
|
3213 |
1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
|
@@ -3227,6 +3266,17 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
|
|
3227 |
|
3228 |
<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
|
3229 |
3230 |
## Contact
|
3231 |
|
3232 |
Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
|
|
|
3187 |
</p>
|
3188 |
</details>
|
3189 |
|
3190 |
+
You can use Jina Embedding models directly from the `transformers` package.
|
3191 |
+
|
3192 |
+
First, make sure that you are logged in to Hugging Face. You can use the `huggingface-cli` tool (available after installing the `transformers` package) and pass your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens):
|
3193 |
+
```bash
|
3194 |
+
huggingface-cli login
|
3195 |
+
```
|
3196 |
+
Alternatively, you can provide the access token as an environment variable in the shell:
|
3197 |
+
```bash
|
3198 |
+
export HF_TOKEN="<your token here>"
|
3199 |
+
```
|
3200 |
+
or in Python:
|
3201 |
+
```python
|
3202 |
+
import os
|
3203 |
+
|
3204 |
+
os.environ['HF_TOKEN'] = "<your token here>"
|
3205 |
+
```
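If the token is supplied through the environment, it can help to fail early with a clear message when it is missing. The following is a minimal sketch, not part of the Jina or Hugging Face tooling; the helper name `get_hf_token` is hypothetical, and it assumes the token is stored in `HF_TOKEN` as shown above.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in the environment.

    `get_hf_token` is a hypothetical helper for illustration only.
    It raises a RuntimeError if the variable is unset or blank.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it in the shell or set "
            "os.environ before loading the model."
        )
    return token
```

Calling `get_hf_token()` before loading the model surfaces a missing token immediately instead of during the download.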
|
3206 |
+
|
3207 |
+
Then, you can load and use the model via the `AutoModel` class:
|
3208 |
```python
|
3209 |
!pip install transformers
|
3210 |
from transformers import AutoModel
|
|
|
3225 |
)
|
3226 |
```
|
3227 |
|
3228 |
+
With its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (please make sure that you are logged in to Hugging Face as well):
|
3229 |
+
|
3230 |
+
```python
|
3231 |
+
!pip install -U sentence-transformers
|
3232 |
+
from sentence_transformers import SentenceTransformer
|
3233 |
+
from sentence_transformers.util import cos_sim
|
3234 |
+
|
3235 |
+
model = SentenceTransformer(
|
3236 |
+
"jinaai/jina-embeddings-v2-base-de", # switch to en/zh for English or Chinese
|
3237 |
+
trust_remote_code=True
|
3238 |
+
)
|
3239 |
+
|
3240 |
+
# control your input sequence length up to 8192
|
3241 |
+
model.max_seq_length = 1024
|
3242 |
+
|
3243 |
+
embeddings = model.encode([
|
3244 |
+
'How is the weather today?',
|
3245 |
+
'Wie ist das Wetter heute?'
|
3246 |
+
])
|
3247 |
+
print(cos_sim(embeddings[0], embeddings[1]))
|
3248 |
+
```
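For reference, the value printed by `cos_sim` is the cosine similarity of the two embedding vectors. The sketch below computes the same quantity in plain Python for two small vectors. It is an illustration only, not the sentence-transformers implementation, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate that the two texts point in nearly the same direction in embedding space, which is what one would hope for a sentence and its translation.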
|
3249 |
+
|
3250 |
## Alternatives to Using Transformers Package
|
3251 |
|
3252 |
1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
|
|
|
3266 |
|
3267 |
<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
|
3268 |
|
3269 |
+
## Troubleshooting
|
3270 |
+
|
3271 |
+
**Loading of Model Code failed**
|
3272 |
+
|
3273 |
+
If you forget to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized.
|
3274 |
+
This is caused by transformers falling back to creating a default BERT model instead of a jina-embedding model:
|
3275 |
+
|
3276 |
+
```bash
|
3277 |
+
Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
|
3278 |
+
```
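Since the message above names `BertModel` as the class being initialized, one way to catch this situation in code is to inspect the class of the loaded object. The following is a rough sketch under that assumption; `loaded_fallback_bert` is a hypothetical helper, and the commented usage lines assume the same `AutoModel.from_pretrained` call described in this section.

```python
def loaded_fallback_bert(model: object) -> bool:
    """Return True if the object's class is named exactly 'BertModel'.

    Hypothetical check: it relies only on the class name shown in the
    warning above, not on any transformers API.
    """
    return type(model).__name__ == "BertModel"


# Example usage (requires network access and a valid login):
# from transformers import AutoModel
# model = AutoModel.from_pretrained(
#     "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
# )
# if loaded_fallback_bert(model):
#     raise RuntimeError("Reload the model with trust_remote_code=True")
```

The check is deliberately narrow: it only flags the default class named in the warning and makes no claim about the name of the class that the remote code provides.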
|
3279 |
+
|
3280 |
## Contact
|
3281 |
|
3282 |
Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
|