dangvantuan committed
Commit 4f99da7 · verified · 1 Parent(s): 7eff199

Update README.md

Files changed (1)
  1. README.md +1533 -15
README.md CHANGED
@@ -6,19 +6,1528 @@ tags:
6
  - feature-extraction
7
  - sentence-similarity
8
  - transformers
9
- - phobert
10
  - french
 
11
  - sentence-embedding
 
12
  license: apache-2.0
13
  language:
14
  - fr
15
  - en
16
- metrics:
17
- - pearsonr
18
- - spearmanr
19
  ---
20
  ## Model Description:
21
- [**french-embedding-LongContext**](https://huggingface.co/dangvantuan/french-embedding-LongContext) is the Embedding Model for French-English language with context length up to 8096 tokens. This model is a specialized text-embedding trained specifically for the french language, which is built upon [gte-multilingual](Alibaba-NLP/gte-multilingual-base) and trained using the Multi-Negative Ranking Loss, Matryoshka2dLoss and SimilarityLoss.
22
 
23
  ## Full Model Architecture
24
  ```
@@ -28,14 +1537,6 @@ SentenceTransformer(
28
  (2): Normalize()
29
  )
30
  ```
31
- ## Training and Fine-tuning process
32
- The model underwent a rigorous four-stage training and fine-tuning process, each tailored to enhance its ability to generate precise and contextually relevant sentence embeddings for the french language. Below is an outline of these stages:
33
- #### Stage 1: Training NLI on dataset XNLI:
34
- - Dataset: XNLI (fr-en)
35
- - Method: Training using Multi-Negative Ranking Loss and Matryoshka2dLoss. This stage focused on improving the model's ability to discern and rank nuanced differences in sentence semantics.
36
- ### Stage 2: Fine-tuning for Semantic Textual Similarity on STS Benchmark
37
- - Dataset: STS-B (fr-en)
38
- - Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library. This stage honed the model's precision in capturing semantic similarity across various types of french texts.
39
 
40
 
41
  ## Usage:
@@ -54,7 +1555,7 @@ sentences = ["Paris est une capitale de la France", "Paris is a capital of Franc
54
 
55
 
56
 
57
- model = SentenceTransformer('dangvantuan/french-embedding-LongContext', trust_remote_code=True)
58
  embeddings = model.encode(sentences)
59
  print(embeddings)
60
 
@@ -78,7 +1579,6 @@ print(embeddings)
78
  year={2019}
79
  }
80
 
81
-
82
  @article{zhang2024mgte,
83
  title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
84
  author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and others},
@@ -98,4 +1598,22 @@ print(embeddings)
98
  author={Li, Xianming and Li, Zongxi and Li, Jing and Xie, Haoran and Li, Qing},
99
  journal={arXiv preprint arXiv:2402.14776},
100
  year={2024}
101
  }
 
6
  - feature-extraction
7
  - sentence-similarity
8
  - transformers
 
9
  - french
10
+ - english
11
  - sentence-embedding
12
+ - mteb
13
+ model-index:
14
+ - name: 7eff199d41ff669fad99d83cad9249c393c3f14b
15
+ results:
16
+ - task:
17
+ type: Clustering
18
+ dataset:
19
+ type: lyon-nlp/alloprof
20
+ name: MTEB AlloProfClusteringP2P
21
+ config: default
22
+ split: test
23
+ revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
24
+ metrics:
25
+ - type: v_measure
26
+ value: 59.69196295449414
27
+ - type: v_measures
28
+ value: [0.6355772777559684, 0.4980707615440343, 0.5851538838323186, 0.6567709175938427, 0.5712405288636999]
29
+ - task:
30
+ type: Clustering
31
+ dataset:
32
+ type: lyon-nlp/alloprof
33
+ name: MTEB AlloProfClusteringS2S
34
+ config: default
35
+ split: test
36
+ revision: 392ba3f5bcc8c51f578786c1fc3dae648662cb9b
37
+ metrics:
38
+ - type: v_measure
39
+ value: 45.607106996926426
40
+ - type: v_measures
41
+ value: [0.45846869913649535, 0.42657120373128293, 0.45507356125930876, 0.4258913306353704, 0.4779122207000794]
42
+ - task:
43
+ type: Reranking
44
+ dataset:
45
+ type: lyon-nlp/mteb-fr-reranking-alloprof-s2p
46
+ name: MTEB AlloprofReranking
47
+ config: default
48
+ split: test
49
+ revision: 65393d0d7a08a10b4e348135e824f385d420b0fd
50
+ metrics:
51
+ - type: map
52
+ value: 73.51836428087765
53
+ - type: mrr
54
+ value: 74.8550285111166
55
+ - type: nAUC_map_diff1
56
+ value: 56.006169898728466
57
+ - type: nAUC_map_max
58
+ value: 27.886037223407506
59
+ - type: nAUC_mrr_diff1
60
+ value: 56.68072778248672
61
+ - type: nAUC_mrr_max
62
+ value: 29.362681962243276
63
+ - task:
64
+ type: Retrieval
65
+ dataset:
66
+ type: lyon-nlp/alloprof
67
+ name: MTEB AlloprofRetrieval
68
+ config: default
69
+ split: test
70
+ revision: fcf295ea64c750f41fadbaa37b9b861558e1bfbd
71
+ metrics:
72
+ - type: map_at_1
73
+ value: 32.080999999999996
74
+ - type: map_at_10
75
+ value: 43.582
76
+ - type: map_at_100
77
+ value: 44.381
78
+ - type: map_at_1000
79
+ value: 44.426
80
+ - type: map_at_20
81
+ value: 44.061
82
+ - type: map_at_3
83
+ value: 40.602
84
+ - type: map_at_5
85
+ value: 42.381
86
+ - type: mrr_at_1
87
+ value: 32.08117443868739
88
+ - type: mrr_at_10
89
+ value: 43.5823429832498
90
+ - type: mrr_at_100
91
+ value: 44.38068560877513
92
+ - type: mrr_at_1000
93
+ value: 44.426194305504026
94
+ - type: mrr_at_20
95
+ value: 44.06128094655753
96
+ - type: mrr_at_3
97
+ value: 40.60161197466903
98
+ - type: mrr_at_5
99
+ value: 42.380541162924715
100
+ - type: nauc_map_at_1000_diff1
101
+ value: 37.22997629352391
102
+ - type: nauc_map_at_1000_max
103
+ value: 38.65090969900466
104
+ - type: nauc_map_at_100_diff1
105
+ value: 37.22644507166512
106
+ - type: nauc_map_at_100_max
107
+ value: 38.67447923917633
108
+ - type: nauc_map_at_10_diff1
109
+ value: 37.02440573022942
110
+ - type: nauc_map_at_10_max
111
+ value: 38.52972171430789
112
+ - type: nauc_map_at_1_diff1
113
+ value: 41.18101653444774
114
+ - type: nauc_map_at_1_max
115
+ value: 34.87383192583458
116
+ - type: nauc_map_at_20_diff1
117
+ value: 37.14172285932024
118
+ - type: nauc_map_at_20_max
119
+ value: 38.66753159239803
120
+ - type: nauc_map_at_3_diff1
121
+ value: 37.53556306862998
122
+ - type: nauc_map_at_3_max
123
+ value: 37.86008195327724
124
+ - type: nauc_map_at_5_diff1
125
+ value: 37.14904081229067
126
+ - type: nauc_map_at_5_max
127
+ value: 38.267819714061105
128
+ - type: nauc_mrr_at_1000_diff1
129
+ value: 37.22997629352391
130
+ - type: nauc_mrr_at_1000_max
131
+ value: 38.65090969900466
132
+ - type: nauc_mrr_at_100_diff1
133
+ value: 37.22644507166512
134
+ - type: nauc_mrr_at_100_max
135
+ value: 38.67447923917633
136
+ - type: nauc_mrr_at_10_diff1
137
+ value: 37.02440573022942
138
+ - type: nauc_mrr_at_10_max
139
+ value: 38.52972171430789
140
+ - type: nauc_mrr_at_1_diff1
141
+ value: 41.18101653444774
142
+ - type: nauc_mrr_at_1_max
143
+ value: 34.87383192583458
144
+ - type: nauc_mrr_at_20_diff1
145
+ value: 37.14172285932024
146
+ - type: nauc_mrr_at_20_max
147
+ value: 38.66753159239803
148
+ - type: nauc_mrr_at_3_diff1
149
+ value: 37.53556306862998
150
+ - type: nauc_mrr_at_3_max
151
+ value: 37.86008195327724
152
+ - type: nauc_mrr_at_5_diff1
153
+ value: 37.14904081229067
154
+ - type: nauc_mrr_at_5_max
155
+ value: 38.267819714061105
156
+ - type: nauc_ndcg_at_1000_diff1
157
+ value: 36.313082263552204
158
+ - type: nauc_ndcg_at_1000_max
159
+ value: 40.244406213773765
160
+ - type: nauc_ndcg_at_100_diff1
161
+ value: 36.17060946689135
162
+ - type: nauc_ndcg_at_100_max
163
+ value: 41.069278488584416
164
+ - type: nauc_ndcg_at_10_diff1
165
+ value: 35.2775471480974
166
+ - type: nauc_ndcg_at_10_max
167
+ value: 40.33902753007036
168
+ - type: nauc_ndcg_at_1_diff1
169
+ value: 41.18101653444774
170
+ - type: nauc_ndcg_at_1_max
171
+ value: 34.87383192583458
172
+ - type: nauc_ndcg_at_20_diff1
173
+ value: 35.71067272175871
174
+ - type: nauc_ndcg_at_20_max
175
+ value: 40.94374381572908
176
+ - type: nauc_ndcg_at_3_diff1
177
+ value: 36.45082651868188
178
+ - type: nauc_ndcg_at_3_max
179
+ value: 38.87195110158222
180
+ - type: nauc_ndcg_at_5_diff1
181
+ value: 35.683568481780505
182
+ - type: nauc_ndcg_at_5_max
183
+ value: 39.606933866599
184
+ - type: nauc_precision_at_1000_diff1
185
+ value: 15.489726515767439
186
+ - type: nauc_precision_at_1000_max
187
+ value: 75.94259161180715
188
+ - type: nauc_precision_at_100_diff1
189
+ value: 30.033605095284656
190
+ - type: nauc_precision_at_100_max
191
+ value: 62.40786465750442
192
+ - type: nauc_precision_at_10_diff1
193
+ value: 28.617170969915
194
+ - type: nauc_precision_at_10_max
195
+ value: 47.35884745487521
196
+ - type: nauc_precision_at_1_diff1
197
+ value: 41.18101653444774
198
+ - type: nauc_precision_at_1_max
199
+ value: 34.87383192583458
200
+ - type: nauc_precision_at_20_diff1
201
+ value: 29.730952749557144
202
+ - type: nauc_precision_at_20_max
203
+ value: 52.09696741873719
204
+ - type: nauc_precision_at_3_diff1
205
+ value: 33.30844921569695
206
+ - type: nauc_precision_at_3_max
207
+ value: 41.84496633792437
208
+ - type: nauc_precision_at_5_diff1
209
+ value: 31.000246292430838
210
+ - type: nauc_precision_at_5_max
211
+ value: 43.88721507465343
212
+ - type: nauc_recall_at_1000_diff1
213
+ value: 15.48972651576705
214
+ - type: nauc_recall_at_1000_max
215
+ value: 75.94259161180725
216
+ - type: nauc_recall_at_100_diff1
217
+ value: 30.033605095284816
218
+ - type: nauc_recall_at_100_max
219
+ value: 62.40786465750426
220
+ - type: nauc_recall_at_10_diff1
221
+ value: 28.617170969914984
222
+ - type: nauc_recall_at_10_max
223
+ value: 47.35884745487525
224
+ - type: nauc_recall_at_1_diff1
225
+ value: 41.18101653444774
226
+ - type: nauc_recall_at_1_max
227
+ value: 34.87383192583458
228
+ - type: nauc_recall_at_20_diff1
229
+ value: 29.730952749557087
230
+ - type: nauc_recall_at_20_max
231
+ value: 52.09696741873715
232
+ - type: nauc_recall_at_3_diff1
233
+ value: 33.30844921569694
234
+ - type: nauc_recall_at_3_max
235
+ value: 41.84496633792433
236
+ - type: nauc_recall_at_5_diff1
237
+ value: 31.000246292430838
238
+ - type: nauc_recall_at_5_max
239
+ value: 43.88721507465339
240
+ - type: ndcg_at_1
241
+ value: 32.080999999999996
242
+ - type: ndcg_at_10
243
+ value: 49.502
244
+ - type: ndcg_at_100
245
+ value: 53.52
246
+ - type: ndcg_at_1000
247
+ value: 54.842
248
+ - type: ndcg_at_20
249
+ value: 51.219
250
+ - type: ndcg_at_3
251
+ value: 43.381
252
+ - type: ndcg_at_5
253
+ value: 46.603
254
+ - type: precision_at_1
255
+ value: 32.080999999999996
256
+ - type: precision_at_10
257
+ value: 6.822
258
+ - type: precision_at_100
259
+ value: 0.873
260
+ - type: precision_at_1000
261
+ value: 0.098
262
+ - type: precision_at_20
263
+ value: 3.7479999999999998
264
+ - type: precision_at_3
265
+ value: 17.142
266
+ - type: precision_at_5
267
+ value: 11.857
268
+ - type: recall_at_1
269
+ value: 32.080999999999996
270
+ - type: recall_at_10
271
+ value: 68.221
272
+ - type: recall_at_100
273
+ value: 87.349
274
+ - type: recall_at_1000
275
+ value: 98.014
276
+ - type: recall_at_20
277
+ value: 74.957
278
+ - type: recall_at_3
279
+ value: 51.425
280
+ - type: recall_at_5
281
+ value: 59.282999999999994
282
+ - task:
283
+ type: Classification
284
+ dataset:
285
+ type: mteb/amazon_reviews_multi
286
+ name: MTEB AmazonReviewsClassification (fr)
287
+ config: fr
288
+ split: test
289
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
290
+ metrics:
291
+ - type: accuracy
292
+ value: 39.892
293
+ - type: f1
294
+ value: 38.38126304364462
295
+ - type: f1_weighted
296
+ value: 38.38126304364462
297
+ - task:
298
+ type: Retrieval
299
+ dataset:
300
+ type: maastrichtlawtech/bsard
301
+ name: MTEB BSARDRetrieval
302
+ config: default
303
+ split: test
304
+ revision: 5effa1b9b5fa3b0f9e12523e6e43e5f86a6e6d59
305
+ metrics:
306
+ - type: map_at_1
307
+ value: 10.811
308
+ - type: map_at_10
309
+ value: 16.414
310
+ - type: map_at_100
311
+ value: 17.647
312
+ - type: map_at_1000
313
+ value: 17.742
314
+ - type: map_at_20
315
+ value: 17.22
316
+ - type: map_at_3
317
+ value: 14.188999999999998
318
+ - type: map_at_5
319
+ value: 15.113
320
+ - type: mrr_at_1
321
+ value: 10.81081081081081
322
+ - type: mrr_at_10
323
+ value: 16.41427141427142
324
+ - type: mrr_at_100
325
+ value: 17.647339314041712
326
+ - type: mrr_at_1000
327
+ value: 17.74213263983212
328
+ - type: mrr_at_20
329
+ value: 17.219989884463573
330
+ - type: mrr_at_3
331
+ value: 14.18918918918919
332
+ - type: mrr_at_5
333
+ value: 15.112612612612612
334
+ - type: nauc_map_at_1000_diff1
335
+ value: 13.07108195916555
336
+ - type: nauc_map_at_1000_max
337
+ value: 14.000521014179807
338
+ - type: nauc_map_at_100_diff1
339
+ value: 13.087117094079332
340
+ - type: nauc_map_at_100_max
341
+ value: 13.99712558752583
342
+ - type: nauc_map_at_10_diff1
343
+ value: 13.452029501381165
344
+ - type: nauc_map_at_10_max
345
+ value: 13.3341655571542
346
+ - type: nauc_map_at_1_diff1
347
+ value: 14.990419981155167
348
+ - type: nauc_map_at_1_max
349
+ value: 8.812519082504037
350
+ - type: nauc_map_at_20_diff1
351
+ value: 12.80321357992737
352
+ - type: nauc_map_at_20_max
353
+ value: 14.020962859032371
354
+ - type: nauc_map_at_3_diff1
355
+ value: 14.84230805712973
356
+ - type: nauc_map_at_3_max
357
+ value: 11.644032755353722
358
+ - type: nauc_map_at_5_diff1
359
+ value: 15.100168959732835
360
+ - type: nauc_map_at_5_max
361
+ value: 13.634801099074355
362
+ - type: nauc_mrr_at_1000_diff1
363
+ value: 13.07108195916555
364
+ - type: nauc_mrr_at_1000_max
365
+ value: 14.000521014179807
366
+ - type: nauc_mrr_at_100_diff1
367
+ value: 13.087117094079332
368
+ - type: nauc_mrr_at_100_max
369
+ value: 13.99712558752583
370
+ - type: nauc_mrr_at_10_diff1
371
+ value: 13.452029501381165
372
+ - type: nauc_mrr_at_10_max
373
+ value: 13.3341655571542
374
+ - type: nauc_mrr_at_1_diff1
375
+ value: 14.990419981155167
376
+ - type: nauc_mrr_at_1_max
377
+ value: 8.812519082504037
378
+ - type: nauc_mrr_at_20_diff1
379
+ value: 12.80321357992737
380
+ - type: nauc_mrr_at_20_max
381
+ value: 14.020962859032371
382
+ - type: nauc_mrr_at_3_diff1
383
+ value: 14.84230805712973
384
+ - type: nauc_mrr_at_3_max
385
+ value: 11.644032755353722
386
+ - type: nauc_mrr_at_5_diff1
387
+ value: 15.100168959732835
388
+ - type: nauc_mrr_at_5_max
389
+ value: 13.634801099074355
390
+ - type: nauc_ndcg_at_1000_diff1
391
+ value: 11.335350893370972
392
+ - type: nauc_ndcg_at_1000_max
393
+ value: 16.09665875369169
394
+ - type: nauc_ndcg_at_100_diff1
395
+ value: 11.499643600969176
396
+ - type: nauc_ndcg_at_100_max
397
+ value: 15.967105414704186
398
+ - type: nauc_ndcg_at_10_diff1
399
+ value: 12.093263549786606
400
+ - type: nauc_ndcg_at_10_max
401
+ value: 14.605821897766461
402
+ - type: nauc_ndcg_at_1_diff1
403
+ value: 14.990419981155167
404
+ - type: nauc_ndcg_at_1_max
405
+ value: 8.812519082504037
406
+ - type: nauc_ndcg_at_20_diff1
407
+ value: 10.197380043193812
408
+ - type: nauc_ndcg_at_20_max
409
+ value: 16.332533239525365
410
+ - type: nauc_ndcg_at_3_diff1
411
+ value: 14.835825175950765
412
+ - type: nauc_ndcg_at_3_max
413
+ value: 11.898757954417214
414
+ - type: nauc_ndcg_at_5_diff1
415
+ value: 15.278603386081823
416
+ - type: nauc_ndcg_at_5_max
417
+ value: 15.007133861218167
418
+ - type: nauc_precision_at_1000_diff1
419
+ value: 2.7469897420865195
420
+ - type: nauc_precision_at_1000_max
421
+ value: 26.874535278616346
422
+ - type: nauc_precision_at_100_diff1
423
+ value: 7.600735526139776
424
+ - type: nauc_precision_at_100_max
425
+ value: 20.7203382946415
426
+ - type: nauc_precision_at_10_diff1
427
+ value: 8.938642089366768
428
+ - type: nauc_precision_at_10_max
429
+ value: 17.320961743140874
430
+ - type: nauc_precision_at_1_diff1
431
+ value: 14.990419981155167
432
+ - type: nauc_precision_at_1_max
433
+ value: 8.812519082504037
434
+ - type: nauc_precision_at_20_diff1
435
+ value: 3.733877816322278
436
+ - type: nauc_precision_at_20_max
437
+ value: 21.581173305923002
438
+ - type: nauc_precision_at_3_diff1
439
+ value: 14.828850401790316
440
+ - type: nauc_precision_at_3_max
441
+ value: 12.369943286612463
442
+ - type: nauc_precision_at_5_diff1
443
+ value: 15.728617939150672
444
+ - type: nauc_precision_at_5_max
445
+ value: 18.103783411900697
446
+ - type: nauc_recall_at_1000_diff1
447
+ value: 2.746989742086615
448
+ - type: nauc_recall_at_1000_max
449
+ value: 26.874535278616367
450
+ - type: nauc_recall_at_100_diff1
451
+ value: 7.600735526139775
452
+ - type: nauc_recall_at_100_max
453
+ value: 20.720338294641536
454
+ - type: nauc_recall_at_10_diff1
455
+ value: 8.93864208936673
456
+ - type: nauc_recall_at_10_max
457
+ value: 17.32096174314083
458
+ - type: nauc_recall_at_1_diff1
459
+ value: 14.990419981155167
460
+ - type: nauc_recall_at_1_max
461
+ value: 8.812519082504037
462
+ - type: nauc_recall_at_20_diff1
463
+ value: 3.733877816322231
464
+ - type: nauc_recall_at_20_max
465
+ value: 21.58117330592295
466
+ - type: nauc_recall_at_3_diff1
467
+ value: 14.828850401790339
468
+ - type: nauc_recall_at_3_max
469
+ value: 12.369943286612509
470
+ - type: nauc_recall_at_5_diff1
471
+ value: 15.72861793915063
472
+ - type: nauc_recall_at_5_max
473
+ value: 18.103783411900658
474
+ - type: ndcg_at_1
475
+ value: 10.811
476
+ - type: ndcg_at_10
477
+ value: 20.244
478
+ - type: ndcg_at_100
479
+ value: 26.526
480
+ - type: ndcg_at_1000
481
+ value: 29.217
482
+ - type: ndcg_at_20
483
+ value: 23.122
484
+ - type: ndcg_at_3
485
+ value: 15.396
486
+ - type: ndcg_at_5
487
+ value: 17.063
488
+ - type: precision_at_1
489
+ value: 10.811
490
+ - type: precision_at_10
491
+ value: 3.288
492
+ - type: precision_at_100
493
+ value: 0.631
494
+ - type: precision_at_1000
495
+ value: 0.08499999999999999
496
+ - type: precision_at_20
497
+ value: 2.207
498
+ - type: precision_at_3
499
+ value: 6.306000000000001
500
+ - type: precision_at_5
501
+ value: 4.595
502
+ - type: recall_at_1
503
+ value: 10.811
504
+ - type: recall_at_10
505
+ value: 32.883
506
+ - type: recall_at_100
507
+ value: 63.063
508
+ - type: recall_at_1000
509
+ value: 84.685
510
+ - type: recall_at_20
511
+ value: 44.144
512
+ - type: recall_at_3
513
+ value: 18.919
514
+ - type: recall_at_5
515
+ value: 22.973
516
+ - task:
517
+ type: Clustering
518
+ dataset:
519
+ type: lyon-nlp/clustering-hal-s2s
520
+ name: MTEB HALClusteringS2S
521
+ config: default
522
+ split: test
523
+ revision: e06ebbbb123f8144bef1a5d18796f3dec9ae2915
524
+ metrics:
525
+ - type: v_measure
526
+ value: 25.209561281028435
527
+ - type: v_measures
528
+ value: [0.28558356565178666, 0.2707322246129254, 0.2683693125038299, 0.2703937853835602, 0.22057190525667872]
529
+ - task:
530
+ type: Clustering
531
+ dataset:
532
+ type: reciTAL/mlsum
533
+ name: MTEB MLSUMClusteringP2P
534
+ config: default
535
+ split: test
536
+ revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
537
+ metrics:
538
+ - type: v_measure
539
+ value: 42.82528809996964
540
+ - type: v_measures
541
+ value: [0.43465029372260205, 0.42821098223656917, 0.43537879149583325, 0.4289578694928627, 0.3794307754465835]
542
+ - task:
543
+ type: Clustering
544
+ dataset:
545
+ type: reciTAL/mlsum
546
+ name: MTEB MLSUMClusteringS2S
547
+ config: default
548
+ split: test
549
+ revision: b5d54f8f3b61ae17845046286940f03c6bc79bc7
550
+ metrics:
551
+ - type: v_measure
552
+ value: 43.44172295073941
553
+ - type: v_measures
554
+ value: [0.4294163918345751, 0.46229994906725164, 0.44188446196569603, 0.43839320352264155, 0.3866853445120933]
555
+ - task:
556
+ type: Classification
557
+ dataset:
558
+ type: mteb/mtop_domain
559
+ name: MTEB MTOPDomainClassification (fr)
560
+ config: fr
561
+ split: test
562
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
563
+ metrics:
564
+ - type: accuracy
565
+ value: 88.33072345756342
566
+ - type: f1
567
+ value: 88.11780476022122
568
+ - type: f1_weighted
569
+ value: 88.28188145087299
570
+ - task:
571
+ type: Classification
572
+ dataset:
573
+ type: mteb/mtop_intent
574
+ name: MTEB MTOPIntentClassification (fr)
575
+ config: fr
576
+ split: test
577
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
578
+ metrics:
579
+ - type: accuracy
580
+ value: 57.854682117131226
581
+ - type: f1
582
+ value: 41.121569078191996
583
+ - type: f1_weighted
584
+ value: 60.04845437480532
585
+ - task:
586
+ type: Classification
587
+ dataset:
588
+ type: mteb/masakhanews
589
+ name: MTEB MasakhaNEWSClassification (fra)
590
+ config: fra
591
+ split: test
592
+ revision: 18193f187b92da67168c655c9973a165ed9593dd
593
+ metrics:
594
+ - type: accuracy
595
+ value: 76.87203791469194
596
+ - type: f1
597
+ value: 72.94847557303437
598
+ - type: f1_weighted
599
+ value: 76.9128173959562
600
+ - task:
601
+ type: Clustering
602
+ dataset:
603
+ type: masakhane/masakhanews
604
+ name: MTEB MasakhaNEWSClusteringP2P (fra)
605
+ config: fra
606
+ split: test
607
+ revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
608
+ metrics:
609
+ - type: v_measure
610
+ value: 61.32006896333715
611
+ - type: v_measures
612
+ value: [1.0, 0.6446188396257355, 0.28995363026757603, 0.40898735994696084, 0.7224436183265853]
613
+ - task:
614
+ type: Clustering
615
+ dataset:
616
+ type: masakhane/masakhanews
617
+ name: MTEB MasakhaNEWSClusteringS2S (fra)
618
+ config: fra
619
+ split: test
620
+ revision: 8ccc72e69e65f40c70e117d8b3c08306bb788b60
621
+ metrics:
622
+ - type: v_measure
623
+ value: 60.509887123660256
624
+ - type: v_measures
625
+ value: [1.0, 0.022472587992562534, 0.4686320087689936, 0.811946141094871, 0.7224436183265853]
626
+ - task:
627
+ type: Classification
628
+ dataset:
629
+ type: mteb/amazon_massive_intent
630
+ name: MTEB MassiveIntentClassification (fr)
631
+ config: fr
632
+ split: test
633
+ revision: 4672e20407010da34463acc759c162ca9734bca6
634
+ metrics:
635
+ - type: accuracy
636
+ value: 64.14256893073302
637
+ - type: f1
638
+ value: 61.33068109342782
639
+ - type: f1_weighted
640
+ value: 62.74292948992287
641
+ - task:
642
+ type: Classification
643
+ dataset:
644
+ type: mteb/amazon_massive_scenario
645
+ name: MTEB MassiveScenarioClassification (fr)
646
+ config: fr
647
+ split: test
648
+ revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8
649
+ metrics:
650
+ - type: accuracy
651
+ value: 70.68930733019502
652
+ - type: f1
653
+ value: 70.26641874846638
654
+ - type: f1_weighted
655
+ value: 70.35250466465047
656
+ - task:
657
+ type: Retrieval
658
+ dataset:
659
+ type: jinaai/mintakaqa
660
+ name: MTEB MintakaRetrieval (fr)
661
+ config: fr
662
+ split: test
663
+ revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
664
+ metrics:
665
+ - type: map_at_1
666
+ value: 19.165
667
+ - type: map_at_10
668
+ value: 28.663
669
+ - type: map_at_100
670
+ value: 29.737000000000002
671
+ - type: map_at_1000
672
+ value: 29.826000000000004
673
+ - type: map_at_20
674
+ value: 29.266
675
+ - type: map_at_3
676
+ value: 26.024
677
+ - type: map_at_5
678
+ value: 27.486
679
+ - type: mrr_at_1
680
+ value: 19.164619164619165
681
+ - type: mrr_at_10
682
+ value: 28.66298116298116
683
+ - type: mrr_at_100
684
+ value: 29.737423308510476
685
+ - type: mrr_at_1000
686
+ value: 29.825744096186796
687
+ - type: mrr_at_20
688
+ value: 29.26593905045215
689
+ - type: mrr_at_3
690
+ value: 26.023751023751025
691
+ - type: mrr_at_5
692
+ value: 27.48566748566751
693
+ - type: nauc_map_at_1000_diff1
694
+ value: 23.682512151202967
695
+ - type: nauc_map_at_1000_max
696
+ value: 25.78708364723919
697
+ - type: nauc_map_at_100_diff1
698
+ value: 23.647360144907324
699
+ - type: nauc_map_at_100_max
700
+ value: 25.812420160707074
701
+ - type: nauc_map_at_10_diff1
702
+ value: 23.658224717435765
703
+ - type: nauc_map_at_10_max
704
+ value: 25.845198626323217
705
+ - type: nauc_map_at_1_diff1
706
+ value: 30.56830621718086
707
+ - type: nauc_map_at_1_max
708
+ value: 19.931526248650147
709
+ - type: nauc_map_at_20_diff1
710
+ value: 23.69662048930091
711
+ - type: nauc_map_at_20_max
712
+ value: 25.936653022318403
713
+ - type: nauc_map_at_3_diff1
714
+ value: 24.663221072349817
715
+ - type: nauc_map_at_3_max
716
+ value: 24.634011858800275
717
+ - type: nauc_map_at_5_diff1
718
+ value: 24.3650772668551
719
+ - type: nauc_map_at_5_max
720
+ value: 25.75222318469224
721
+ - type: nauc_mrr_at_1000_diff1
722
+ value: 23.682512151202967
723
+ - type: nauc_mrr_at_1000_max
724
+ value: 25.78708364723919
725
+ - type: nauc_mrr_at_100_diff1
726
+ value: 23.647360144907324
727
+ - type: nauc_mrr_at_100_max
728
+ value: 25.812420160707074
729
+ - type: nauc_mrr_at_10_diff1
730
+ value: 23.658224717435765
731
+ - type: nauc_mrr_at_10_max
732
+ value: 25.845198626323217
733
+ - type: nauc_mrr_at_1_diff1
734
+ value: 30.56830621718086
735
+ - type: nauc_mrr_at_1_max
736
+ value: 19.931526248650147
737
+ - type: nauc_mrr_at_20_diff1
738
+ value: 23.69662048930091
739
+ - type: nauc_mrr_at_20_max
740
+ value: 25.936653022318403
741
+ - type: nauc_mrr_at_3_diff1
742
+ value: 24.663221072349817
743
+ - type: nauc_mrr_at_3_max
744
+ value: 24.634011858800275
745
+ - type: nauc_mrr_at_5_diff1
746
+ value: 24.3650772668551
747
+ - type: nauc_mrr_at_5_max
748
+ value: 25.75222318469224
749
+ - type: nauc_ndcg_at_1000_diff1
750
+ value: 21.68690756038845
751
+ - type: nauc_ndcg_at_1000_max
752
+ value: 27.168575101114893
753
+ - type: nauc_ndcg_at_100_diff1
754
+ value: 20.484812648526646
755
+ - type: nauc_ndcg_at_100_max
756
+ value: 27.79987215383081
757
+ - type: nauc_ndcg_at_10_diff1
758
+ value: 20.791330920997765
759
+ - type: nauc_ndcg_at_10_max
760
+ value: 28.272774035036935
761
+ - type: nauc_ndcg_at_1_diff1
762
+ value: 30.56830621718086
763
+ - type: nauc_ndcg_at_1_max
764
+ value: 19.931526248650147
765
+ - type: nauc_ndcg_at_20_diff1
766
+ value: 20.88342749790573
767
+ - type: nauc_ndcg_at_20_max
768
+ value: 28.627184419546825
769
+ - type: nauc_ndcg_at_3_diff1
770
+ value: 22.987235018840494
771
+ - type: nauc_ndcg_at_3_max
772
+ value: 26.054144215976482
773
+ - type: nauc_ndcg_at_5_diff1
774
+ value: 22.497863289090464
775
+ - type: nauc_ndcg_at_5_max
776
+ value: 27.98879570850259
777
+ - type: nauc_precision_at_1000_diff1
778
+ value: -0.6707404502167996
779
+ - type: nauc_precision_at_1000_max
780
+ value: 31.987217077673346
781
+ - type: nauc_precision_at_100_diff1
782
+ value: 5.079765403021014
783
+ - type: nauc_precision_at_100_max
784
+ value: 34.857053312543194
785
+ - type: nauc_precision_at_10_diff1
786
+ value: 12.628771618059472
787
+ - type: nauc_precision_at_10_max
788
+ value: 35.009564954169896
789
+ - type: nauc_precision_at_1_diff1
790
+ value: 30.56830621718086
791
+ - type: nauc_precision_at_1_max
792
+ value: 19.931526248650147
793
+ - type: nauc_precision_at_20_diff1
794
+ value: 12.28251326261041
795
+ - type: nauc_precision_at_20_max
796
+ value: 36.942629359432075
797
+ - type: nauc_precision_at_3_diff1
798
+ value: 18.663775283519335
799
+ - type: nauc_precision_at_3_max
800
+ value: 29.741315837492472
801
+ - type: nauc_precision_at_5_diff1
802
+ value: 17.70442691217025
803
+ - type: nauc_precision_at_5_max
804
+ value: 33.93438470540527
805
+ - type: nauc_recall_at_1000_diff1
806
+ value: -0.6707404502171719
807
+ - type: nauc_recall_at_1000_max
808
+ value: 31.987217077672607
809
+ - type: nauc_recall_at_100_diff1
810
+ value: 5.079765403021056
811
+ - type: nauc_recall_at_100_max
812
+ value: 34.85705331254323
813
+ - type: nauc_recall_at_10_diff1
814
+ value: 12.628771618059483
815
+ - type: nauc_recall_at_10_max
816
+ value: 35.00956495416992
817
+ - type: nauc_recall_at_1_diff1
818
+ value: 30.56830621718086
819
+ - type: nauc_recall_at_1_max
820
+ value: 19.931526248650147
821
+ - type: nauc_recall_at_20_diff1
822
+ value: 12.282513262610411
823
+ - type: nauc_recall_at_20_max
824
+ value: 36.94262935943207
825
+ - type: nauc_recall_at_3_diff1
826
+ value: 18.663775283519346
827
+ - type: nauc_recall_at_3_max
828
+ value: 29.741315837492465
829
+ - type: nauc_recall_at_5_diff1
830
+ value: 17.704426912170252
831
+ - type: nauc_recall_at_5_max
832
+ value: 33.934384705405286
833
+ - type: ndcg_at_1
834
+ value: 19.165
835
+ - type: ndcg_at_10
836
+ value: 33.674
837
+ - type: ndcg_at_100
838
+ value: 39.297
839
+ - type: ndcg_at_1000
840
+ value: 41.896
841
+ - type: ndcg_at_20
842
+ value: 35.842
843
+ - type: ndcg_at_3
844
+ value: 28.238999999999997
845
+ - type: ndcg_at_5
846
+ value: 30.863000000000003
847
+ - type: precision_at_1
848
+ value: 19.165
849
+ - type: precision_at_10
850
+ value: 4.9590000000000005
851
+ - type: precision_at_100
852
+ value: 0.768
853
+ - type: precision_at_1000
854
+ value: 0.098
855
+ - type: precision_at_20
856
+ value: 2.905
857
+ - type: precision_at_3
858
+ value: 11.548
859
+ - type: precision_at_5
860
+ value: 8.198
861
+ - type: recall_at_1
862
+ value: 19.165
863
+ - type: recall_at_10
864
+ value: 49.59
865
+ - type: recall_at_100
866
+ value: 76.822
867
+ - type: recall_at_1000
868
+ value: 97.83
869
+ - type: recall_at_20
870
+ value: 58.108000000000004
871
+ - type: recall_at_3
872
+ value: 34.644000000000005
873
+ - type: recall_at_5
874
+ value: 40.991
875
+ - task:
876
+ type: PairClassification
877
+ dataset:
878
+ type: GEM/opusparcus
879
+ name: MTEB OpusparcusPC (fr)
880
+ config: fr
881
+ split: test
882
+ revision: 9e9b1f8ef51616073f47f306f7f47dd91663f86a
883
+ metrics:
884
+ - type: cos_sim_accuracy
885
+ value: 83.51498637602179
886
+ - type: cos_sim_ap
887
+ value: 94.18614574224773
888
+ - type: cos_sim_f1
889
+ value: 88.3564925730714
890
+ - type: cos_sim_precision
891
+ value: 85.37037037037037
892
+ - type: cos_sim_recall
893
+ value: 91.55908639523337
894
+ - type: dot_accuracy
895
+ value: 83.51498637602179
896
+ - type: dot_ap
897
+ value: 94.18614574224773
898
+ - type: dot_f1
899
+ value: 88.3564925730714
900
+ - type: dot_precision
901
+ value: 85.37037037037037
902
+ - type: dot_recall
903
+ value: 91.55908639523337
904
+ - type: euclidean_accuracy
905
+ value: 83.51498637602179
906
+ - type: euclidean_ap
907
+ value: 94.18614574224773
908
+ - type: euclidean_f1
909
+ value: 88.3564925730714
910
+ - type: euclidean_precision
911
+ value: 85.37037037037037
912
+ - type: euclidean_recall
913
+ value: 91.55908639523337
914
+ - type: manhattan_accuracy
915
+ value: 83.51498637602179
916
+ - type: manhattan_ap
917
+ value: 94.16717671332795
918
+ - type: manhattan_f1
919
+ value: 88.35418671799807
920
+ - type: manhattan_precision
921
+ value: 85.71428571428571
922
+ - type: manhattan_recall
923
+ value: 91.16186693147964
924
+ - type: max_accuracy
925
+ value: 83.51498637602179
926
+ - type: max_ap
927
+ value: 94.18614574224773
928
+ - type: max_f1
929
+ value: 88.3564925730714
930
+ - task:
931
+ type: PairClassification
932
+ dataset:
933
+ type: google-research-datasets/paws-x
934
+ name: MTEB PawsX (fr)
935
+ config: fr
936
+ split: test
937
+ revision: 8a04d940a42cd40658986fdd8e3da561533a3646
938
+ metrics:
939
+ - type: cos_sim_accuracy
940
+ value: 60.699999999999996
941
+ - type: cos_sim_ap
942
+ value: 60.20276173325004
943
+ - type: cos_sim_f1
944
+ value: 62.716429395921516
945
+ - type: cos_sim_precision
946
+ value: 48.05424528301887
947
+ - type: cos_sim_recall
948
+ value: 90.2547065337763
949
+ - type: dot_accuracy
950
+ value: 60.699999999999996
951
+ - type: dot_ap
952
+ value: 60.27996470746299
953
+ - type: dot_f1
954
+ value: 62.716429395921516
955
+ - type: dot_precision
956
+ value: 48.05424528301887
957
+ - type: dot_recall
958
+ value: 90.2547065337763
959
+ - type: euclidean_accuracy
960
+ value: 60.699999999999996
961
+ - type: euclidean_ap
962
+ value: 60.20276173325004
963
+ - type: euclidean_f1
964
+ value: 62.716429395921516
965
+ - type: euclidean_precision
966
+ value: 48.05424528301887
967
+ - type: euclidean_recall
968
+ value: 90.2547065337763
969
+ - type: manhattan_accuracy
970
+ value: 60.699999999999996
971
+ - type: manhattan_ap
972
+ value: 60.18010040913353
973
+ - type: manhattan_f1
974
+ value: 62.71056661562021
975
+ - type: manhattan_precision
976
+ value: 47.92276184903452
977
+ - type: manhattan_recall
978
+ value: 90.69767441860465
979
+ - type: max_accuracy
980
+ value: 60.699999999999996
981
+ - type: max_ap
982
+ value: 60.27996470746299
983
+ - type: max_f1
984
+ value: 62.716429395921516
985
+ - task:
986
+ type: STS
987
+ dataset:
988
+ type: Lajavaness/SICK-fr
989
+ name: MTEB SICKFr
990
+ config: default
991
+ split: test
992
+ revision: e077ab4cf4774a1e36d86d593b150422fafd8e8a
993
+ metrics:
994
+ - type: cos_sim_pearson
995
+ value: 84.24496945719946
996
+ - type: cos_sim_spearman
997
+ value: 78.10001513346513
998
+ - type: euclidean_pearson
999
+ value: 81.43570951228163
1000
+ - type: euclidean_spearman
1001
+ value: 78.0987784421045
1002
+ - type: manhattan_pearson
1003
+ value: 81.31986646517238
1004
+ - type: manhattan_spearman
1005
+ value: 78.09610194828534
1006
+ - task:
1007
+ type: STS
1008
+ dataset:
1009
+ type: mteb/sts22-crosslingual-sts
1010
+ name: MTEB STS22 (fr)
1011
+ config: fr
1012
+ split: test
1013
+ revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
1014
+ metrics:
1015
+ - type: cos_sim_pearson
1016
+ value: 83.07721141521425
1017
+ - type: cos_sim_spearman
1018
+ value: 83.19199466052186
1019
+ - type: euclidean_pearson
1020
+ value: 82.10672022294766
1021
+ - type: euclidean_spearman
1022
+ value: 83.19199466052186
1023
+ - type: manhattan_pearson
1024
+ value: 81.92531847793633
1025
+ - type: manhattan_spearman
1026
+ value: 83.20694689089673
1027
+ - task:
1028
+ type: STS
1029
+ dataset:
1030
+ type: mteb/stsb_multi_mt
1031
+ name: MTEB STSBenchmarkMultilingualSTS (fr)
1032
+ config: fr
1033
+ split: test
1034
+ revision: 29afa2569dcedaaa2fe6a3dcfebab33d28b82e8c
1035
+ metrics:
1036
+ - type: cos_sim_pearson
1037
+ value: 83.957481748094
1038
+ - type: cos_sim_spearman
1039
+ value: 84.40492503459248
1040
+ - type: euclidean_pearson
1041
+ value: 83.8150014101056
1042
+ - type: euclidean_spearman
1043
+ value: 84.40686653864509
1044
+ - type: manhattan_pearson
1045
+ value: 83.6816837321264
1046
+ - type: manhattan_spearman
1047
+ value: 84.2678486368702
1048
+ - task:
1049
+ type: Summarization
1050
+ dataset:
1051
+ type: lyon-nlp/summarization-summeval-fr-p2p
1052
+ name: MTEB SummEvalFr
1053
+ config: default
1054
+ split: test
1055
+ revision: b385812de6a9577b6f4d0f88c6a6e35395a94054
1056
+ metrics:
1057
+ - type: cos_sim_pearson
1058
+ value: 32.06592630917136
1059
+ - type: cos_sim_spearman
1060
+ value: 30.94878864229808
1061
+ - type: dot_pearson
1062
+ value: 32.06591974515864
1063
+ - type: dot_spearman
1064
+ value: 30.925383080565222
1065
+ - task:
1066
+ type: Reranking
1067
+ dataset:
1068
+ type: lyon-nlp/mteb-fr-reranking-syntec-s2p
1069
+ name: MTEB SyntecReranking
1070
+ config: default
1071
+ split: test
1072
+ revision: daf0863838cd9e3ba50544cdce3ac2b338a1b0ad
1073
+ metrics:
1074
+ - type: map
1075
+ value: 88.11666666666667
1076
+ - type: mrr
1077
+ value: 88.11666666666667
1078
+ - type: nAUC_map_diff1
1079
+ value: 66.27779227667267
1080
+ - type: nAUC_map_max
1081
+ value: 6.651414764738896
1082
+ - type: nAUC_mrr_diff1
1083
+ value: 66.27779227667267
1084
+ - type: nAUC_mrr_max
1085
+ value: 6.651414764738896
1086
+ - task:
1087
+ type: Retrieval
1088
+ dataset:
1089
+ type: lyon-nlp/mteb-fr-retrieval-syntec-s2p
1090
+ name: MTEB SyntecRetrieval
1091
+ config: default
1092
+ split: test
1093
+ revision: 19661ccdca4dfc2d15122d776b61685f48c68ca9
1094
+ metrics:
1095
+ - type: map_at_1
1096
+ value: 69.0
1097
+ - type: map_at_10
1098
+ value: 80.65
1099
+ - type: map_at_100
1100
+ value: 80.838
1101
+ - type: map_at_1000
1102
+ value: 80.838
1103
+ - type: map_at_20
1104
+ value: 80.838
1105
+ - type: map_at_3
1106
+ value: 79.833
1107
+ - type: map_at_5
1108
+ value: 80.483
1109
+ - type: mrr_at_1
1110
+ value: 69.0
1111
+ - type: mrr_at_10
1112
+ value: 80.64999999999999
1113
+ - type: mrr_at_100
1114
+ value: 80.83799019607844
1115
+ - type: mrr_at_1000
1116
+ value: 80.83799019607844
1117
+ - type: mrr_at_20
1118
+ value: 80.83799019607844
1119
+ - type: mrr_at_3
1120
+ value: 79.83333333333334
1121
+ - type: mrr_at_5
1122
+ value: 80.48333333333333
1123
+ - type: nauc_map_at_1000_diff1
1124
+ value: 61.46904865740055
1125
+ - type: nauc_map_at_1000_max
1126
+ value: 24.307826758747282
1127
+ - type: nauc_map_at_100_diff1
1128
+ value: 61.46904865740055
1129
+ - type: nauc_map_at_100_max
1130
+ value: 24.307826758747282
1131
+ - type: nauc_map_at_10_diff1
1132
+ value: 61.094194035098035
1133
+ - type: nauc_map_at_10_max
1134
+ value: 24.44687875369869
1135
+ - type: nauc_map_at_1_diff1
1136
+ value: 65.17628798701865
1137
+ - type: nauc_map_at_1_max
1138
+ value: 25.79501560929155
1139
+ - type: nauc_map_at_20_diff1
1140
+ value: 61.46904865740055
1141
+ - type: nauc_map_at_20_max
1142
+ value: 24.307826758747282
1143
+ - type: nauc_map_at_3_diff1
1144
+ value: 61.562719756100805
1145
+ - type: nauc_map_at_3_max
1146
+ value: 25.87804164282553
1147
+ - type: nauc_map_at_5_diff1
1148
+ value: 61.471976470716264
1149
+ - type: nauc_map_at_5_max
1150
+ value: 25.180513270581322
1151
+ - type: nauc_mrr_at_1000_diff1
1152
+ value: 61.46904865740055
1153
+ - type: nauc_mrr_at_1000_max
1154
+ value: 24.307826758747282
1155
+ - type: nauc_mrr_at_100_diff1
1156
+ value: 61.46904865740055
1157
+ - type: nauc_mrr_at_100_max
1158
+ value: 24.307826758747282
1159
+ - type: nauc_mrr_at_10_diff1
1160
+ value: 61.094194035098035
1161
+ - type: nauc_mrr_at_10_max
1162
+ value: 24.44687875369869
1163
+ - type: nauc_mrr_at_1_diff1
1164
+ value: 65.17628798701865
1165
+ - type: nauc_mrr_at_1_max
1166
+ value: 25.79501560929155
1167
+ - type: nauc_mrr_at_20_diff1
1168
+ value: 61.46904865740055
1169
+ - type: nauc_mrr_at_20_max
1170
+ value: 24.307826758747282
1171
+ - type: nauc_mrr_at_3_diff1
1172
+ value: 61.562719756100805
1173
+ - type: nauc_mrr_at_3_max
1174
+ value: 25.87804164282553
1175
+ - type: nauc_mrr_at_5_diff1
1176
+ value: 61.471976470716264
1177
+ - type: nauc_mrr_at_5_max
1178
+ value: 25.180513270581322
1179
+ - type: nauc_ndcg_at_1000_diff1
1180
+ value: 60.95477865546023
1181
+ - type: nauc_ndcg_at_1000_max
1182
+ value: 24.427553593893535
1183
+ - type: nauc_ndcg_at_100_diff1
1184
+ value: 60.95477865546023
1185
+ - type: nauc_ndcg_at_100_max
1186
+ value: 24.427553593893535
1187
+ - type: nauc_ndcg_at_10_diff1
1188
+ value: 59.101673931307396
1189
+ - type: nauc_ndcg_at_10_max
1190
+ value: 25.01155211084955
1191
+ - type: nauc_ndcg_at_1_diff1
1192
+ value: 65.17628798701865
1193
+ - type: nauc_ndcg_at_1_max
1194
+ value: 25.79501560929155
1195
+ - type: nauc_ndcg_at_20_diff1
1196
+ value: 60.95477865546023
1197
+ - type: nauc_ndcg_at_20_max
1198
+ value: 24.427553593893535
1199
+ - type: nauc_ndcg_at_3_diff1
1200
+ value: 60.333057480044616
1201
+ - type: nauc_ndcg_at_3_max
1202
+ value: 28.363238330232637
1203
+ - type: nauc_ndcg_at_5_diff1
1204
+ value: 60.15511994533307
1205
+ - type: nauc_ndcg_at_5_max
1206
+ value: 26.94308058940176
1207
+ - type: nauc_precision_at_1000_diff1
1208
+ value: nan
1209
+ - type: nauc_precision_at_1000_max
1210
+ value: nan
1211
+ - type: nauc_precision_at_100_diff1
1212
+ value: nan
1213
+ - type: nauc_precision_at_100_max
1214
+ value: nan
1215
+ - type: nauc_precision_at_10_diff1
1216
+ value: 26.657329598506518
1217
+ - type: nauc_precision_at_10_max
1218
+ value: 34.26704014939361
1219
+ - type: nauc_precision_at_1_diff1
1220
+ value: 65.17628798701865
1221
+ - type: nauc_precision_at_1_max
1222
+ value: 25.79501560929155
1223
+ - type: nauc_precision_at_20_diff1
1224
+ value: 100.0
1225
+ - type: nauc_precision_at_20_max
1226
+ value: 100.0
1227
+ - type: nauc_precision_at_3_diff1
1228
+ value: 51.834066960117276
1229
+ - type: nauc_precision_at_3_max
1230
+ value: 48.25930372148875
1231
+ - type: nauc_precision_at_5_diff1
1232
+ value: 44.992997198879706
1233
+ - type: nauc_precision_at_5_max
1234
+ value: 50.70028011204499
1235
+ - type: nauc_recall_at_1000_diff1
1236
+ value: nan
1237
+ - type: nauc_recall_at_1000_max
1238
+ value: nan
1239
+ - type: nauc_recall_at_100_diff1
1240
+ value: nan
1241
+ - type: nauc_recall_at_100_max
1242
+ value: nan
1243
+ - type: nauc_recall_at_10_diff1
1244
+ value: 26.657329598505903
1245
+ - type: nauc_recall_at_10_max
1246
+ value: 34.26704014939303
1247
+ - type: nauc_recall_at_1_diff1
1248
+ value: 65.17628798701865
1249
+ - type: nauc_recall_at_1_max
1250
+ value: 25.79501560929155
1251
+ - type: nauc_recall_at_20_diff1
1252
+ value: nan
1253
+ - type: nauc_recall_at_20_max
1254
+ value: nan
1255
+ - type: nauc_recall_at_3_diff1
1256
+ value: 51.834066960117376
1257
+ - type: nauc_recall_at_3_max
1258
+ value: 48.25930372148865
1259
+ - type: nauc_recall_at_5_diff1
1260
+ value: 44.99299719887955
1261
+ - type: nauc_recall_at_5_max
1262
+ value: 50.70028011204488
1263
+ - type: ndcg_at_1
1264
+ value: 69.0
1265
+ - type: ndcg_at_10
1266
+ value: 84.786
1267
+ - type: ndcg_at_100
1268
+ value: 85.521
1269
+ - type: ndcg_at_1000
1270
+ value: 85.521
1271
+ - type: ndcg_at_20
1272
+ value: 85.521
1273
+ - type: ndcg_at_3
1274
+ value: 83.226
1275
+ - type: ndcg_at_5
1276
+ value: 84.43
1277
+ - type: precision_at_1
1278
+ value: 69.0
1279
+ - type: precision_at_10
1280
+ value: 9.700000000000001
1281
+ - type: precision_at_100
1282
+ value: 1.0
1283
+ - type: precision_at_1000
1284
+ value: 0.1
1285
+ - type: precision_at_20
1286
+ value: 5.0
1287
+ - type: precision_at_3
1288
+ value: 31.0
1289
+ - type: precision_at_5
1290
+ value: 19.2
1291
+ - type: recall_at_1
1292
+ value: 69.0
1293
+ - type: recall_at_10
1294
+ value: 97.0
1295
+ - type: recall_at_100
1296
+ value: 100.0
1297
+ - type: recall_at_1000
1298
+ value: 100.0
1299
+ - type: recall_at_20
1300
+ value: 100.0
1301
+ - type: recall_at_3
1302
+ value: 93.0
1303
+ - type: recall_at_5
1304
+ value: 96.0
1305
+ - task:
1306
+ type: Retrieval
1307
+ dataset:
1308
+ type: jinaai/xpqa
1309
+ name: MTEB XPQARetrieval (fr)
1310
+ config: fr
1311
+ split: test
1312
+ revision: c99d599f0a6ab9b85b065da6f9d94f9cf731679f
1313
+ metrics:
1314
+ - type: map_at_1
1315
+ value: 40.797
1316
+ - type: map_at_10
1317
+ value: 62.71099999999999
1318
+ - type: map_at_100
1319
+ value: 64.261
1320
+ - type: map_at_1000
1321
+ value: 64.306
1322
+ - type: map_at_20
1323
+ value: 63.693
1324
+ - type: map_at_3
1325
+ value: 56.686
1326
+ - type: map_at_5
1327
+ value: 60.653999999999996
1328
+ - type: mrr_at_1
1329
+ value: 64.08544726301736
1330
+ - type: mrr_at_10
1331
+ value: 71.24790726259349
1332
+ - type: mrr_at_100
1333
+ value: 71.7835679704396
1334
+ - type: mrr_at_1000
1335
+ value: 71.79095567140973
1336
+ - type: mrr_at_20
1337
+ value: 71.5854708410262
1338
+ - type: mrr_at_3
1339
+ value: 69.55941255006672
1340
+ - type: mrr_at_5
1341
+ value: 70.60747663551396
1342
+ - type: nauc_map_at_1000_diff1
1343
+ value: 47.803181417639365
1344
+ - type: nauc_map_at_1000_max
1345
+ value: 51.22073368230412
1346
+ - type: nauc_map_at_100_diff1
1347
+ value: 47.771573391555755
1348
+ - type: nauc_map_at_100_max
1349
+ value: 51.20370234778812
1350
+ - type: nauc_map_at_10_diff1
1351
+ value: 47.340833389771625
1352
+ - type: nauc_map_at_10_max
1353
+ value: 50.41256517180715
1354
+ - type: nauc_map_at_1_diff1
1355
+ value: 55.14983744702445
1356
+ - type: nauc_map_at_1_max
1357
+ value: 31.104750896985728
1358
+ - type: nauc_map_at_20_diff1
1359
+ value: 47.64026863999484
1360
+ - type: nauc_map_at_20_max
1361
+ value: 50.87670909266768
1362
+ - type: nauc_map_at_3_diff1
1363
+ value: 47.681906747352635
1364
+ - type: nauc_map_at_3_max
1365
+ value: 43.47246277661219
1366
+ - type: nauc_map_at_5_diff1
1367
+ value: 46.874943002794815
1368
+ - type: nauc_map_at_5_max
1369
+ value: 48.469495140739724
1370
+ - type: nauc_mrr_at_1000_diff1
1371
+ value: 57.34098736669957
1372
+ - type: nauc_mrr_at_1000_max
1373
+ value: 60.179095583193444
1374
+ - type: nauc_mrr_at_100_diff1
1375
+ value: 57.339862158018796
1376
+ - type: nauc_mrr_at_100_max
1377
+ value: 60.18082273539442
1378
+ - type: nauc_mrr_at_10_diff1
1379
+ value: 57.210874058908814
1380
+ - type: nauc_mrr_at_10_max
1381
+ value: 60.043680803697086
1382
+ - type: nauc_mrr_at_1_diff1
1383
+ value: 59.69074056197331
1384
+ - type: nauc_mrr_at_1_max
1385
+ value: 60.90082316300324
1386
+ - type: nauc_mrr_at_20_diff1
1387
+ value: 57.35434243512763
1388
+ - type: nauc_mrr_at_20_max
1389
+ value: 60.18873377253912
1390
+ - type: nauc_mrr_at_3_diff1
1391
+ value: 57.26933631425754
1392
+ - type: nauc_mrr_at_3_max
1393
+ value: 60.05458089795687
1394
+ - type: nauc_mrr_at_5_diff1
1395
+ value: 57.045411517214276
1396
+ - type: nauc_mrr_at_5_max
1397
+ value: 59.981421712413685
1398
+ - type: nauc_ndcg_at_1000_diff1
1399
+ value: 50.232929738614814
1400
+ - type: nauc_ndcg_at_1000_max
1401
+ value: 55.01594185277396
1402
+ - type: nauc_ndcg_at_100_diff1
1403
+ value: 49.876825728406786
1404
+ - type: nauc_ndcg_at_100_max
1405
+ value: 54.87898182661215
1406
+ - type: nauc_ndcg_at_10_diff1
1407
+ value: 48.40787615482867
1408
+ - type: nauc_ndcg_at_10_max
1409
+ value: 52.84877289626636
1410
+ - type: nauc_ndcg_at_1_diff1
1411
+ value: 59.69074056197331
1412
+ - type: nauc_ndcg_at_1_max
1413
+ value: 60.90082316300324
1414
+ - type: nauc_ndcg_at_20_diff1
1415
+ value: 49.08453974591539
1416
+ - type: nauc_ndcg_at_20_max
1417
+ value: 53.80319392912378
1418
+ - type: nauc_ndcg_at_3_diff1
1419
+ value: 48.21830414023458
1420
+ - type: nauc_ndcg_at_3_max
1421
+ value: 51.321799626032714
1422
+ - type: nauc_ndcg_at_5_diff1
1423
+ value: 47.614495954542605
1424
+ - type: nauc_ndcg_at_5_max
1425
+ value: 50.803800463597405
1426
+ - type: nauc_precision_at_1000_diff1
1427
+ value: -15.87250509394414
1428
+ - type: nauc_precision_at_1000_max
1429
+ value: 16.09830137145176
1430
+ - type: nauc_precision_at_100_diff1
1431
+ value: -13.720930651556534
1432
+ - type: nauc_precision_at_100_max
1433
+ value: 19.94363871765946
1434
+ - type: nauc_precision_at_10_diff1
1435
+ value: -3.9626074014054136
1436
+ - type: nauc_precision_at_10_max
1437
+ value: 30.48732389685921
1438
+ - type: nauc_precision_at_1_diff1
1439
+ value: 59.69074056197331
1440
+ - type: nauc_precision_at_1_max
1441
+ value: 60.90082316300324
1442
+ - type: nauc_precision_at_20_diff1
1443
+ value: -8.144148640034853
1444
+ - type: nauc_precision_at_20_max
1445
+ value: 26.183545158653338
1446
+ - type: nauc_precision_at_3_diff1
1447
+ value: 7.1166818076254605
1448
+ - type: nauc_precision_at_3_max
1449
+ value: 37.64665636029093
1450
+ - type: nauc_precision_at_5_diff1
1451
+ value: 0.3455996928663316
1452
+ - type: nauc_precision_at_5_max
1453
+ value: 34.95245204298077
1454
+ - type: nauc_recall_at_1000_diff1
1455
+ value: 47.93171740380228
1456
+ - type: nauc_recall_at_1000_max
1457
+ value: 89.21354057542635
1458
+ - type: nauc_recall_at_100_diff1
1459
+ value: 34.93973412699365
1460
+ - type: nauc_recall_at_100_max
1461
+ value: 47.89216950421148
1462
+ - type: nauc_recall_at_10_diff1
1463
+ value: 38.58556368247737
1464
+ - type: nauc_recall_at_10_max
1465
+ value: 45.13227163006313
1466
+ - type: nauc_recall_at_1_diff1
1467
+ value: 55.14983744702445
1468
+ - type: nauc_recall_at_1_max
1469
+ value: 31.104750896985728
1470
+ - type: nauc_recall_at_20_diff1
1471
+ value: 38.53568097509877
1472
+ - type: nauc_recall_at_20_max
1473
+ value: 46.37328875121808
1474
+ - type: nauc_recall_at_3_diff1
1475
+ value: 41.49659886305561
1476
+ - type: nauc_recall_at_3_max
1477
+ value: 38.59476562231703
1478
+ - type: nauc_recall_at_5_diff1
1479
+ value: 38.489499442628016
1480
+ - type: nauc_recall_at_5_max
1481
+ value: 43.06848825600403
1482
+ - type: ndcg_at_1
1483
+ value: 64.08500000000001
1484
+ - type: ndcg_at_10
1485
+ value: 68.818
1486
+ - type: ndcg_at_100
1487
+ value: 73.66
1488
+ - type: ndcg_at_1000
1489
+ value: 74.309
1490
+ - type: ndcg_at_20
1491
+ value: 71.147
1492
+ - type: ndcg_at_3
1493
+ value: 64.183
1494
+ - type: ndcg_at_5
1495
+ value: 65.668
1496
+ - type: precision_at_1
1497
+ value: 64.08500000000001
1498
+ - type: precision_at_10
1499
+ value: 15.728
1500
+ - type: precision_at_100
1501
+ value: 1.9720000000000002
1502
+ - type: precision_at_1000
1503
+ value: 0.207
1504
+ - type: precision_at_20
1505
+ value: 8.705
1506
+ - type: precision_at_3
1507
+ value: 39.03
1508
+ - type: precision_at_5
1509
+ value: 27.717000000000002
1510
+ - type: recall_at_1
1511
+ value: 40.797
1512
+ - type: recall_at_10
1513
+ value: 77.432
1514
+ - type: recall_at_100
1515
+ value: 95.68100000000001
1516
+ - type: recall_at_1000
1517
+ value: 99.666
1518
+ - type: recall_at_20
1519
+ value: 84.773
1520
+ - type: recall_at_3
1521
+ value: 62.083
1522
+ - type: recall_at_5
1523
+ value: 69.786
1524
  license: apache-2.0
1525
  language:
1526
  - fr
1527
  - en
1528
  ---
1529
  ## Model Description:
1530
+ [**french-document-embedding**](https://huggingface.co/dangvantuan/french-document-embedding) is an embedding model for French and English documents, with a context length of up to 8096 tokens. It is a specialized text-embedding model built upon [gte-multilingual](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) and trained with SimilarityLoss, [Multi-Negative Ranking Loss](https://arxiv.org/abs/1705.00652), [Matryoshka2dLoss](https://arxiv.org/html/2402.14776v1), and [GISTEmbedLoss](https://arxiv.org/abs/2402.16829), using [bilingual-embedding-large](https://huggingface.co/Lajavaness/bilingual-embedding-large) as the guide model. The model embeds long texts or documents into 786-dimensional vectors, which makes it suitable for vector databases backing semantic search or RAG (Retrieval-Augmented Generation).
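Since the description above targets semantic search and RAG over long French/English documents, here is a minimal retrieval sketch (not part of the original card): it assumes the `dangvantuan/french-document-embedding` checkpoint named above and uses `sentence_transformers.util.cos_sim` to rank a few illustrative documents against a query.

```python
from sentence_transformers import SentenceTransformer, util

# Sketch only: model id taken from the card; trust_remote_code is needed
# because the underlying GTE architecture ships custom modeling code.
model = SentenceTransformer("dangvantuan/french-document-embedding", trust_remote_code=True)

# Hypothetical mini-corpus and query, for illustration.
documents = [
    "Paris est la capitale de la France.",
    "The Eiffel Tower was completed in 1889.",
    "Le modèle convertit des documents longs en vecteurs pour la recherche sémantique.",
]
query = "Quelle est la capitale de la France ?"

# Encode the corpus once (e.g. before loading it into a vector database),
# then encode the query and rank documents by cosine similarity.
doc_embeddings = model.encode(documents)
query_embedding = model.encode(query)

scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for doc, score in sorted(zip(documents, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```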
1531
 
1532
  ## Full Model Architecture
1533
  ```
 
1537
  (2): Normalize()
1538
  )
1539
  ```
1540
 
1541
 
1542
  ## Usage:
 
1555
 
1556
 
1557
 
1558
+ model = SentenceTransformer('dangvantuan/french-document-embedding', trust_remote_code=True)
1559
  embeddings = model.encode(sentences)
1560
  print(embeddings)
1561
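As a follow-up sketch (not in the card itself), the cross-lingual pair from the usage snippet can be scored directly; this assumes `model` and `embeddings` as defined above and relies on `sentence_transformers.util.cos_sim`.

```python
from sentence_transformers import util

# `embeddings` holds one vector per sentence from model.encode(sentences);
# a high cosine similarity is expected for the French/English paraphrase pair.
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"cosine similarity (fr vs en): {similarity.item():.3f}")
```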
 
 
1579
  year={2019}
1580
  }
1581
 
 
1582
  @article{zhang2024mgte,
1583
  title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
1584
  author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and others},
 
1598
  author={Li, Xianming and Li, Zongxi and Li, Jing and Xie, Haoran and Li, Qing},
1599
  journal={arXiv preprint arXiv:2402.14776},
1600
  year={2024}
1601
+ }
1602
+
1603
+ @misc{henderson2017efficient,
1604
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
1605
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
1606
+ year={2017},
1607
+ eprint={1705.00652},
1608
+ archivePrefix={arXiv},
1609
+ primaryClass={cs.CL}
1610
+ }
1611
+
1612
+ @misc{solatorio2024gistembed,
1613
+ title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
1614
+ author={Aivin V. Solatorio},
1615
+ year={2024},
1616
+ eprint={2402.16829},
1617
+ archivePrefix={arXiv},
1618
+ primaryClass={cs.LG}
1619
  }