Yuichi Tateno

hotchpotch

AI & ML interests

IR, Kaggle(competitions master)

Recent Activity

Organizations

Nikkei Inc.

hotchpotch's activity

view reply

Sorry, there was a mistake in the measurement script.
When I measured again, the result for 512 dimensions was 99.1%.

I'll rewrite the article later. Thank you for pointing this out.

view reply

I also find this very strange.

With 512 dimensions, clustering and similar tasks score well, so it is possible that certain information is concentrated in dimensions [256:512].

Also, because the batch size is large (6144), a bias may have arisen by chance toward the end of training.
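
One way to probe this would be to compare similarities computed on different dimension slices. The following is only a minimal sketch: the vectors are random stand-ins rather than real model output, the 1024-dimension size is an assumption, and the helper names `slice_and_normalize` and `cosine_on_slice` are hypothetical.

```python
import math
import random


def slice_and_normalize(vec, start, stop):
    """Return vec[start:stop] rescaled to unit L2 norm."""
    part = vec[start:stop]
    norm = math.sqrt(sum(x * x for x in part))
    if norm == 0.0:
        raise ValueError("slice has zero norm")
    return [x / norm for x in part]


def cosine_on_slice(a, b, start, stop):
    """Cosine similarity of two vectors restricted to dims [start:stop]."""
    a_s = slice_and_normalize(a, start, stop)
    b_s = slice_and_normalize(b, start, stop)
    return sum(x * y for x, y in zip(a_s, b_s))


# Synthetic stand-ins for two embeddings (assumed 1024-dim, not real data).
rng = random.Random(0)
emb_a = [rng.gauss(0.0, 1.0) for _ in range(1024)]
emb_b = [rng.gauss(0.0, 1.0) for _ in range(1024)]

# Compare the first half, the second half, and the full 512-dim prefix.
for start, stop in [(0, 256), (256, 512), (0, 512)]:
    sim = cosine_on_slice(emb_a, emb_b, start, stop)
    print(f"dims [{start}:{stop}] cosine = {sim:.4f}")
```

With real embeddings, running a benchmark separately on the [0:256] and [256:512] slices might show whether the second slice carries information that the first does not.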

view reply

This is a fantastic approach!

I trained a Japanese static embedding model (static-embedding-japanese) on a large amount of Japanese data. When I evaluated it on the Japanese Massive Text Embedding Benchmark (JMTEB), it achieved scores only slightly lower than multilingual-e5-small (mE5-small).

JMTEB results

| Model | Avg (micro) | Retrieval | STS | Classification | Reranking | Clustering | PairClassification |
|---|---|---|---|---|---|---|---|
| text-embedding-3-small | 69.18 | 66.39 | 79.46 | 73.06 | 92.92 | 51.06 | 62.27 |
| multilingual-e5-small | 67.71 | 67.27 | 80.07 | 67.62 | 93.03 | 46.91 | 62.19 |
| static-embedding-japanese | 67.17 | 67.92 | 80.16 | 67.96 | 91.87 | 40.39 | 62.37 |

Thank you for publishing such an excellent article.