Models trained on commercial-use-friendly licensed data

#149
by nramrakhiyani - opened

Suppose one faces a strict guideline to use only models whose license permits commercial use (many such models appear on the leaderboard) and, more importantly, whose training data is also licensed for commercial use. This rules out models such as all-MiniLM-L6-v2 and all-mpnet-base-v2, since their training data includes some datasets that are allowed only for research/academic use. Similarly, the e5 model paper reports using Common Crawl, which is a legal grey area. Has anyone encountered this scenario and found a model that satisfies both the model-license and the training-data-license constraints? Any guidance would be valuable. Thank you.

nramrakhiyani changed discussion title from Models trained on commercial use friendly licensed data to Models trained on commercial-use-friendly licensed data
Massive Text Embedding Benchmark org

Thanks for starting the discussion, @nramrakhiyani. We do not accept issues here, but you are free to open one at https://github.com/embeddings-benchmark/mteb/issues

(I think it is a worthwhile discussion to bring up)

KennethEnevoldsen changed discussion status to closed
