Models trained on commercial-use-friendly licensed data
Suppose one faces a strict guideline to use only models whose license allows commercial use (there are many such models on the leaderboard) and, more importantly, models trained on data that is also licensed for commercial use. This rules out models such as all-MiniLM-L6-v2 and all-mpnet-base-v2, as their training data includes some datasets that are allowed only for research/academic use. Similarly, the e5 model paper reports use of Common Crawl, which is a legal grey area. Has anyone encountered this kind of scenario and found a model that satisfies both the model license and the training data license constraints? Any guidance in this regard would be valuable. Thank you.
Thanks for starting the discussion, @nramrakhiyani. We do not accept issues here, but you are free to open one here: https://github.com/embeddings-benchmark/mteb/issues
(I think it is a worthwhile discussion to bring up)