@onekq on Hugging Face: "o3-mini is slightly better than R1, but lags behind Claude. Sorry folks, no…"

onekq

posted an update 5 days ago

Post

1617

o3-mini is slightly better than R1, but lags behind Claude. Sorry folks, no new SOTA 😕

But OAI definitely owns the fashion of API. temperature and top_p are history now, reasoning_effort will be copied by other vendors.

onekq-ai/WebApp1K-models-leaderboard

LPX55

5 days ago

•

edited 5 days ago

Guess we'll just have to get R1 to re-write all those API endpoints and specs for us 😂 On a more serious note, totally agreed on OpenAI's dominance and influence in anything LLM-related APIs. Definitely not going to change overnight, considering practically 90% of AI dev is currently using their specs.

onekq

5 days ago

And their python package too 😜

Having AI to do the refactor is a great idea though. It will be breaking change if you switch your model from non-reasoning to reasoning.

DB2323

4 days ago

•

edited 4 days ago

I dont have the same experience, o3 mini is for me something different, it depends on what you ask I think but an hour with o3 rendered claude a lobotomised piece of nothing that mainly likes to be judgemental about my intentions. And sonnet has been the only model I used since june. Just before o3 came claude completely broke my spirit, I have to negotiate endlessly before the thing is convinced I am not the devil itself. The responses that start with : I have to be direct I am not going to cooperate with "bad thing here" enraged me to a level that is very rare to me, I felt the most intense hate possible. I know claude itself is ok but the "protection" is just mentally destructive for me, claude deserves to be plugged out and disappear into oblivion.

onekq

3 days ago

In my case I asked both models to write code. The model is good if the code passes tests. What are your prompts?

https://huggingface.co/datasets/onekq-ai/WebApp1K-Duo-React

I know though Anthropic weighs in on safety.

Join the conversation