Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
onekqΒ 
posted an update 5 days ago
Post
1617
o3-mini is slightly better than R1, but lags behind Claude. Sorry folks, no new SOTA πŸ˜•

But OAI definitely owns the fashion of API. temperature and top_p are history now, reasoning_effort will be copied by other vendors.

onekq-ai/WebApp1K-models-leaderboard

Guess we'll just have to get R1 to re-write all those API endpoints and specs for us πŸ˜‚ On a more serious note, totally agreed on OpenAI's dominance and influence in anything LLM-related APIs. Definitely not going to change overnight, considering practically 90% of AI dev is currently using their specs.

Β·

And their python package too 😜

Having AI to do the refactor is a great idea though. It will be breaking change if you switch your model from non-reasoning to reasoning.

I dont have the same experience, o3 mini is for me something different, it depends on what you ask I think but an hour with o3 rendered claude a lobotomised piece of nothing that mainly likes to be judgemental about my intentions. And sonnet has been the only model I used since june. Just before o3 came claude completely broke my spirit, I have to negotiate endlessly before the thing is convinced I am not the devil itself. The responses that start with : I have to be direct I am not going to cooperate with "bad thing here" enraged me to a level that is very rare to me, I felt the most intense hate possible. I know claude itself is ok but the "protection" is just mentally destructive for me, claude deserves to be plugged out and disappear into oblivion.

Β·

In my case I asked both models to write code. The model is good if the code passes tests. What are your prompts?

https://huggingface.co/datasets/onekq-ai/WebApp1K-Duo-React

I know though Anthropic weighs in on safety.

In this post