How to turn off the r1 mode when running it with huggingface api?

#9
by securealex - opened

The output is very long and includes the model's thought process. This results in unnecessarily lengthy responses that are often not usable.

Same. If only the model could think in brief.

Just an opinion, but the only options are to prompt it not to think (unlikely to work) or to fine-tune it on your own dataset. When the model skips reasoning, it outputs the answer directly. You could replicate that behavior with your data.

I'm also having this issue. The model takes a long time to run on my hardware, but if I set max_tokens lower, all I get is 5-6 paragraphs of 'thinking' and no actual answer. Surely there is a way to 'turn this off'?

Please let me know if anyone comes up with a decent idea! :)
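One workaround (assuming the model wraps its reasoning in `<think>...</think>` tags, as DeepSeek-R1-style models typically do) is to strip that block client-side after generation. This doesn't save compute, but it keeps the reasoning out of the final response. A minimal sketch:

```python
import re

def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from model output.

    Assumes the model wraps its chain of thought in <think> tags
    (the DeepSeek-R1 convention). If the output was cut off by
    max_tokens mid-thought, drop everything after the opening tag.
    """
    # Remove complete <think>...</think> blocks.
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Handle an unclosed <think> tag (output truncated by max_tokens).
    cleaned = re.sub(r"<think>.*", "", cleaned, flags=re.DOTALL)
    return cleaned.strip()

raw = "<think>Let me work this out step by step...</think>The answer is 42."
print(strip_reasoning(raw))  # -> The answer is 42.
```

Note the truncation problem mentioned above still applies: if max_tokens is too low, the answer never gets generated at all, so stripping leaves you with an empty string. You'd need a max_tokens budget large enough to cover the thinking plus the answer.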

Don't use a reasoning model then
