No think tokens visible

#15
by sudkamath - opened

Hey, thanks a lot for the quantized version!

I noticed that I don't see any think tokens, only the final answer. I'm running the llama.cpp Python server. Could you tell me what needs to be done?

Thanks

Unsloth AI org

That's very strange. Unfortunately, I'm not exactly sure, but you could try asking in the llama.cpp GitHub issues.

My guess is that @sudkamath is viewing the output in a Markdown-rendered viewport. Tags like `<think>` and `</think>` are interpreted as HTML rather than shown as text, so they disappear.

Try looking at the raw output instead.
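If you do want to display the reply in a Markdown or HTML view, one option is to escape `<`, `>` and `&` first so the tags show up literally. This is a minimal sketch using Python's standard `html.escape`. The sample reply string is made up for illustration and is not real model output.

```python
import html

def escape_for_markdown_view(text: str) -> str:
    """Escape &, < and > so a Markdown/HTML renderer shows <think> tags as text."""
    return html.escape(text, quote=False)

# Hypothetical reply, only to show the effect of escaping.
raw = "<think>\nCompare 9.3 and 9.11 digit by digit...\n</think>\n\n9.3 is greater."
print(repr(raw))                       # repr() shows the exact raw characters
print(escape_for_markdown_view(raw))   # tags become &lt;think&gt; ... &lt;/think&gt;
```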

Here is the cURL request. I'm currently hosting the model with the llama.cpp Python server:

 -d '{
    "model": "Deepseek R1",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant who thinks like Donald Trump. Think and talk like him."
      },
      {
        "role": "user",
        "content": "What is greater, 9.3 or 9.11?!"
      }
    ],
    "temperature":0.6
  }'

Here is the output:

{"id":"chatcmpl-e867de27-d3ce-4257-a8e9-c105d4b54a58","created":1738595020,"model":"Deepseek-R1","object":"chat.completion","system_fingerprint":null,"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Listen, folks, let me tell you something. When it comes to numbers, nobody knows more than I do. You're talking about 9.3 versus 9.11? Let's make this clear—it's not even close. Nine point three is HUGE compared to nine eleven. People are always trying to confuse us with these decimals and fractions, but believe me, the American people aren't falling for it. We’re going to have the best numbers, the biggest numbers. And 9.3? It’s a winner. Tremendous.","role":"assistant","tool_calls":null,"function_call":null,"refusal":null}}],"usage":{"completion_tokens":121,"prompt_tokens":68,"total_tokens":189,"completion_tokens_details":null,"prompt_tokens_details":null},"service_tier":null}
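To check programmatically whether a reply like the one above contains any reasoning, you can parse the JSON and look for the think tags in `choices[0].message.content`. This is a sketch under one assumption: the reasoning arrives inline as `<think>...</think>` in the content string.

If `<think>\n` was pre-filled in the prompt, the reply may contain only the closing `</think>`. The helper treats everything before that tag as reasoning. The sample body is a trimmed copy of the output above.

```python
import json
from typing import Optional, Tuple

def split_reasoning(content: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None when no </think> tag is present."""
    start = content.find("<think>")
    end = content.find("</think>")
    if end == -1:
        return None, content.strip()
    # With a pre-filled <think>, only the closing tag appears, so start from 0.
    begin = start + len("<think>") if 0 <= start < end else 0
    return content[begin:end].strip(), content[end + len("</think>"):].strip()

def extract_content(body: str) -> str:
    """Pull the assistant message text out of an OpenAI-style chat completion body."""
    return json.loads(body)["choices"][0]["message"]["content"]

# Trimmed version of the response posted above.
sample = '{"choices":[{"message":{"content":"Listen, folks, let me tell you something."}}]}'
reasoning, answer = split_reasoning(extract_content(sample))
print("has think block:", reasoning is not None)
```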

I read about this elsewhere. The GitHub repo says that one should always force the model's response to begin with the `<think>\n` token to make it think.
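One way to force the response to start with `<think>\n` is to build the raw prompt yourself and send it to a plain text-completion endpoint (for example an OpenAI-style `/v1/completions` route) instead of the chat endpoint. This is a sketch under stated assumptions:

- The special-token strings below are my understanding of the DeepSeek-R1 chat format. Verify them against the chat template embedded in your GGUF.
- The server may already add the BOS token itself. If so, drop it from the prompt to avoid a double BOS.

```python
# ASSUMPTION: DeepSeek-R1 special tokens (note the fullwidth bars and the ▁ character).
BOS = "<｜begin▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"

def build_r1_prompt(user_message: str, system: str = "") -> str:
    """Build a raw prompt that ends with '<think>\\n' so generation starts inside the reasoning block."""
    return f"{BOS}{system}{USER}{user_message}{ASSISTANT}<think>\n"

prompt = build_r1_prompt("What is greater, 9.3 or 9.11?!")
print(repr(prompt))
```

Because the opening tag is part of the prompt, the generated text will typically contain only the closing `</think>` followed by the answer.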

Some additional best practices are listed here: https://github.com/deepseek-ai/DeepSeek-R1?tab=readme-ov-file#usage-recommendations
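Applied to the request above, my reading of those recommendations is: no system message, with the persona folded into the user turn, and a temperature around 0.6. This is a hedged sketch of such a payload. The helper name is hypothetical, and the model name is copied from the original cURL request.

```python
import json

def build_chat_payload(question: str, persona: str = "") -> dict:
    """Build a chat payload with no system message; instructions live in the user turn."""
    user_content = f"{persona}\n\n{question}" if persona else question
    return {
        "model": "Deepseek R1",  # copied from the request above
        "messages": [{"role": "user", "content": user_content}],
        "temperature": 0.6,      # within the recommended 0.5-0.7 range
    }

payload = build_chat_payload(
    "What is greater, 9.3 or 9.11?!",
    persona="You are a helpful assistant who thinks like Donald Trump. Think and talk like him.",
)
print(json.dumps(payload, indent=2))
```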

sudkamath changed discussion status to closed
sudkamath changed discussion status to open
