<thinking> is the proper tag?

#8
by McUH - opened

I had problems getting this model to work (IQ4_XS quant) using <think></think> and <answer></answer> as it would almost never enter thinking phase. Until I tried to use <thinking></thinking> instead and then it suddenly started to work well. Also it often forgets to insert the <thinking> by itself and starts answering directly instead of thinking, so I added prefill to last assistant prefix (SillyTavern):

<|im_start|>assistant
<thinking>

And this way it works very well even with roleplay (complicated system prompts with character/scenario descriptions). For now I do not delete <thinking> passages from previous responses and it seems to be working well like this and improves roleplay responses quite a lot as the character thinks about what to answer and so keeps it more to the point.

same issue

@McUH can you please tell me what's the right way to prompt? What do you mean using think tags, I assumed the model should automatically generate the think tags without them being present in the prompt.

I don't know what is supposed correct prompt, but I assume official would be Deepseek R1, which I think is something like:

<|begin▁of▁sentence|><|User|>What is 1+1?<|Assistant|>It's 2.<|end▁of▁sentence|><|User|>Explain more!<|Assistant|>

That said, I have also seen some CHATML/R1 prompts being suggested, which in some cases worked better for me.

As for <think></think> yes, in theory model should generate it itself to start thinking. However that only seems to work with simpler prompts. I wanted to use it also in long / complicated chats (including RP but also in general) to think before producing reply. However, in these situations the model would basically never generate the <think> tag and enter thinking phase. SO to force it, I prefill LLM answer with tag <think>. Only, with this tag, this model (L3 distill) almost never thinks anyway. But if I prefill with <thinking> instead, it does think, closes with </thinking> and then produces reply enclosed in <answer></answer>.

UPDATE: Now I tried also with L3 instruct tags + thinking system prompt + <thinking> prefill and that works very well too.

Ok, I think I finally figured it out. In chats I generally include names (using Sillytavern) as it makes the model less confused, especially if there are more characters (eg group chats). Problem is, that Sillytavern appends name after last instruction sequence, so even with prefill it is like this:

<think>CHARNAME: ... and here LLM starts generating ...

And this seems not to work with this L3 distillation (but works with <thinking>). Now I modified template to use "Start Reply With" prefill instead of last instruction sequence, so prompt ends like this:

CHARNAME:<think> ... and here LLM starts generating ...

And this works with both <think> and <thinking>. In general <think> produces more robust thinking (DeepseekR1 like) while <thinking> produces shorter, more concise thinking, but larger and more robust than non-distilled L3.3. So actually both have their uses. <think> is probably more of what is intended with distillation, while <thinking> kind of takes properties of both models (L3.3 and Deepseek) into account.

Sign up or log in to comment