chat template doesn't include tools

#3
by copasseron - opened

Hi Mistral team,

Nice to see a new model from you guys, thanks a lot.

https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501/blob/main/tokenizer_config.json#L9010

In the Jinja chat template there is nothing related to tools (neither for declaring the available tools nor for putting tool results in the history of messages sent to the model). Is that intended?
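For reference, this is roughly how I checked it (a sketch with a hypothetical get_weather schema; the tools= kwarg needs a recent transformers release):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-24B-Instruct-2501")
messages = [{"role": "user", "content": "What's the weather in Paris?"}]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]
prompt = tok.apply_chat_template(messages, tools=tools, tokenize=False)
# The rendered prompt contains no [AVAILABLE_TOOLS] section; tools are silently dropped.
print("[AVAILABLE_TOOLS]" in prompt)  # False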

Ollama does include it on their side:

https://ollama.com/library/mistral-small/blobs/5de2b8ebfbdd

{{- range $index, $_ := .Messages }}
{{- if eq .Role "system" }}[SYSTEM_PROMPT] {{ .Content }}[/SYSTEM_PROMPT]
{{- else if eq .Role "user" }}
{{- if and (le (len (slice $.Messages $index)) 2) $.Tools }}[AVAILABLE_TOOLS] {{ $.Tools }}[/AVAILABLE_TOOLS]
{{- end }}[INST] {{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{- if .Content }} {{ .Content }}
{{- if not (eq (len (slice $.Messages $index)) 1) }}</s>
{{- end }}
{{- else if .ToolCalls }}[TOOL_CALLS] [
{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end }}]</s>
{{- end }}
{{- else if eq .Role "tool" }}[TOOL_RESULTS] {"content": {{ .Content }}}[/TOOL_RESULTS]
{{- end }}
{{- end }}
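For what it's worth, a rough and untested port of that logic to Jinja, passed through apply_chat_template's chat_template override (reusing tok, messages, and tools from the snippet above; the token spellings mirror the Ollama template, not any official Mistral spec):

# Untested sketch only. Simplifications vs. the Ollama template: tools are injected
# only when the last message is the user turn, and eos_token is always appended to
# assistant turns. Note that OpenAI-style tool_calls carry "arguments" as a JSON
# string, while this format expects a JSON object, so extra massaging may be needed.
TOOL_TEMPLATE = """{{- bos_token }}
{%- for message in messages %}
    {%- if message['role'] == 'system' %}
        {{- '[SYSTEM_PROMPT] ' + message['content'] + '[/SYSTEM_PROMPT]' }}
    {%- elif message['role'] == 'user' %}
        {%- if tools and loop.last %}
            {{- '[AVAILABLE_TOOLS] ' + (tools | tojson) + '[/AVAILABLE_TOOLS]' }}
        {%- endif %}
        {{- '[INST] ' + message['content'] + '[/INST]' }}
    {%- elif message['role'] == 'assistant' %}
        {%- if message['content'] %}
            {{- ' ' + message['content'] + eos_token }}
        {%- elif message['tool_calls'] is defined %}
            {{- '[TOOL_CALLS] ' + (message['tool_calls'] | map(attribute='function') | list | tojson) + eos_token }}
        {%- endif %}
    {%- elif message['role'] == 'tool' %}
        {{- '[TOOL_RESULTS] {"content": ' + (message['content'] | tojson) + '}[/TOOL_RESULTS]' }}
    {%- endif %}
{%- endfor %}"""

prompt = tok.apply_chat_template(messages, tools=tools,
                                 chat_template=TOOL_TEMPLATE, tokenize=False)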

thanks a lot

Mistral AI org

We've tested function calling only with vLLM: https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501#function-calling
The model should work very well for function calling tasks!

Can you give this a try?

Also, we'd be more than happy about any contribution to make function calling work with the HF format.

It was working fine before, but a recent commit added strftime and it is now breaking on Text-Generation-Inference.

@patrickvonplaten I was going to test that today as well. Does it work without applying the template extension from the OP, or did you include it?

Also, did you try this on the OpenAI-compatible vLLM endpoint, or just offline inference?

On TGI it works without the OP's template, but it broke after they included strftime.

The latest TGI commit fixes this.

But regarding the original topic, I'm getting this error when using tool calling: Template error: syntax error: Only user, system and assistant roles are supported!

Yes, that is my concern as well.

I'm deploying the model with NVIDIA Triton + the vLLM backend, so I can't use vLLM's LLM.chat() entry point.

Triton's vLLM backend uses the AsyncLLMEngine (https://docs.vllm.ai/en/v0.6.5/dev/engine/async_llm_engine.html), which takes the text directly before passing it to the tokenizer.

So I'm obliged to template the messages first, either by applying the Jinja template myself or by using transformers' apply_chat_template() method (https://huggingface.co/docs/transformers/v4.37.1/chat_templating), which uses the chat template here: https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501/blob/20b2ed1c4e9af44b9ad125f79f713301e27737e2/tokenizer_config.json#L9010.

However, the chat template provided for this new model doesn't support tools (neither tool responses in the message history nor the list of available tools).
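Concretely, the path I'm stuck with looks roughly like this (a sketch; engine arguments trimmed and the Triton glue omitted):

import asyncio

from transformers import AutoTokenizer
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

MODEL = "mistralai/Mistral-Small-24B-Instruct-2501"
tok = AutoTokenizer.from_pretrained(MODEL)

async def main():
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=MODEL))
    messages = [{"role": "user", "content": "What's the weather in Paris?"}]
    # The engine only accepts pre-rendered text, so templating has to happen here;
    # with the current chat template there is no way to declare tools at this step.
    prompt = tok.apply_chat_template(messages, tokenize=False)
    final = None
    async for out in engine.generate(prompt, SamplingParams(max_tokens=128), "req-0"):
        final = out  # generate() streams incremental RequestOutput objects
    print(final.outputs[0].text)

asyncio.run(main())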

Can confirm that with transformers==4.48.3 the HF chat template does not produce the [AVAILABLE_TOOLS] data.
Those who just want to play around with function calling on this model can, as a workaround, use the model ID mistralai/Mistral-7B-Instruct-v0.3 with AutoTokenizer's apply_chat_template to render their tool functions and produce the needed [AVAILABLE_TOOLS] prompt.
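Something like this, reusing the messages and tools from the earlier snippet (the 7B tokenizer differs from the 24B one, so use it only to render the prompt string, not the token IDs):

# Workaround sketch: render the prompt with the tool-aware template that ships
# with Mistral-7B-Instruct-v0.3, then feed the resulting string to the 24B model.
donor = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
prompt = donor.apply_chat_template(messages, tools=tools, tokenize=False)
# The rendered prompt now contains an [AVAILABLE_TOOLS] ... [/AVAILABLE_TOOLS] block.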

I was finally able to get OpenAI-compatible tool calling working for single and multiple/parallel tool calls, using a custom template and parser on the vLLM OpenAI server with the Mistral base model.

I will probably open a PR to vLLM for the parser if you think it's a valuable contribution.

test.py:

import json

from openai import OpenAI

# vLLM's OpenAI-compatible server is assumed to be running locally; adjust as needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Local implementations of the tools the model may call.
def get_weather(location: str, unit: str):
    return f"Weather {location} in {unit} is bad!"

def get_gold_price(currency: str = "USD"):
    return f"Getting the gold price in {currency} is enormous!"

# Dispatch table mapping tool names to their implementations.
tool_functions = {"get_weather": get_weather, "get_gold_price": get_gold_price}


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
},
{
    "type": "function",
    "function": {
        "name": "get_gold_price",
        "description": "Get the current gold price in wanted currency (default to USD).",
        "parameters": {
            "type": "object",
            "properties": {
                "currency": {"type": "string", "description": "Currency code e.g. USD or EUR."}
            }
        }
    }
}]

# Ask a question that should trigger both tools.
response = client.chat.completions.create(
    model="uncensoredai/Mistral-Small-24B-Instruct-2501",
    messages=[{"role": "user", "content": "What's the weather like in San Francisco? And what's the current gold price?"}],
    temperature=0,
    extra_body={
        "skip_special_tokens": False
    },
    tools=tools,
    tool_choice="auto"
)

print(f"Function called: {response.choices[0]}")
tool_calls = response.choices[0].message.tool_calls

for index, tool_call in enumerate(tool_calls):
    call_response = tool_call.function
    print(f"{index}. Function called: {call_response.name}")
    print(f"Arguments: {call_response.arguments}")
    if index == 0:
        print(f"Result: {get_weather(**json.loads(call_response.arguments))}")
    elif index == 1:
        print(f"Result: {get_gold_price(**json.loads(call_response.arguments))}")
