Why does the chat prompt template adding dates to the messages?
#179
by
Krooz
- opened
I tried to do
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
chat = [
{"role": "user", "content": "Hello, how are you?"},
{"role": "assistant", "content": "I'm doing great. How can I help you today?"},
{"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(tokenizer.apply_chat_template(chat,tokenize=False))
Output:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello, how are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
I'm doing great. How can I help you today?<|eot_id|><|start_header_id|>user<|end_header_id|>
I'd like to show off how chat templating works!<|eot_id|>
Why am I getting dates during the tokenization process? This may impact the performance during fine-tuning the model for custom task. Also the date 26 Jul is not updated too.