Tokenizer EOS Token
#21, opened by saksham-lamini
For the instruct model, we have both an eot_id and an eos_id. Through the tokenizer interface, tokenizer.eos_token_id exposes only the eos_id. There doesn't seem to be a way to expose the eot_id token, which would be important for stopping criteria and similar generation controls.
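# Treat both the EOS token and <|eot_id|> as stop tokens.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
# xxx is a placeholder for the model inputs.
outputs = model.generate(xxx, eos_token_id=terminators)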
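The terminator list above can be wrapped in a small helper so it is built the same way everywhere. This is only a sketch: `collect_terminators` and `FakeTokenizer` are hypothetical names that are not part of transformers, and the token ids in the stub are made up for illustration. The helper relies only on the `eos_token_id` attribute and the `convert_tokens_to_ids` method already used in the snippet.

```python
from typing import Any, List, Sequence


def collect_terminators(
    tokenizer: Any, extra_tokens: Sequence[str] = ("<|eot_id|>",)
) -> List[int]:
    """Return the EOS id plus the ids of extra stop tokens.

    Ids that come back as None, and duplicate ids, are skipped.
    """
    candidates = [tokenizer.eos_token_id]
    candidates += [tokenizer.convert_tokens_to_ids(tok) for tok in extra_tokens]

    terminators: List[int] = []
    for token_id in candidates:
        if token_id is None or token_id in terminators:
            continue
        terminators.append(token_id)
    return terminators


class FakeTokenizer:
    """Hypothetical stand-in for a real tokenizer; the ids are made up."""

    eos_token_id = 2
    _vocab = {"<|eot_id|>": 7}

    def convert_tokens_to_ids(self, token: str):
        # Returns None for unknown tokens in this stub.
        return self._vocab.get(token)


terminators = collect_terminators(FakeTokenizer())
```

With a real tokenizer, the resulting list would be passed the same way as in the snippet above, as `eos_token_id=terminators` in `model.generate`.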
pcuenq changed discussion status to closed