Issue with long prompts (>2048 tokens) and truncation when using Phi 1.5

#53
by nulltella - opened

Hello,

I'm relatively new to this and I've been working with the Phi 1.5 model. I've loaded the model and tokenizer, but I'm running into a problem with very long prompts (longer than 2048 tokens). Despite setting truncation to True, the output I receive only reflects the first 2048 tokens or so of the prompt. Here's the relevant portion of my code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Move the model to the GPU so it matches the device the inputs are sent to.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True, torch_dtype=torch.float32).to('cuda')
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

prompt = ** Long prompt **
inputs = tokenizer([prompt], return_tensors="pt", truncation=True).to('cuda')
streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=streamer, pad_token_id=tokenizer.eos_token_id)

I was under the impression that the truncation=True parameter would handle long prompts by truncating them to fit within the model's maximum sequence length. However, the output I'm seeing seems to be just the beginning portion of the prompt, cut off at around 2048 tokens.

Is there something obvious I'm missing or misunderstanding about how truncation works? Or is there another approach I should take to handle long prompts?
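
To illustrate what I expected truncation to do, here is a sketch with an explicit cap (the 2048 value is just my assumption about the model's context window, and max_new_tokens is an arbitrary number I chose):

# Sketch: truncate explicitly to 2048 tokens and verify the input length.
inputs = tokenizer([prompt], return_tensors="pt", truncation=True, max_length=2048).to('cuda')
print(inputs["input_ids"].shape)  # the sequence dimension should now be at most 2048
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=200, pad_token_id=tokenizer.eos_token_id)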

Thank you!

Microsoft org

This should be fixed on the latest commit; sorry for taking so long!
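
If it helps, a minimal sketch for picking up the updated files locally (force_download is a general from_pretrained option, not something specific to this fix) is to force a fresh download:

# Re-download the latest revision of the model repository, including its remote code.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True, force_download=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True, force_download=True)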

gugarosa changed discussion status to closed
