Has anybody gotten a quantized version to run with vLLM?

#24 opened by alecauduro

I'm not having any luck getting the quantized versions (Unsloth or AWQ) to work with vLLM.

I did a W8A8 quantization of the abliterated version and ran inference with vLLM; everything worked fine on a dual-card 2080 Ti 22G setup.
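
If it helps anyone reproduce this, the setup was roughly the sketch below. The checkpoint path and max_model_len are placeholders rather than exact values from my run; vLLM reads the W8A8 (compressed-tensors) scheme from the checkpoint's config, so no explicit quantization argument should be needed.

```python
from vllm import LLM, SamplingParams

# Hypothetical local path to the W8A8 export; point this at your own checkpoint.
MODEL = "/models/Mistral-Small-24B-Instruct-2501-abliterated-W8A8"

llm = LLM(
    model=MODEL,
    tensor_parallel_size=2,  # split the 24B model across the two 2080 Ti 22G cards
    max_model_len=8192,      # illustrative; lower it if the KV cache doesn't fit
)

params = SamplingParams(temperature=0.15, max_tokens=128)
out = llm.generate(["Explain in one paragraph what W8A8 quantization does."], params)
print(out[0].outputs[0].text)
```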

stelterlab/Mistral-Small-24B-Instruct-2501-AWQ worked for me on a 4090.
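
In case anyone wants to reproduce that, here is a minimal sketch of loading it on a single 24 GB card; the context length and memory fraction are only illustrative values, adjust them to your workload.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="stelterlab/Mistral-Small-24B-Instruct-2501-AWQ",
    quantization="awq",           # normally auto-detected from the repo's config
    max_model_len=16384,          # illustrative; leaves headroom for the KV cache on 24 GB
    gpu_memory_utilization=0.90,
)

print(llm.generate(["Hello!"], SamplingParams(max_tokens=32))[0].outputs[0].text)
```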

I was able to get it running; I was missing the --enforce-eager parameter.
Now I'm trying to figure out why function calling doesn't work.
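
In case it helps, the requests I'm sending look roughly like this, using the OpenAI-compatible client against the vLLM server. The port, model name, and get_weather tool are illustrative placeholders, and this assumes the server was started with tool calling enabled.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Illustrative tool schema, not something specific to this model.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="stelterlab/Mistral-Small-24B-Instruct-2501-AWQ",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Paris right now?"},
    ],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```

The exception below comes from the follow-up request that includes the tool result.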

```
mistral_common.exceptions.InvalidMessageStructureException: Unexpected role 'system' after role 'tool'
```

OK, it was just a matter of changing the message order so the system prompt comes first. Exceptional local model!
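
For anyone else who hits this: in the follow-up request that carries the tool result, keep the system prompt as the very first message. Something like the list below (ids and contents are made up), passed to the same client.chat.completions.create(..., tools=tools) call as before:

```python
# Wrong order (what triggered the exception): the system message was placed
# after the "tool" message. Correct order: system prompt first, then the
# original user turn, the assistant's tool call, and the tool result.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris right now?"},
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "a1b2c3d4e",  # short alphanumeric id; mistral_common is picky about tool-call id format
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }]},
    {"role": "tool", "tool_call_id": "a1b2c3d4e", "content": '{"temp_c": 18, "sky": "clear"}'},
]
```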
