How much VRAM do you need?

#12 opened by hyun10

I have 4× RTX 3090s (96 GB of VRAM in total), but when I try to run this with vLLM I get a CUDA out-of-memory error.

Should I use a quantized model?

BF16 memory: 70B params × 2 bytes = 140 GB
Q8 memory: 70B params × 1 byte = 70 GB
You should use a Q8 model.
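
A quick way to sanity-check those numbers (weights only; the KV cache and activations add more on top, which is why even ~70 GB of Q8 weights is tight on 96 GB):

```python
# Weights-only estimate: N billion params x bytes per param ~= N x bytes GB.
# Real usage is higher (KV cache, activations, framework overhead).
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param

for precision, nbytes in [("BF16", 2.0), ("Q8/FP8", 1.0), ("Q4 (AWQ)", 0.5)]:
    print(f"{precision}: ~{weight_memory_gb(70, nbytes):.0f} GB")
# BF16: ~140 GB, Q8/FP8: ~70 GB, Q4 (AWQ): ~35 GB
```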

The official version is FP8, not Q8.

Is there any AWQ Q8 version of this model?


https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF

Is this the AWQ version of that model? https://huggingface.co/Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ/

That model is Q4. A Q8 model file would be about 70 GB or more.
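
For what it's worth, here is a rough sketch of loading that AWQ checkpoint in vLLM across the 4 RTX 3090s; the gpu_memory_utilization and max_model_len values are guesses you may need to tune:

```python
from vllm import LLM, SamplingParams

# ~35 GB of Q4 AWQ weights plus KV cache, split across 4x24 GB via tensor parallelism.
llm = LLM(
    model="Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ",
    quantization="awq",
    tensor_parallel_size=4,       # one shard per RTX 3090
    gpu_memory_utilization=0.90,  # guess: leave some headroom on each GPU
    max_model_len=8192,           # guess: cap the context to bound KV-cache size
)

outputs = llm.generate(["Explain AWQ in one sentence."], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```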

I am looking for a Q8 model quantized using AWQ; do you know of any?


You can perform AWQ yourself.
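
If you do, here is a minimal sketch with the AutoAWQ library; the output path is made up, the config values are the library's usual defaults, and note that AWQ is a 4-bit scheme, so the result is Q4 rather than Q8:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
quant_path = "DeepSeek-R1-Distill-Llama-70B-AWQ"  # local output directory (made up)

# Typical AWQ settings: 4-bit weights, group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # runs calibration; needs a lot of RAM for a 70B model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```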
