How much VRAM do you need?

#12 opened by hyun10

I have 4× RTX 3090s (96 GB of VRAM in total), but when I try to run this with vLLM I get a CUDA out-of-memory error.

Should I use a quantized model?

BF16 memory: 70B params × 2 bytes = 140 GB
Q8 memory: 70B params × 1 byte = 70 GB
You should use a Q8 model.
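
A quick way to sanity-check those numbers (weights only; the KV cache and activations add more on top, which is why even ~70 GB of Q8 weights is tight on 96 GB):

```python
# Weights-only estimate: N billion params x bytes per param ~= N x bytes GB.
# Real usage is higher (KV cache, activations, framework overhead).
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param

for precision, nbytes in [("BF16", 2.0), ("Q8/FP8", 1.0), ("Q4 (AWQ)", 0.5)]:
    print(f"{precision}: ~{weight_memory_gb(70, nbytes):.0f} GB")
# BF16: ~140 GB, Q8/FP8: ~70 GB, Q4 (AWQ): ~35 GB
```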

The official version is FP8, not Q8.

Is there any AWQ Q8 version of this model?


https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF

Is this the AWQ version of that model? https://huggingface.co/Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ/

That model is Q4. A Q8 model file would be about 70 GB or more.
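
For what it's worth, here is a rough sketch of loading that AWQ checkpoint in vLLM across the 4 RTX 3090s; the gpu_memory_utilization and max_model_len values are guesses you may need to tune:

```python
from vllm import LLM, SamplingParams

# ~35 GB of Q4 AWQ weights plus KV cache, split across 4x24 GB via tensor parallelism.
llm = LLM(
    model="Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ",
    quantization="awq",
    tensor_parallel_size=4,       # one shard per RTX 3090
    gpu_memory_utilization=0.90,  # guess: leave some headroom on each GPU
    max_model_len=8192,           # guess: cap the context to bound KV-cache size
)

outputs = llm.generate(["Explain AWQ in one sentence."], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```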

I am looking for a Q8 model quantized using AWQ; do you know of any?


You can perform AWQ yourself.
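
If you do, here is a minimal sketch with the AutoAWQ library; the output path is made up, the config values are the library's usual defaults, and note that AWQ is a 4-bit scheme, so the result is Q4 rather than Q8:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
quant_path = "DeepSeek-R1-Distill-Llama-70B-AWQ"  # local output directory (made up)

# Typical AWQ settings: 4-bit weights, group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # runs calibration; needs a lot of RAM for a 70B model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```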
