Can this be applied to Aria-UI too?

#1
by rtbonet - opened

Does it require that special fork too?
I am looking for a way to quantize this one: https://huggingface.co/Aria-UI/Aria-UI-base
Any hints would be appreciated. Thanks!

I don't think so, since this method relies on the sequential MLP fork. You could try HQQ quantization instead: https://github.com/mobiusml/hqq/blob/master/examples/hf/aria_multimodal.py
That example uses the original version with grouped GEMM, and the result should also come in under 16 GB :)
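For intuition on why a 4-bit result can fit in under 16 GB, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions, not HQQ's actual storage format. It assumes each group of `group_size` weights stores one fp16 scale and one fp16 zero-point (32 bits of metadata per group), and it ignores any layers left in higher precision. The 25B parameter count is a hypothetical round number for illustration; check the Aria-UI model card for the real figure.

```python
def quantized_weight_gb(num_params: float, nbits: int, group_size: int, meta_bits: int = 32) -> float:
    """Approximate weight size in GB after group-wise quantization.

    Assumption: every group of `group_size` weights carries `meta_bits` of
    metadata (default 32 = one fp16 scale + one fp16 zero-point).
    Layers kept unquantized (norms, embeddings, etc.) are not counted.
    """
    bits_per_weight = nbits + meta_bits / group_size
    return num_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB


def fp16_weight_gb(num_params: float) -> float:
    """Unquantized fp16 baseline: 2 bytes per parameter."""
    return num_params * 2 / 1e9


# Hypothetical parameter count for illustration only.
ASSUMED_PARAMS = 25e9

if __name__ == "__main__":
    print(f"fp16 baseline: {fp16_weight_gb(ASSUMED_PARAMS):.1f} GB")
    print(f"4-bit, group_size=64: {quantized_weight_gb(ASSUMED_PARAMS, 4, 64):.2f} GB")
```

With these assumptions, 4-bit weights cost about 4.5 bits each once the per-group metadata is included, so a model of this size lands around 14 GB instead of roughly 50 GB in fp16. The actual footprint also depends on which modules are skipped and on runtime overhead such as activations and the KV cache.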
