Can this be applied to Aria-UI too?
#1, opened by rtbonet
Does it require that special fork too?
I am looking for a way to quantize this one: https://huggingface.co/Aria-UI/Aria-UI-base
Any hints would be appreciated. Thanks!
I don't think so, since this method relies on the sequential MLP fork. You could try HQQ quantization instead: https://github.com/mobiusml/hqq/blob/master/examples/hf/aria_multimodal.py
That example uses the original version (not the fork) with grouped GEMM, and the result should be under 16 GB as well :)
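For a rough sanity check on the "under 16 GB" figure, here is a back-of-the-envelope estimate. It is only a sketch under assumptions that are not stated in this thread: roughly 25.3B total parameters for the Aria base model, 4-bit weights, a group size of 64, and one 16-bit scale plus one 16-bit zero-point per group. The function name and defaults are made up for illustration and are not part of HQQ. Real numbers will differ because some layers usually stay in higher precision.

```python
def estimate_quantized_size_gb(n_params, nbits=4, group_size=64, meta_bits=32):
    """Rough weight-only footprint in GB (1 GB = 1e9 bytes).

    meta_bits is the per-group overhead (assumed: 16-bit scale + 16-bit zero-point).
    """
    # Each weight costs nbits plus its share of the per-group metadata.
    bits_per_weight = nbits + meta_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~25.3B total parameters (not taken from this thread).
ASSUMED_PARAMS = 25.3e9

print(f"{estimate_quantized_size_gb(ASSUMED_PARAMS):.1f} GB")  # prints "14.2 GB"
```

With these assumptions each weight costs about 4.5 bits, which lands a bit above 14 GB and leaves some headroom below 16 GB for unquantized layers.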