Can this be applied to Aria-UI too?

by rtbonet - opened 19 days ago

Does it require that special fork too?
I am looking for a way to quantize this one: https://huggingface.co/Aria-UI/Aria-UI-base
Any hints would be appreciated. Thanks!

Owner 19 days ago

I don't think so, since this method relies on the sequential MLP fork. You could try HQQ quantization instead: https://github.com/mobiusml/hqq/blob/master/examples/hf/aria_multimodal.py
That example uses the original version with grouped GEMM, and the result should be <16 GB as well :)
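
For a rough sanity check of the <16 GB figure, here is a back-of-the-envelope sketch. The parameter count (about 25B), the 4-bit width, the group size of 64, and the 32 bits of scale/zero-point metadata per group are all assumptions for illustration, not numbers taken from the HQQ example; check the model card and the script for the real settings. It also ignores layers that stay unquantized and activation memory, so treat the output as a lower-bound estimate only.

```python
def quantized_size_gb(n_params: float,
                      nbits: int = 4,
                      group_size: int = 64,
                      meta_bits_per_group: int = 32) -> float:
    """Rough weight-only footprint in GB (1 GB = 1e9 bytes).

    All defaults are illustrative assumptions, not values from the HQQ example.
    """
    # Bits used by the quantized weights themselves.
    weight_bits = n_params * nbits
    # Bits used by per-group metadata (e.g. one scale and one zero-point).
    meta_bits = (n_params / group_size) * meta_bits_per_group
    return (weight_bits + meta_bits) / 8 / 1e9


# Assumed total parameter count; replace with the real figure for your model.
ASSUMED_PARAMS = 25e9

if __name__ == "__main__":
    print(f"~{quantized_size_gb(ASSUMED_PARAMS):.1f} GB")
```

Under these assumptions the weights alone land below 16 GB, which is consistent with the estimate above; a larger group size or smaller metadata would shrink it further.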
