God bless you!
#1 by h2m - opened
God bless you!
If you can, use the 3.0bpw TinyLLaMA 32K exl2 quant for your speculative decoding draft model and get insane inference speed:
https://huggingface.co/models?sort=trending&search=LoneStriker+tinyllama+32k
Ooba doesn't support speculative decoding, unfortunately, but exui and TabbyAPI do.
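For anyone curious how speculative decoding gets its speedup, here's a minimal toy sketch of the idea (not the exui/TabbyAPI implementation): a small draft model cheaply proposes a block of tokens, and the big target model only verifies them, accepting the matching prefix. The two "models" below are hypothetical stand-in functions that greedily pick a next token from a context:

```python
# Toy sketch of speculative decoding. The "models" are hypothetical
# functions mapping a context (tuple of token ids) to a greedy next token;
# in real use these would be a small draft LLM and a large target LLM.

def draft_model(ctx):
    # Hypothetical small model: fast, but diverges on longer contexts.
    return (sum(ctx) + 1) % 5

def target_model(ctx):
    # Hypothetical large model: the reference output we must reproduce.
    return (sum(ctx) + 1) % 5 if len(ctx) < 6 else (sum(ctx) + 2) % 5

def speculative_step(ctx, k=4):
    """Draft k tokens cheaply, then verify them against the target model.

    Returns the tokens accepted this step. Conceptually the target model
    verifies the whole drafted block in one forward pass, which is where
    the speedup comes from when most drafted tokens are accepted."""
    drafted, c = [], list(ctx)
    for _ in range(k):
        t = draft_model(tuple(c))
        drafted.append(t)
        c.append(t)

    accepted, c = [], list(ctx)
    for t in drafted:
        expected = target_model(tuple(c))
        if t == expected:
            accepted.append(t)     # draft agreed with target: keep it
            c.append(t)
        else:
            accepted.append(expected)  # first mismatch: take the target's
            c.append(expected)         # token and discard the rest
            break
    return accepted

def generate(n_tokens, k=4):
    ctx = [0]  # dummy start-of-sequence token
    while len(ctx) - 1 < n_tokens:
        ctx.extend(speculative_step(tuple(ctx), k))
    return ctx[1:n_tokens + 1]
```

Because every accepted token is, by construction, exactly what the target model would have picked greedily, the output is identical to running the target model alone; only the number of expensive target calls shrinks. That's why a tiny 3.0bpw TinyLLaMA draft can speed things up without changing the big model's output.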