MrDragonFox/mistral_small-grpo-600-step-adaptor
It's all just attention patching, really old stuff. nnsight (https://nnsight.net/, https://github.com/ndif-team/nnsight) is the best toolkit for that.
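For anyone who hasn't seen patching before, here is a minimal sketch of the idea on a toy model. It is plain Python, not the nnsight API: `head_a`, `head_b`, `run` and `patching_effect` are hypothetical names made up for illustration, and the two "heads" are simple stand-ins for attention heads. The recipe is the usual one: cache activations on a clean input, rerun on a corrupted input while substituting one cached activation, and see how much of the clean output comes back.

```python
# Toy activation patching. The "heads" are hypothetical stand-ins for
# attention heads; this is not a real transformer and not the nnsight API.

def head_a(x):
    return 2.0 * x[0]   # reads only the first input feature

def head_b(x):
    return -1.0 * x[1]  # reads only the second input feature

HEADS = {"a": head_a, "b": head_b}

def run(x, patch=None):
    """Run the toy model. `patch` maps head name -> activation to substitute."""
    patch = patch or {}
    acts = {}
    for name, fn in HEADS.items():
        # Use the patched activation if one was supplied, else compute it.
        acts[name] = patch[name] if name in patch else fn(x)
    return sum(acts.values()), acts

clean = [1.0, 3.0]
corrupted = [0.0, 3.0]  # differs from `clean` only in the first feature

clean_out, clean_acts = run(clean)
corrupt_out, _ = run(corrupted)

def patching_effect(head):
    """Fraction of the clean-vs-corrupted output gap restored by patching `head`."""
    patched_out, _ = run(corrupted, patch={head: clean_acts[head]})
    return (patched_out - corrupt_out) / (clean_out - corrupt_out)

effects = {name: patching_effect(name) for name in HEADS}
print(effects)
```

The head that actually carries the changed feature restores the whole gap, the other restores none of it. On a real model the same loop runs over layers and heads, which is the bookkeeping a toolkit like nnsight is meant to take off your hands.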
nnsight has a good toolkit for that
Just limit vLLM to one GPU and run the rest on another one, or use -gmu.
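A rough sketch of what that could look like, assuming `-gmu` is shorthand for vLLM's `--gpu-memory-utilization` option (that reading is an assumption) and that the server is started with `vllm serve`. The helper names `vllm_launch_plan` and `other_gpu_env` and the model id `your-org/your-model` are placeholders, not anything from the thread. The block only builds the environment and command line; it does not start anything.

```python
import os

def vllm_launch_plan(model, gpu_index=0, gpu_memory_utilization=None):
    """Return (env, argv) for a `vllm serve` process pinned to one GPU."""
    env = dict(os.environ)
    # CUDA_VISIBLE_DEVICES hides every other GPU from the process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    argv = ["vllm", "serve", model]
    if gpu_memory_utilization is not None:
        if not 0.0 < gpu_memory_utilization <= 1.0:
            raise ValueError("gpu_memory_utilization must be in (0, 1]")
        # Caps the fraction of GPU memory vLLM reserves for itself.
        argv += ["--gpu-memory-utilization", str(gpu_memory_utilization)]
    return env, argv

def other_gpu_env(gpu_index=1):
    """Environment for the rest of the workload, pinned to a different GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env

# Option 1: vLLM on GPU 0, everything else on GPU 1.
serve_env, serve_argv = vllm_launch_plan("your-org/your-model", gpu_index=0)
rest_env = other_gpu_env(gpu_index=1)

# Option 2: share one GPU but let vLLM take only half of its memory.
shared_env, shared_argv = vllm_launch_plan(
    "your-org/your-model", gpu_index=0, gpu_memory_utilization=0.5
)
print(" ".join(shared_argv))
```

To actually launch, pass the pair to `subprocess.Popen(serve_argv, env=serve_env)` and start the other job with `env=rest_env`.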
The 8B repo is empty and the dataset is empty too. Well, it's a little off from SOTA. TBH, glm4voice had better results, but it's certainly an "ok" PoC.
GitHub repo is empty / no paper.
Is that with DDR4 or DDR5?
With 250 GB of RAM used ^^ probably running it at a 2-bit quant.
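The back-of-envelope arithmetic behind a guess like that, as a small sketch. It reads "250g" as 250 GB (decimal) and counts only the raw weights, ignoring KV cache, quantization scales and runtime overhead, so it gives an upper bound on model size rather than an exact fit. The function names are made up for illustration.

```python
def weight_bytes(n_params, bits_per_weight):
    """Raw weight storage in bytes for a given parameter count and bit width."""
    return n_params * bits_per_weight / 8

def max_params(ram_bytes, bits_per_weight):
    """Largest parameter count whose raw weights fit in `ram_bytes`."""
    return ram_bytes * 8 / bits_per_weight

RAM_BYTES = 250e9  # assumption: "250g" means 250 GB, decimal

for bits in (2, 4, 8, 16):
    billions = max_params(RAM_BYTES, bits) / 1e9
    print(f"{bits:>2}-bit: up to {billions:.0f}B parameters in 250 GB")
```

So a model that only fits in 250 GB at around 2 bits per weight would have to be very large, while anything that fits at 4 bits or more would leave room for a higher-precision quant.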