Update README.md
README.md (changed)
## Running this Model

vLLM does not natively support autoawq currently (or any a4w8 as of this writing), so one can serve directly from the autoawq backend.

Note: if you want to start this in a container, then:
`docker run --gpus all -it --name=starcoder2-15b-int4-awq -p 8000:8000 -v ~/.cache:/root/.cache nvcr.io/nvidia/pytorch:24.12-py3 bash`

`pip install fastapi[all] torch transformers autoawq`

Then in python3:
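
What follows is a minimal sketch, not this repo's exact serving code: it assumes the checkpoint has been downloaded to a local `MODEL_PATH` (a placeholder name) and that a single FastAPI `/generate` route is enough. Adjust the path, route, and request fields to taste.

```python
# Minimal sketch: load the int4 AWQ weights with the autoawq backend and
# expose a small FastAPI endpoint. MODEL_PATH and /generate are placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

MODEL_PATH = "./starcoder2-15b-int4-awq"  # placeholder: local path to this checkpoint

# Load the tokenizer and the quantized model.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoAWQForCausalLM.from_quantized(MODEL_PATH, fuse_layers=True)

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    # Tokenize the prompt and move the ids to the GPU the quantized model lives on.
    input_ids = tokenizer(req.prompt, return_tensors="pt").input_ids.cuda()
    output_ids = model.generate(input_ids, max_new_tokens=req.max_new_tokens)
    return {"completion": tokenizer.decode(output_ids[0], skip_special_tokens=True)}

if __name__ == "__main__":
    # 0.0.0.0:8000 matches the port published by the docker run command above.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Once it's running, a quick smoke test from the host:

`curl -X POST localhost:8000/generate -H 'Content-Type: application/json' -d '{"prompt": "def fizzbuzz(n):"}'`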