shavera committed · Commit 3d05d7b · verified · 1 Parent(s): e6fc2f8

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -15,6 +15,9 @@ This is a int4_awq quantized checkpoint of bigcode/starcoder2-15b. It takes abou
 ## Running this Model
 vLLM does not natively support autoawq currently (or any a4w8 as of writing this), so one can just serve directly from the autoawq backend.
 
+Note, if you want to start this in a container, then:
+`docker run --gpus all -it --name=starcoder2-15b-int4-awq -p 8000:8000 -v ~/.cache:/root/.cache nvcr.io/nvidia/pytorch:24.12-py3 bash`
+
 `pip install fastapi[all] torch transformers autoawq`
 
 Then in python3: