Update README.md
README.md (changed)
## Running this Model

vLLM does not natively support autoawq currently (or any a4w8 as of this writing), so one can serve directly from the autoawq backend.

Note: if you want to start this in a container, then:
`docker run --gpus all -it --name=starcoder2-15b-int4-awq -p 8000:8000 -v ~/.cache:/root/.cache nvcr.io/nvidia/pytorch:24.12-py3 bash`

`pip install fastapi[all] torch transformers autoawq`

Then in python3:
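
What follows is a minimal sketch, not this repo's exact serving code: it assumes the checkpoint has been downloaded to a local `MODEL_PATH` (a placeholder name) and that a single FastAPI `/generate` route is enough. Adjust the path, route, and request fields to taste.

```python
# Minimal sketch: load the int4 AWQ weights with the autoawq backend and
# expose a small FastAPI endpoint. MODEL_PATH and /generate are placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

MODEL_PATH = "./starcoder2-15b-int4-awq"  # placeholder: local path to this checkpoint

# Load the tokenizer and the quantized model.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoAWQForCausalLM.from_quantized(MODEL_PATH, fuse_layers=True)

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    # Tokenize the prompt and move the ids to the GPU the quantized model lives on.
    input_ids = tokenizer(req.prompt, return_tensors="pt").input_ids.cuda()
    output_ids = model.generate(input_ids, max_new_tokens=req.max_new_tokens)
    return {"completion": tokenizer.decode(output_ids[0], skip_special_tokens=True)}

if __name__ == "__main__":
    # 0.0.0.0:8000 matches the port published by the docker run command above.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Once it's running, a quick smoke test from the host:

`curl -X POST localhost:8000/generate -H 'Content-Type: application/json' -d '{"prompt": "def fizzbuzz(n):"}'`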