@Birch-san
Last active March 4, 2024 19:32
Local VSCode AI code assistance via starcoder + 4-bit quantization in ~11GB VRAM

Install the HF Code Autocomplete VSCode plugin.

We are not going to set an API token. We are going to specify an API endpoint.
We will try to deploy that API ourselves, to use our own GPU to provide the code assistance.

We will use bigcode/starcoder, a 15.5B param model.
We will use NF4 4-bit quantization to fit this into 10787MiB VRAM.
It would require 23767MiB VRAM unquantized (which still fits on a 4090, with its 24564MiB)!
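
The headroom implied by these figures can be checked with a little arithmetic. This is a minimal sketch that uses only the numbers quoted above; the constant and function names are hypothetical, chosen for illustration.

```python
# VRAM figures quoted in this guide, in MiB.
GPU_TOTAL_MIB = 24564    # RTX 4090
UNQUANTIZED_MIB = 23767  # starcoder without quantization
NF4_MIB = 10787          # starcoder with NF4 4-bit quantization


def headroom(total_mib: int, used_mib: int) -> int:
    """Return the VRAM left over after loading the model, in MiB."""
    return total_mib - used_mib


if __name__ == "__main__":
    print("free VRAM, unquantized:", headroom(GPU_TOTAL_MIB, UNQUANTIZED_MIB), "MiB")  # 797
    print("free VRAM, NF4:", headroom(GPU_TOTAL_MIB, NF4_MIB), "MiB")  # 13777
    print(f"NF4 footprint: {NF4_MIB / UNQUANTIZED_MIB:.0%} of the unquantized footprint")
```

Going by these figures, the unquantized model leaves under 1GiB free on a 4090, whereas NF4 leaves more than half the card free.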

Setup API

All instructions are written assuming your command-line shell is bash.

Clone huggingface-vscode-endpoint-server repository:

git clone https://github.com/Birch-san/huggingface-vscode-endpoint-server.git
cd huggingface-vscode-endpoint-server

Create + activate a new virtual environment

This is to avoid interfering with your current Python environment (other Python scripts on your computer might not appreciate it if you update a bunch of packages they were relying on).

Follow the instructions for venv, or conda, or neither (if you don't care what happens to other Python scripts on your computer).

Using venv

Create environment:

python -m venv venv

Activate environment:

. ./venv/bin/activate

(First-time) update environment's pip:

pip install --upgrade pip

Using conda

Download conda.

Skip this step if you already have conda.

Install conda:

Skip this step if you already have conda.

Assuming you're using a bash shell:

# Linux installs Anaconda via this shell script. Mac installs by running a .pkg installer.
bash Anaconda-latest-Linux-x86_64.sh
# this step probably works on both Linux and Mac.
eval "$(~/anaconda3/bin/conda shell.bash hook)"
conda config --set auto_activate_base false
conda init

Create environment:

conda create -n p311-code-api python=3.11

Activate environment:

conda activate p311-code-api

Install package dependencies

Ensure you have activated the environment you created above.
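
If you are unsure whether an environment is active, a quick check from Python may help. This is a best-effort sketch: it relies on `sys.prefix` differing from `sys.base_prefix` inside a venv, and on conda setting the `CONDA_DEFAULT_ENV` variable on activation. The function name is hypothetical.

```python
import os
import sys
from typing import Optional


def active_environment() -> Optional[str]:
    """Best-effort guess at which isolated environment this interpreter runs in."""
    # venv: sys.prefix points into the environment, while
    # sys.base_prefix still points at the base installation.
    if sys.prefix != sys.base_prefix:
        return f"venv at {sys.prefix}"
    # conda exports CONDA_DEFAULT_ENV when an environment is activated.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return f"conda env {conda_env}"
    return None


if __name__ == "__main__":
    print(active_environment() or "No venv or non-base conda environment detected")
```

Run it with the same `python` you intend to use for `pip install`; if it reports nothing, activate your environment first.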

(Optional) treat yourself to the latest nightly of PyTorch, with support for Python 3.11 and CUDA 12.1:

# CUDA
pip install --upgrade --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cu121

Install dependencies:

pip install -r requirements.txt
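
After installing, it can be worth confirming that the PyTorch build in this environment can actually see your GPU. This is a small sketch, assuming only that `torch` is the package name; `describe_torch` is a hypothetical helper, and it degrades gracefully if PyTorch is missing.

```python
def describe_torch() -> str:
    """Summarise the installed PyTorch build, or say that it is missing."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    cuda_build = torch.version.cuda  # None for CPU-only builds
    gpu_usable = torch.cuda.is_available()
    return f"torch {torch.__version__}, built for CUDA {cuda_build}, GPU usable: {gpu_usable}"


if __name__ == "__main__":
    print(describe_torch())
```

If it reports that no GPU is usable, the quantized model is unlikely to load, so it is worth fixing that before running the API.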

Run API:

From root of huggingface-vscode-endpoint-server repository:

python -m main --model_name_or_path bigcode/starcoder --bf16
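
The first run has to download the model, so the server may take a while to come up. One way to wait for it is to poll the port until it accepts connections. This is a sketch, assuming the server listens on localhost port 8000 (the port used in the test command in this guide); `wait_for_port` is a hypothetical helper. An open port only shows that the web server is up; whether the model has finished loading by then depends on how the server starts, which is not verified here.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 600.0, poll_s: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds, or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
```

For example, `wait_for_port("localhost", 8000)` returns `True` once the port is open, or `False` after ten minutes.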

Error: bigcode/starcoder repository not found / "private repository"

If you get this error, you'll need to accept the terms on the bigcode/starcoder model card.

If you haven't logged into the huggingface CLI before: you'll also need to do that, so that it can authenticate as you, to check whether you accepted the model card's terms.

Go to the access tokens page in your Hugging Face account settings, and create a new read-only token.
Copy the new token to your clipboard.

Run huggingface-cli login from your command prompt, and paste the token.

Try running main again.

Test API

Check this first, before we try to get VSCode working.

curl -X POST http://localhost:8000/api/generate/ -d '{"inputs": "", "parameters": {"max_new_tokens": 64}}'
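
If you would rather test from Python, the same request can be sketched with the standard library. The URL and payload mirror the curl command above. The `Content-Type` header is an assumption (curl's `-d` sends a form content type by default, and the server is not known to require JSON), and the function names are hypothetical. The response schema is not documented here, so the sketch returns the raw body.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/generate/"


def build_request(prompt: str, max_new_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request mirroring the curl command."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: declaring JSON is a choice made here, not a documented requirement.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Send the request to the local server and return the raw response body."""
    with urllib.request.urlopen(build_request(prompt, max_new_tokens), timeout=120) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(generate("def main():"))` should print whatever the API returns for that prompt.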

If it works: we're ready to try it in VSCode.

Try out starcoder integration in VSCode

Open the VSCode extension settings for starcoder.

Set your API endpoint as:

http://localhost:8000/api/generate


You may need to Reload Window to initialize HF Code Autocomplete, now that you have changed the settings. Open the command palette with Cmd+Shift+P (Ctrl+Shift+P on Linux and Windows), and type Reload Window.

Running code inference

Create a new empty text file. Set the language to Python.

Type:

def main():


Whilst it thinks, you should see a spinner in the status bar at the bottom of VSCode.

Starcoder should auto-complete it for you!


Press tab to accept the completion.


Troubleshooting HF Code Autocomplete extension

Open the Output tab of VSCode's bottom panel, and pick the "Hugging Face Code" option from the dropdown.

You should be able to see anything logged by the VSCode extension.

@dewert

dewert commented Jun 10, 2023

I found that the parameter to run the server with starcoder had changed. It's currently downloading, but

python -m main --model_name_or_path bigcode/starcoder --bf16

seems to be working for me.

@Birch-san
Author

thanks @dewert; good spot, I did change that. updated gist.

@kdcyberdude

Hi @Birch-san,

I'm encountering an error while testing the API using the following curl command:
curl -X POST http://localhost:8000/api/generate/ -d '{"inputs": "", "parameters": {"max_new_tokens": 64}}'

CUDA version - 12.1
GPU - 4090

INFO: 127.0.0.1:48018 - "POST /api/generate/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(scope, receive, send)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/kd/anaconda3/envs/p311-code-api/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/kd/anaconda

@kharelpk

For those of you using Windows: if you have issues with bitsandbytes, you can use this build:
https://github.com/jllllll/bitsandbytes-windows-webui
python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
