Birch-san/code-assist.md

## code-assist.md

      
Install the HF Code Autocomplete VSCode plugin.
We are not going to set an API token. Instead, we are going to specify an API endpoint.

We will try to deploy that API ourselves, to use our own GPU to provide the code assistance.
We will use bigcode/starcoder, a 15.5B param model.

We will use NF4 4-bit quantization to fit this into 10787MiB VRAM.

It would require 23767MiB VRAM unquantized (which still fits on a 4090, which has 24564MiB)!
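To make the "does it fit" reasoning concrete, here is a minimal sketch that subtracts the model sizes quoted above from the 4090's VRAM. The constant and function names are made up for illustration; only the three MiB figures come from the text.

```python
# Figures quoted in the text above, in MiB.
RTX_4090_VRAM_MIB = 24564
STARCODER_UNQUANTIZED_MIB = 23767
STARCODER_NF4_MIB = 10787


def headroom_mib(model_mib: int, gpu_mib: int = RTX_4090_VRAM_MIB) -> int:
    """Return the VRAM left over after loading the model (negative means it does not fit)."""
    return gpu_mib - model_mib


for label, size in [
    ("unquantized", STARCODER_UNQUANTIZED_MIB),
    ("NF4 4-bit", STARCODER_NF4_MIB),
]:
    print(f"{label}: {size} MiB used, {headroom_mib(size)} MiB free")
```

The unquantized model leaves only a few hundred MiB spare, so anything else using the GPU (a desktop session, for example) could plausibly push it over the limit. The quantized model leaves more than half the card free.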
Setup API

All instructions are written assuming your command-line shell is bash.
Clone huggingface-vscode-endpoint-server repository:
git clone https://github.com/Birch-san/huggingface-vscode-endpoint-server.git
cd huggingface-vscode-endpoint-server
Create + activate a new virtual environment

This is to avoid interfering with your current Python environment (other Python scripts on your computer might not appreciate it if you update a bunch of packages they were relying on).
Follow the instructions for virtualenv, or conda, or neither (if you don't care what happens to other Python scripts on your computer).
Using venv

Create environment:
python -m venv venv
Activate environment:
. ./venv/bin/activate
(First-time) update environment's pip:
pip install --upgrade pip
Using conda

Download conda.
Skip this step if you already have conda.
Install conda:
Skip this step if you already have conda.
Assuming you're using a bash shell:
# Linux installs Anaconda via this shell script. Mac installs by running a .pkg installer.
bash Anaconda-latest-Linux-x86_64.sh
# this step probably works on both Linux and Mac.
eval "$(~/anaconda3/bin/conda shell.bash hook)"
conda config --set auto_activate_base false
conda init
Create environment:
conda create -n p311-code-api python=3.11
Activate environment:
conda activate p311-code-api
Install package dependencies

Ensure you have activated the environment you created above.
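If you are not sure whether the environment is active, a quick check from Python can help. This is a small sketch, not part of the repository: the function name `active_environment` is hypothetical, and it relies on `sys.prefix` differing from `sys.base_prefix` inside a venv and on conda exporting `CONDA_DEFAULT_ENV` when an environment is activated.

```python
import os
import sys


def active_environment() -> str:
    """Describe which isolated Python environment, if any, this interpreter runs in."""
    # venv/virtualenv: the interpreter's prefix differs from the base installation's prefix.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return f"venv at {sys.prefix}"
    # conda exports CONDA_DEFAULT_ENV with the name of the activated environment.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment {conda_env!r}"
    return "no virtual environment detected"


print(active_environment())
```

If this prints `no virtual environment detected`, or reports conda's `base` environment, activate the environment you created before installing anything.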
(Optional) treat yourself to latest nightly of PyTorch, with support for Python 3.11 and CUDA 12.1:
# CUDA
pip install --upgrade --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cu121
Install dependencies:
pip install -r requirements.txt
Run API:

From root of huggingface-vscode-endpoint-server repository:
python -m main --model_name_or_path bigcode/starcoder --bf16
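Loading the model can take a while, so the API may not answer immediately after you launch it. Below is a rough sketch of a readiness check that polls the port until something accepts connections. It assumes the server listens on localhost:8000 (the address used by the test request later in this guide); the function names and retry settings are illustrative only.

```python
import socket
import time


def port_is_open(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def wait_for_server(host: str = "localhost", port: int = 8000,
                    attempts: int = 60, delay: float = 5.0) -> bool:
    """Poll until the port opens; give up after `attempts` tries."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_server()` from a second terminal returns `True` once the port is open. An open port only shows that the process is listening, not that generation works, so still run the test request.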
Error: bigcode/starcoder repository not found / "private repository"
If you get this error:

You'll need to accept the terms on the bigcode/starcoder model card.
If you haven't logged into the huggingface CLI before: you'll also need to do that, so that it can authenticate as you, to check whether you accepted the model card's terms.
Go to tokens, create a new read-only token.

Copy the new token to your clipboard.
Run huggingface-cli login from your command prompt, and paste the token.
Try running main again.
Test API

Check this first, before we try to get VSCode working.
curl -X POST http://localhost:8000/api/generate/ -d '{"inputs": "", "parameters": {"max_new_tokens": 64}}'
If it works: we're ready to try it in VSCode.
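The same request can be sent from Python using only the standard library, which is handy if you want to script a few prompts. This is a sketch under assumptions: the URL and JSON body mirror the curl command above, but the `Content-Type: application/json` header is my addition (curl's `-d` does not send it by default), and the shape of the response is not documented here, so the code simply returns whatever JSON comes back.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
API_URL = "http://localhost:8000/api/generate/"


def build_request(prompt: str, max_new_tokens: int = 64,
                  url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the curl example."""
    body = json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the server accepts an explicit JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_new_tokens: int = 64,
             url: str = API_URL, timeout: float = 120.0):
    """Send the request and return the decoded JSON response as-is."""
    request = build_request(prompt, max_new_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `generate("def main():")` should return the server's JSON reply. Inspect it to find where the completion text lives.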
Try out starcoder integration in VSCode

Open the VSCode extension settings for starcoder:

Set your API endpoint as:
http://localhost:8000/api/generate


You may need to Reload Window to initialize HF Code Autocomplete now that you have changed the settings (open the command palette with Cmd+Shift+P, or Ctrl+Shift+P on Windows/Linux, and type Reload Window):

Running code inference

Create a new empty text file. Set the language to Python.
Type:
def main():

Whilst it thinks: you should see a spinner in the status bar at the bottom of VSCode:

Starcoder should auto-complete it for you!


Press tab to accept the completion.

Troubleshooting HF Code Autocomplete extension

Open the Output tab of VSCode's tray, pick the "Hugging Face Code" dropdown option:

You should be able to see anything logged by the VSCode extension.