Tianle Cai (ctlllll)
🚀 Building
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Birch-san / llama_flash.py
Last active January 22, 2024 06:05
Loading llama with Flash Attention
from transformers import (
    AutoConfig,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    AutoModelForCausalLM,
    LlamaTokenizerFast,
    PreTrainedModel,
    TextIteratorStreamer,
    StoppingCriteria,
)
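The preview above shows only the imports. As a minimal sketch of the same idea using the stock transformers API (not Birch-san's actual code; the checkpoint name is an illustrative placeholder, and this assumes transformers >= 4.34 plus the flash-attn package are installed):

```python
# A minimal sketch, not Birch-san's gist: load a Llama-family model with
# Flash Attention 2 through the stock transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,               # flash-attn kernels need fp16/bf16
    attn_implementation="flash_attention_2",  # dispatch to flash-attn kernels
    device_map="auto",
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```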
Birch-san / local-copilot.md
Last active June 26, 2024 06:58
Running GitHub Copilot against local Code Llama model

Running GitHub Copilot VSCode extension against local Code Llama model


Tested on an NVIDIA RTX 4090, but these instructions also cover AMD and Mac in case you wanna try those.
This guide assumes you are running Linux (I ran this on Ubuntu).
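The core trick is pointing the Copilot extension at a local OpenAI-compatible endpoint from VSCode's settings.json. A sketch of that override, using the keys documented by the FauxPilot project (treat them as assumptions; newer Copilot releases may have renamed or removed them):

```jsonc
// VSCode settings.json (JSONC) sketch: route Copilot requests to a local
// OpenAI-compatible server. Keys below come from the FauxPilot project and
// are assumptions here; the port and model id are illustrative.
{
  "github.copilot.advanced": {
    "debug.overrideEngine": "codellama",                    // illustrative model id
    "debug.testOverrideProxyUrl": "http://localhost:5001",  // local server URL
    "debug.overrideProxyUrl": "http://localhost:5001"
  }
}
```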

Before you get excited:

ctlllll / longest_chinese_tokens_gpt4o.py
Created May 13, 2024 19:53
Longest Chinese tokens in gpt4o
import tiktoken
import langdetect
T = tiktoken.get_encoding("o200k_base")  # the GPT-4o tokenizer
length_dict = {}
for i in range(T.n_vocab):
    try:
        length_dict[i] = len(T.decode([i]))  # character length of each token
    except Exception:
        pass  # skip undecodable ids
# The preview cuts off above; a plausible completion: sort by decoded length
# and print the longest tokens that langdetect classifies as Chinese.
for tok, n in sorted(length_dict.items(), key=lambda kv: kv[1], reverse=True)[:200]:
    try:
        if langdetect.detect(T.decode([tok])) in ("zh-cn", "zh-tw"):
            print(tok, n, T.decode([tok]))
    except Exception:
        pass