@y-lan
Last active July 20, 2023 14:16
Count Llama Tokens
from joblib import Parallel, delayed
from transformers import LlamaTokenizer

# Load the LLaMA tokenizer from the Hugging Face hub
tokenizer = LlamaTokenizer.from_pretrained('decapoda-research/llama-7b-hf')

def count(text):
    # Number of tokens produced for a single string
    return len(tokenizer(text)['input_ids'])

def parallel_count(texts):
    # Tokenize each text in parallel across all CPU cores, then sum the counts
    results = Parallel(n_jobs=-1)(delayed(count)(text) for text in texts)
    return sum(results)
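
For reference, a minimal usage sketch (the strings below are illustrative, not from the gist); note that, depending on tokenizer settings, each count may include a leading BOS token:

# Example usage
print(count("Hello, world!"))  # token count for one string
print(parallel_count(["Hello, world!", "How many tokens is this?"]))  # total across several strings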