@slaren

slaren/tk.py

Created March 15, 2023 19:45
Llama tokenizer test
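import sys

from llama import Tokenizer  # Tokenizer class from Meta's llama repository

if len(sys.argv) < 2:
    sys.exit("usage: python tk.py TEXT")

# SentencePiece model file shipped with the LLaMA weights (path relative to the working directory)
tokenizer_path = '../llama.cpp/models/tokenizer.model'
tokenizer = Tokenizer(model_path=tokenizer_path)

text = sys.argv[1]
tokens = tokenizer.encode(text, bos=True, eos=False)  # prepend BOS, no EOS
print(tokens)

# Decode each token id on its own to see which piece it maps to
for tid in tokens:
    tk = tokenizer.decode([tid])
    print(f"{tid:>6} -> '{tk}'")

# Decode the whole sequence back into text
print(tokenizer.decode(tokens))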
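The script needs the `llama` package and a real `tokenizer.model` file. For a dry run without either, here is a self-contained sketch that swaps in a hypothetical `ToyTokenizer` with the same `encode(s, bos, eos)` / `decode(t)` shape, a hand-written four-piece vocabulary, assumed ids 0/1/2 for `<unk>`/`<s>`/`</s>`, and greedy longest-match encoding in place of SentencePiece. The helper `token_table` and the constant `SPACE` are also illustrative names, not part of the original. The sketch reproduces the script's output format and one behavior we understand SentencePiece to have: a piece such as `▁world` decodes to `world` when decoded alone, so the per-token lines lose word spacing, while decoding the whole list restores it.

```python
from typing import Dict, List

SPACE = "\u2581"  # SentencePiece's word-boundary marker "▁"


class ToyTokenizer:
    """Hypothetical stand-in for llama.Tokenizer with a tiny hand-written
    vocabulary. It mimics the encode/decode signatures only; it is not
    SentencePiece and performs no learned subword segmentation."""

    UNK_ID, BOS_ID, EOS_ID = 0, 1, 2  # assumed ids for <unk>, <s>, </s>

    def __init__(self, pieces: List[str]) -> None:
        self.id_to_piece = ["<unk>", "<s>", "</s>"] + pieces
        # Only ordinary pieces are matchable; they start at id 3.
        self.piece_to_id: Dict[str, int] = {p: i + 3 for i, p in enumerate(pieces)}

    def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
        # Add a dummy leading space, then mark every space with "▁".
        text = (SPACE + s).replace(" ", SPACE)
        ids: List[int] = []
        i = 0
        while i < len(text):
            # Greedy longest match against the toy vocabulary.
            for j in range(len(text), i, -1):
                if text[i:j] in self.piece_to_id:
                    ids.append(self.piece_to_id[text[i:j]])
                    i = j
                    break
            else:
                ids.append(self.UNK_ID)  # character not in the vocabulary
                i += 1
        if bos:
            ids.insert(0, self.BOS_ID)
        if eos:
            ids.append(self.EOS_ID)
        return ids

    def decode(self, t: List[int]) -> str:
        # Drop ids 0-2 (this toy also drops <unk>), join the pieces,
        # turn "▁" back into spaces, then remove the single dummy space.
        text = "".join(self.id_to_piece[i] for i in t if i > self.EOS_ID)
        text = text.replace(SPACE, " ")
        return text[1:] if text.startswith(" ") else text


def token_table(tokenizer, text: str) -> List[str]:
    # Same steps as the script: encode with BOS, decode each id alone, decode all.
    tokens = tokenizer.encode(text, bos=True, eos=False)
    lines = [str(tokens)]
    for tid in tokens:
        lines.append(f"{tid:>6} -> '{tokenizer.decode([tid])}'")
    lines.append(tokenizer.decode(tokens))
    return lines


tok = ToyTokenizer([SPACE + "Hello", SPACE + "world", "!", SPACE])
print("\n".join(token_table(tok, "Hello world!")))
```

Running it prints `[1, 3, 4, 5]`, then one line per id (the BOS id 1 decodes to an empty string, then `'Hello'`, `'world'`, `'!'`), and finally `Hello world!`, which shows the spacing coming back only when the ids are decoded together.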