Skip to content

Instantly share code, notes, and snippets.

View qgbcs's full-sized avatar

github.com/users/qgbcs qgbcs

View GitHub Profile
@ctlllll
ctlllll / longest_chinese_tokens_gpt4o.py
Created May 13, 2024 19:53
Longest Chinese tokens in gpt4o
import tiktoken
import langdetect
T = tiktoken.get_encoding("o200k_base")
length_dict = {}
for i in range(T.n_vocab):
try:
length_dict[i] = len(T.decode([i]))
except: