UEhQZXI / longest_chinese_tokens_gpt4o.py
Created May 14, 2024 08:10 — forked from ctlllll/longest_chinese_tokens_gpt4o.py
Longest Chinese tokens in gpt4o
import tiktoken
import langdetect
T = tiktoken.get_encoding("o200k_base")
length_dict = {}
# Record the decoded character length of every token id in the vocabulary.
for i in range(T.n_vocab):
    try:
        length_dict[i] = len(T.decode([i]))
    except: