@aidiary
Created March 27, 2020 06:26
Display occurrence counts neatly as a bar chart
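import matplotlib.pyplot as plt
import seaborn as sns

# Plot styling: dark grid background, larger fonts, wide figure
sns.set(style='darkgrid', font_scale=1.5)
plt.rcParams['figure.figsize'] = (10, 5)

# NOTE: `tokenizer` is assumed to be defined beforehand and to expose a
# `vocab` mapping keyed by token string; this gist does not show how it is created.
token_lengths = [len(token) for token in tokenizer.vocab.keys()]

# Pass the lengths explicitly as x; newer seaborn versions take `data`
# as the first positional argument instead.
sns.countplot(x=token_lengths)
plt.title('Vocab Token Lengths')
plt.xlabel('Token Length')
plt.ylabel('# of Tokens')
plt.show()

print('Maximum token length:', max(token_lengths))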
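
The snippet above depends on a `tokenizer` object that the gist does not define. As a minimal sketch for checking the counting step without one, the following uses a small made-up vocabulary and tallies token lengths with `collections.Counter`, which is the same per-length tally that `sns.countplot` draws as bars. The names `toy_vocab` and `token_length_counts` are hypothetical and not part of the original.

```python
from collections import Counter


def token_length_counts(vocab):
    """Return a Counter mapping token length -> number of tokens with that length."""
    return Counter(len(token) for token in vocab)


# Hypothetical toy vocabulary standing in for tokenizer.vocab (an assumption)
toy_vocab = {'[PAD]': 0, '[CLS]': 1, 'the': 2, 'cat': 3, '##s': 4, 'a': 5}

counts = token_length_counts(toy_vocab)

# Print one line per token length, shortest first
for length in sorted(counts):
    print(length, counts[length])

print('Maximum token length:', max(counts))
```

Passing the keys of `counts` and their values to `plt.bar` would give roughly the same chart without seaborn, assuming only matplotlib is available.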