@zredlined
Created January 29, 2020 21:48
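import numpy as np
import pandas as pd

# `text` is assumed to be the training corpus, already loaded as a single string
# Collect the unique characters in the text, sorted for a stable ordering
vocab = sorted(set(text))
# Create a mapping from each unique character to an integer index
char2idx = {u: i for i, u in enumerate(vocab)}
# Reverse lookup: index -> character
idx2char = np.array(vocab)
# Encode the entire text as an array of indices
text_as_int = np.array([char2idx[c] for c in text])
print(f'Training set contains {len(vocab)} unique characters')
print('First 20 characters -> index mappings')
# Preview the first 20 (character, index) pairs
df = pd.DataFrame(list(zip(text, text_as_int)), columns=['char', 'char2idx'])
df.head(20)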
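A quick way to sanity-check the mapping is to encode a string and decode it back. This is a minimal sketch, not part of the original gist: `sample_text` is a hypothetical stand-in for the real `text`, which is loaded elsewhere.

```python
import numpy as np

# Hypothetical sample; the gist's real `text` is loaded elsewhere.
sample_text = "hello world"

# Same construction as above, applied to the sample.
vocab = sorted(set(sample_text))
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = np.array(vocab)

# Encode characters to indices, then decode the indices back to characters.
encoded = np.array([char2idx[c] for c in sample_text])
decoded = "".join(idx2char[encoded])

print(encoded)
print(decoded)
```

If the decoded string matches the input, `char2idx` and `idx2char` are consistent inverses of each other.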