Created
January 29, 2020 21:48
import numpy as np
import pandas as pd

# `text` is assumed to hold the training corpus as a single string, loaded earlier
# Collect the unique characters in the file
vocab = sorted(set(text))
# Create a mapping from unique characters to indices, plus the reverse lookup
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
# Encode the whole text as an array of integer indices
text_as_int = np.array([char2idx[c] for c in text])
print(f'Training set contains {len(vocab)} unique characters')
print('First 20 characters -> index mappings')
df = pd.DataFrame(list(zip(text, text_as_int)), columns=['char', 'char2idx'])
df[:20]
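A quick way to sanity-check the mapping is to decode the integer array back to characters and compare it with the input. The sketch below is only an illustration: the sample string stands in for `text`, which the snippet above assumes was loaded elsewhere, and `decoded` is a name introduced here.

```python
import numpy as np

# Hypothetical stand-in for the training corpus; the snippet above
# assumes `text` was already loaded from a file.
text = "hello world"

# Same mapping construction as above
vocab = sorted(set(text))
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])

# Indexing idx2char with the integer array maps every index back to
# its character, so joining the result should reproduce the input.
decoded = "".join(idx2char[text_as_int])
print(decoded == text)
```

Keeping `idx2char` as a NumPy array rather than a dict is what makes the one-step decode possible, since the whole index array can be used as a lookup at once.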