@heffo42
Created July 19, 2019 21:24
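```python
import os

# Paths to the training and validation text files
train_path = os.path.join('.', 'train.txt')
valid_path = os.path.join('.', 'valid.txt')

# Read each file, strip every line, and join the lines with ' , ' so that
# each line break becomes a standalone ',' token after splitting.
with open(train_path) as f:
    train_txt = ' , '.join(line.strip() for line in f)
with open(valid_path) as f:
    valid_txt = ' , '.join(line.strip() for line in f)

# Tokenize on whitespace (split() skips empty strings from repeated spaces)
train_list = train_txt.split()
valid_list = valid_txt.split()
# The vocabulary is every distinct token in either file
train_set = set(train_list)
valid_set = set(valid_list)
vocab = train_set.union(valid_set)
# Map each integer id to a token; sorting first keeps the ids identical
# across runs (set iteration order for strings varies between runs)
int2string = dict(enumerate(sorted(vocab)))
# Map each token back to its integer id
string2int = {string: ind for ind, string in int2string.items()}
```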
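With `string2int` and `int2string` built, the token lists can be turned into integer id sequences (for example, as model input) and decoded back. The sketch below is a minimal illustration under stated assumptions: since `train.txt` and `valid.txt` are not part of the gist, it rebuilds the same mappings from a small made-up in-memory sample, and the helpers `encode` and `decode` are hypothetical names, not something the gist defines.

```python
# Hypothetical stand-ins for the contents of train.txt and valid.txt
train_lines = ["the cat sat", "the dog ran"]
valid_lines = ["a cat ran"]

# Same preprocessing as the gist: join stripped lines with ' , ', split on whitespace
train_list = ' , '.join(line.strip() for line in train_lines).split()
valid_list = ' , '.join(line.strip() for line in valid_lines).split()

# Same vocabulary and mappings as the gist (sorted for run-to-run stability)
vocab = set(train_list) | set(valid_list)
int2string = dict(enumerate(sorted(vocab)))
string2int = {string: ind for ind, string in int2string.items()}


def encode(tokens):
    """Map a list of tokens to their integer ids (hypothetical helper)."""
    return [string2int[t] for t in tokens]


def decode(ids):
    """Map a list of integer ids back to tokens (hypothetical helper)."""
    return [int2string[i] for i in ids]


train_ids = encode(train_list)
valid_ids = encode(valid_list)

print(train_ids)                   # [6, 2, 5, 0, 6, 3, 4]
print(' '.join(decode(valid_ids)))  # a cat ran
```

Because the vocabulary is built from both files, every validation token has an id and `encode` never raises `KeyError` here; encoding text that appears in neither file would need an unknown-token entry, which the gist does not define.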