# First I tried this, but it failed with OOM (out of memory)
mkdir -p ./data/v1_bigtest
mkdir -p ./train/v1_bigtest
t2t-trainer \
--t2t_usr_dir=./data_generators \
--generate_data \
--tmp_dir=/e/challenger_nmt/t2t_temp_dir \
--data_dir=./data/v1_bigtest \
--problems=challenger_enzh_v1 \
bartgras / gist:68ad3bf2c221db5794bd833e903570b0
Created September 19, 2017 13:51
Sample code from the paper "Neural Machine Translation of Rare Words with Subword Units", used to visualize how subwords are generated
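import re, collections

def get_stats(vocab):
    # Count how often each pair of adjacent symbols occurs,
    # weighting every occurrence by the frequency of the word it appears in.
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()  # words are stored as space-separated symbols
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    print(pairs)  # show the weighted pair counts for this step
    return pairs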
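On its own, `get_stats` only counts pairs. In the paper it is paired with a merge step and a loop that repeatedly replaces the most frequent pair with a single new symbol. Below is a minimal sketch of that loop, written from my reading of the paper's algorithm rather than taken from this gist. `get_stats` is repeated without the `print` so the block runs on its own. `learn_bpe` and the early exit when no pairs remain are hypothetical additions of mine, not part of the paper or the original gist.

```python
import re
import collections


def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs


def merge_vocab(pair, v_in):
    """Replace every standalone occurrence of `pair` with the merged symbol."""
    v_out = {}
    bigram = re.escape(' '.join(pair))
    # (?<!\S) and (?!\S) anchor the match on whole symbols, so that e.g.
    # 'de s' is not mistaken for the pair ('e', 's').
    p = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    for word in v_in:
        w_out = p.sub(''.join(pair), word)
        v_out[w_out] = v_in[word]
    return v_out


def learn_bpe(vocab, num_merges):
    """Hypothetical helper: greedily apply the most frequent merge num_merges times."""
    merges = []
    for _ in range(num_merges):
        pairs = get_stats(vocab)
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        vocab = merge_vocab(best, vocab)
        merges.append(best)
        print(best)
    return merges, vocab


# Toy vocabulary: words split into characters, '</w>' marks the end of a word.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
merges, final_vocab = learn_bpe(vocab, num_merges=10)
```

With this toy vocabulary the first three merges are `('e', 's')`, `('es', 't')` and `('est', '</w>')`. The suffix "est" is shared by "newest" and "widest", so its pairs are the most frequent and get merged first. Each learned merge becomes a subword unit that can later be applied to unseen words in the same order.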