@prrao87
Last active August 26, 2019 15:58
Convert SST-5 tree data to tabular form
# Load data
import os
import sys

import pytreebank

# Write the output files next to this script (sys.path[0] is the script's directory)
out_path = os.path.join(sys.path[0], 'sst_{}.txt')
dataset = pytreebank.load_sst('./raw_data')

# Store train, dev and test in separate files
for category in ['train', 'test', 'dev']:
    with open(out_path.format(category), 'w') as outfile:
        for item in dataset[category]:
            # The first labeled line covers the whole tree: (label, full sentence)
            label, sentence = item.to_labeled_lines()[0]
            # Shift the label up by one and write one "__label__<n><TAB><sentence>" row
            outfile.write("__label__{}\t{}\n".format(label + 1, sentence))

# Print the length of the training set
print(len(dataset['train']))
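
To check the tabular output, the files can be read back into (label, text) pairs. The following is a minimal sketch using only the standard library; the helper name `read_labeled_lines` and the sample sentences are hypothetical and not part of the original gist. It assumes each row has the form `__label__<n><TAB><sentence>`, as written by the script above.

```python
import io

LABEL_PREFIX = "__label__"


def read_labeled_lines(handle):
    """Yield (label, text) pairs from rows shaped like '__label__<n>\\t<text>'."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        tag, _, text = line.partition("\t")
        if not tag.startswith(LABEL_PREFIX):
            raise ValueError("unexpected row: {!r}".format(line))
        yield int(tag[len(LABEL_PREFIX):]), text


# In-memory stand-in for an output file; these sentences are made up.
sample = io.StringIO(
    "__label__4\tA good movie .\n"
    "__label__1\tDull and lifeless .\n"
)
rows = list(read_labeled_lines(sample))
print(rows)
```

For a real file, pass an open handle instead, for example `open('sst_train.txt')`, assuming that is where the script wrote its output.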