Last active Mar 26, 2019
extract text from simpsons_script_lines.csv
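import os
import re

# Path to the CSV file (despite the name, this is a file path, not a directory)
data_dir = './data/simpsons_script_lines.csv'
input_file = os.path.join(data_dir)
clean_text = ''
with open(input_file, "r", encoding="utf8") as f:
    for line in f:
        # Skip three leading numeric fields, then capture the fourth field (the script line)
        text = re.search('[0-9]*,[0-9]*,[0-9]*,(.+?),[0-9]*,', line)
        if text:
            # Drop the CSV quoting around the captured field
            text = text.group(1).replace('"', '')
            # Join the words of the speaker name (before the first ':') with underscores
            text_parts = text.split(':')
            text_parts[0] = text_parts[0].replace(' ', '_')
            text = ':'.join(text_parts)
            clean_text += text + '\n'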
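
To make the per-line transformation easier to check, here is a minimal sketch that applies the same regex and clean-up to a single line. The helper name `clean_line` and both sample lines are hypothetical, made up for illustration, and are not rows from the actual dataset.

```python
import re

# Same pattern as in the snippet above: three numeric fields, the captured
# fourth field, then another numeric field.
LINE_PATTERN = re.compile(r'[0-9]*,[0-9]*,[0-9]*,(.+?),[0-9]*,')


def clean_line(line):
    """Return 'Speaker_Name: dialogue' for one CSV line, or None if it does not match.

    Hypothetical helper, not part of the original gist.
    """
    match = LINE_PATTERN.search(line)
    if match is None:
        return None
    text = match.group(1).replace('"', '')
    parts = text.split(':')
    # Only the part before the first ':' (the speaker name) gets underscores.
    parts[0] = parts[0].replace(' ', '_')
    return ':'.join(parts)


# Made-up sample lines, assumed to resemble the file's layout.
sample = '1,2,3,"Lisa Simpson: No, really.",4000,true\n'
header = 'id,episode_id,number,raw_text\n'

print(clean_line(sample))  # Lisa_Simpson: No, really.
print(clean_line(header))  # None
```

A comma inside the quoted text is tolerated because the lazy group only stops at a comma that is followed by optional digits and another comma. If the file contains quoted fields that span several lines, parsing it with the standard `csv` module would likely be more robust than a regex.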