@KentaKudo
Created February 18, 2018 11:43
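
# -*- coding: utf-8 -*-
# Split the Tanaka corpus (examples.utf) into parallel Japanese/English files.
import re

with open('examples.utf', 'r', encoding='utf-8') as f, \
        open('tanaka_corpus_j.txt', 'w', encoding='utf-8') as f_j, \
        open('tanaka_corpus_e.txt', 'w', encoding='utf-8') as f_e:
    for row in f:
        # Keep only the "A:" sentence-pair lines; skip the "B:" index lines.
        if not row.startswith('A: '):
            continue
        s = row[len('A: '):]
        # Drop the trailing "#ID=..." sentence-pair identifier.
        s = re.sub(r'#ID=.*$', '', s)
        # The Japanese and English sentences are separated by a tab.
        j, e = s.split('\t', 1)
        # Put a space between every Japanese character (character-level tokens).
        j = ' '.join(j)
        e = e.strip()
        print(j, file=f_j)
        print(e, file=f_e)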
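To check what the loop does to a single line without reading the whole corpus, the per-line logic can be pulled out into a function. This is a minimal sketch: `parse_tanaka_line` is a hypothetical helper, and the sample lines are invented to match the `A: <Japanese>\t<English>#ID=...` / `B: ...` layout the script assumes, not copied from the real corpus.

```python
import re


def parse_tanaka_line(row):
    """Hypothetical helper: turn one examples.utf line into (japanese, english).

    Returns None for "B:" lines (or anything that is not an "A:" pair),
    mirroring the filtering done in the script above.
    """
    if not row.startswith('A: '):
        return None
    s = row[len('A: '):]
    # Remove the trailing "#ID=..." identifier ('.' does not match the newline).
    s = re.sub(r'#ID=.*$', '', s)
    j, e = s.split('\t', 1)
    # Space-separate each Japanese character, as the script does.
    return ' '.join(j), e.strip()


# Invented sample lines in the assumed examples.utf layout.
print(parse_tanaka_line('A: 彼は学生です。\tHe is a student.#ID=1234_5678\n'))
# → ('彼 は 学 生 で す 。', 'He is a student.')
print(parse_tanaka_line('B: 彼(かれ)[01] 学生 です\n'))
# → None
```

Each returned pair corresponds to one line written to `tanaka_corpus_j.txt` and one to `tanaka_corpus_e.txt`, so the two output files stay line-aligned.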