@harukaeru
Created November 4, 2018 11:55
Builds Anki import data from a normalized, Weblio-specific TSV file.
# python toAnki.py /path/to/your_tsv_file.tsv
import re
import sys

# Characters treated as part of the English example sentence.
alphabet = re.compile(r'[A-Za-z ,\.\[\]"\']')
# Matches '- ' and everything after it; this is stripped from the Japanese part.
quotation = re.compile(r'- .*$')

with open(sys.argv[1]) as src, open(sys.argv[1] + '.converted.tsv', 'w') as out:
    for line in src:
        # Columns: word, pronunciation, meaning, example sentence.
        array_line = line.replace('\n', '').split('\t')
        word = array_line[0]
        pronunciation = array_line[1]
        # Net effect: a literal '&nbsp' in the meaning becomes a plain space.
        meaning = array_line[2].replace(' ', '&nbsp').replace('&nbsp', ' ')
        sentence = array_line[3]

        # Split the sentence at the first character outside the English set.
        en = ''
        ja = ''
        for i, c in enumerate(sentence):
            if not alphabet.match(c):
                en = sentence[:i]
                ja = sentence[i:]
                break
        ja = quotation.sub('', ja)

        # Anki card: front and back separated by a tab, HTML inside each field.
        front = word + '<br />' + pronunciation
        back = (
            '<span style="font-size: 12px">' + meaning +
            '</span><br /><br />' +
            '<span style="font-size: 12px">' + en + '<br />' + ja + '</span>'
        )
        out.write(front + '\t' + back + '\n')
@harukaeru
Author

Read the code in https://gist.github.com/harukaeru/6048361a11473b9678138e636e24de96 before running this script, because this script depends on it.
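
To make the expected input concrete, here is a minimal sketch of how a single row is processed, assuming four tab-separated columns (word, pronunciation, meaning, example sentence) as the script reads them. The helper names `split_sentence` and `convert_row` and the sample row are hypothetical and not taken from either gist, and the sketch passes the meaning column through unchanged.

```python
import re

# Same patterns as the script: the English character set, and '- ' to end of line.
ALPHABET = re.compile(r'[A-Za-z ,\.\[\]"\']')
QUOTATION = re.compile(r'- .*$')


def split_sentence(sentence):
    """Split at the first character outside the English character set."""
    for i, c in enumerate(sentence):
        if not ALPHABET.match(c):
            return sentence[:i], QUOTATION.sub('', sentence[i:])
    # Mirrors the script: with no split point, both parts stay empty.
    return '', ''


def convert_row(line):
    """Turn one TSV row into a 'front<TAB>back' line (hypothetical helper)."""
    word, pronunciation, meaning, sentence = line.rstrip('\n').split('\t')[:4]
    en, ja = split_sentence(sentence)
    front = word + '<br />' + pronunciation
    back = (
        '<span style="font-size: 12px">' + meaning +
        '</span><br /><br />' +
        '<span style="font-size: 12px">' + en + '<br />' + ja + '</span>'
    )
    return front + '\t' + back


if __name__ == '__main__':
    # Made-up sample row, for illustration only.
    row = 'apple\tˈæpl\tりんご\tI ate an apple.私はりんごを食べた。 - Example Source\n'
    print(convert_row(row))
```

A sentence that contains no character outside the English set yields two empty strings, which mirrors the script: `en` and `ja` keep their initial values when the loop never reaches `break`.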
