@minhlab
Created December 9, 2016 13:42
Compute some statistics on ECB, given the ECB+ directory, which includes the ECB files (Cybulska and Vossen, 2014).
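import os
import re

count = 0  # running total of '<token' occurrences across all ECB files
for root, dir_names, file_names in os.walk('ECB+'):
    for fname in file_names:
        # Skip files with 'plus' in the name (the ECB+ additions),
        # keeping only the original ECB files.
        if 'plus' not in fname:
            path = os.path.join(root, fname)
            with open(path) as f:
                content = f.read()
            # Debug output: list every '<token' match found in this file.
            print(list(m.group() for m in re.finditer('<token', content)))
            count += sum(1 for _ in re.finditer('<token', content))
            # print(path)
print(count)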
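
The script above can also be sketched as a reusable function that keeps a count per file as well as the total. This is only an illustration: the name `count_tokens_per_file`, its parameters, and the `TOKEN_RE` pattern are hypothetical and not part of the original gist. It assumes Python 3, UTF-8 encoded files, and, as the script does, that ECB+ additions are marked by 'plus' in the file name. Unlike the plain `'<token'` search, the pattern here requires whitespace, `>` or `/` after the tag name, so a longer tag name that merely begins with `token` would not be counted.

```python
import os
import re

# Assumption: each token is an XML element whose opening tag starts with
# '<token' followed by whitespace, '>' or '/'.
TOKEN_RE = re.compile(r'<token(?=[\s>/])')


def count_tokens_per_file(corpus_dir, skip_substring='plus'):
    """Return {path: token count} for files whose name lacks skip_substring."""
    counts = {}
    for root, _dir_names, file_names in os.walk(corpus_dir):
        for fname in file_names:
            # Hypothetical filter mirroring the script: drop ECB+ additions.
            if skip_substring in fname:
                continue
            path = os.path.join(root, fname)
            # Assumption: the corpus files are UTF-8 encoded.
            with open(path, encoding='utf-8') as f:
                counts[path] = len(TOKEN_RE.findall(f.read()))
    return counts


if __name__ == '__main__':
    counts = count_tokens_per_file('ECB+')
    print(len(counts), 'files,', sum(counts.values()), 'tokens')
```

Keeping the per-file counts makes it easy to derive further statistics, such as the average number of tokens per document, without walking the directory again.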