@minhlab
Created December 9, 2016 13:36
Print some statistics of ECB+ (Cybulska and Vossen, 2014)
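import os
import re

count = 0
# Walk the corpus directory; 'ECB+' is expected in the current working directory.
for root, dir_names, file_names in os.walk('ECB+'):
    for fname in file_names:
        # Only look at files whose names contain 'plus'.
        if 'plus' in fname:
            path = os.path.join(root, fname)
            with open(path) as f:
                content = f.read()
            # Debug output: every '<token' match found in this file.
            print([m.group() for m in re.finditer('<token', content)])
            # Each '<token' occurrence is counted as one token.
            count += sum(1 for _ in re.finditer('<token', content))
            # print(path)
print(count)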
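
The script above counts tokens by matching the string '<token' with a regular expression, which would also match any tag whose name merely starts with "token". A minimal sketch of an alternative, assuming each file is well-formed XML with one `<token>` element per token, is to parse the file and count the elements. The helper name `count_tokens` and the sample document with its attributes are hypothetical and only illustrate the idea; they are not taken from the corpus.

```python
import xml.etree.ElementTree as ET


def count_tokens(xml_text):
    """Count <token> elements in one XML document given as a string."""
    root = ET.fromstring(xml_text)
    # iter('token') visits every element named exactly 'token', at any depth.
    return sum(1 for _ in root.iter('token'))


# Hypothetical sample, made up for illustration (not real corpus data).
sample = (
    '<Document doc_name="example.xml">'
    '<token t_id="1">Hello</token>'
    '<token t_id="2">world</token>'
    '</Document>'
)
print(count_tokens(sample))
```

In the loop above, `count += count_tokens(content)` could stand in for the regex-based line, at the cost of failing loudly on any file that is not valid XML.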