@dedan
Created September 17, 2011 07:45
Get SENNA tagger output into Python
My PRP$ B-NP O - B-A0 O
brother NN E-NP O - E-A0 O
has VBZ S-VP O has S-V O
a DT B-NP O - B-A1 B-A0
dog NN E-NP O - I-A1 E-A0
that WDT S-NP O - I-A1 S-R-A0
has VBZ S-VP O has I-A1 S-V
a DT B-NP O - I-A1 B-A1
cat NN E-NP O - E-A1 E-A1
My PRP$ B-NP O - B-A0 O (S1(S(NP*
brother NN E-NP O - E-A0 O *)
has VBZ S-VP O has S-V O (VP*
a DT B-NP O - B-A1 B-A0 (NP(NP*
dog NN E-NP O - I-A1 E-A0 *)
that WDT S-NP O - I-A1 S-R-A0 (SBAR(WHNP*)
has VBZ S-VP O has I-A1 S-V (S(VP*
a DT B-NP O - I-A1 B-A1 (NP*
cat NN E-NP O - E-A1 E-A1 *))))))))
import os
import subprocess as sp


def tag(sentence, senna_path):
    """
    Tag a sentence using the SENNA tagger by Ronan Collobert.
    http://ronan.collobert.com/
    """
    # Start a fresh SENNA process; -path points it at its data directory.
    p = sp.Popen([os.path.join(senna_path, 'senna'), '-path', senna_path],
                 stdin=sp.PIPE,
                 stdout=sp.PIPE,
                 universal_newlines=True)  # exchange str, not bytes
    tagged = p.communicate(sentence)[0]
    words = []
    for line in tagged.split('\n'):
        if not line.strip():
            continue  # skip blank separator lines
        tmp = line.split()
        # Columns: token, POS tag, chunk tag, NER tag, SRL verb, SRL roles...
        word = {'term': tmp[0], 'pos': tmp[1], 'chk': tmp[2]}
        if tmp[3] != 'O':
            word['ner'] = tmp[3]
        if len(tmp) > 5:
            if tmp[4] != '-':
                word['base'] = tmp[4]
            word['srl'] = tmp[5:]
        words.append(word)
    return words
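
The `chk` values in the sample output use IOBES prefixes (`B-` begin, `I-` inside, `E-` end, `S-` single-token phrase). As a rough sketch of how the dicts returned by `tag` could be grouped into phrases, the helper below walks those prefixes. The name `chunk_spans` is hypothetical and not part of the gist; the `words` list is hand-written from the sample output above so the example runs without SENNA installed.

```python
def chunk_spans(words):
    """Group tokens into (label, text) phrases using IOBES chunk tags."""
    spans = []
    current = []  # tokens of the phrase currently being built
    for w in words:
        chk = w['chk']
        if chk == 'O':
            current = []  # token is outside any chunk
            continue
        prefix, label = chk.split('-', 1)
        if prefix in ('B', 'S'):
            current = []  # a new phrase starts here
        current.append(w['term'])
        if prefix in ('E', 'S'):
            spans.append((label, ' '.join(current)))  # phrase is complete
            current = []
    return spans


# Hand-copied from the sample output (term, pos, chk columns only).
words = [
    {'term': 'My', 'pos': 'PRP$', 'chk': 'B-NP'},
    {'term': 'brother', 'pos': 'NN', 'chk': 'E-NP'},
    {'term': 'has', 'pos': 'VBZ', 'chk': 'S-VP'},
    {'term': 'a', 'pos': 'DT', 'chk': 'B-NP'},
    {'term': 'dog', 'pos': 'NN', 'chk': 'E-NP'},
    {'term': 'that', 'pos': 'WDT', 'chk': 'S-NP'},
    {'term': 'has', 'pos': 'VBZ', 'chk': 'S-VP'},
    {'term': 'a', 'pos': 'DT', 'chk': 'B-NP'},
    {'term': 'cat', 'pos': 'NN', 'chk': 'E-NP'},
]

for label, text in chunk_spans(words):
    print(label, text)
```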

dedan commented Sep 17, 2011

Ah, by the way: the tag code is very naive and slow. Every call loads the whole thing into memory (and SENNA is quite big). I did not manage to keep it in memory and just ask it for output whenever I want. Will do this when I have more time.
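
One possible way to soften the reload cost without keeping the process alive is to send many sentences to a single SENNA run. The sketch below is not the author's method. It assumes that SENNA reads one sentence per input line and separates the sentences in its output with a blank line; `tag_many` and `parse_senna_output` are hypothetical names introduced only for this illustration.

```python
import os
import subprocess as sp


def parse_senna_output(tagged):
    """Split SENNA output into one list of column rows per sentence."""
    sentences, rows = [], []
    for line in tagged.split('\n'):
        if line.strip():
            rows.append(line.split())
        elif rows:
            # Assumed: a blank line marks the end of a sentence.
            sentences.append(rows)
            rows = []
    if rows:
        sentences.append(rows)
    return sentences


def tag_many(sentences, senna_path):
    """Run SENNA once for all sentences, so the model loads only once."""
    p = sp.Popen([os.path.join(senna_path, 'senna'), '-path', senna_path],
                 stdin=sp.PIPE,
                 stdout=sp.PIPE,
                 universal_newlines=True)
    # Assumed: one sentence per input line.
    tagged = p.communicate('\n'.join(sentences) + '\n')[0]
    return parse_senna_output(tagged)
```

With this, the model is still loaded on each call to `tag_many`, but only once per batch rather than once per sentence.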
