@organisciak
Last active November 12, 2016 05:43
Solution: Select top nouns
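# Token counts for the whole volume, indexed by (section, token, pos)
tl = vol.tokenlist(pages=False)
# Keep every section and token, but only singular (NN) and plural (NNS) nouns
just_nouns = tl.loc[(slice(None), slice(None), ["NN", "NNS"]), :]
# Most frequent nouns first
top_nouns = just_nouns.sort_values('count', ascending=False)
top_nouns.head(5)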
# OUTPUT:
#                      count
# section token  pos
# body    doctor NN       83
#         time   NN       80
#         day    NN       73
#         eyes   NNS      61
#         way    NN       57
# NOTE
# Because each step returns a DataFrame, it is possible to `chain` methods.
# Though inadvisably dense in this case, the solution above could also be written as a single chained expression:
vol.tokenlist(pages=False).loc[(slice(None), slice(None), ["NN", "NNS"]), :].sort_values('count', ascending=False).head(5)
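The part-of-speech filter above relies on slicing a three-level MultiIndex, which is hard to try out without a loaded volume. The sketch below reproduces the same selection on a small made-up DataFrame. The `toy` frame and its counts are hypothetical stand-ins for the output of `vol.tokenlist(pages=False)`, and `pd.IndexSlice` is used as a more readable alternative to the `slice(None)` tuples, not as the author's method.

```python
import pandas as pd

# Hypothetical toy token list mimicking the (section, token, pos) index
# of vol.tokenlist(pages=False); the counts are invented for illustration.
rows = [
    ("body", "doctor", "NN", 5),
    ("body", "eyes", "NNS", 3),
    ("body", "run", "VB", 9),
    ("body", "time", "NN", 4),
    ("body", "quickly", "RB", 2),
]
toy = (
    pd.DataFrame(rows, columns=["section", "token", "pos", "count"])
    .set_index(["section", "token", "pos"])
    .sort_index()  # list-based MultiIndex slicing with .loc expects a sorted index
)

# idx[:, :, [...]] means: any section, any token, only the listed POS tags
idx = pd.IndexSlice
toy_nouns = toy.loc[idx[:, :, ["NN", "NNS"]], :]

# Sort the nouns by frequency, most common first
toy_top = toy_nouns.sort_values("count", ascending=False)
top_tokens = toy_top.index.get_level_values("token").tolist()

print(toy_top.head(2))
```

Naming each intermediate step, as here, makes it easier to inspect the result of the filter before sorting, which is the main reason to prefer it over the fully chained form.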