Gist organisciak/163e59ea6cf71c3cd12de410d075567c, last active November 12, 2016 05:43
Solution: Select top nouns
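```python
# Assumes `vol` is an HTRC Feature Reader volume loaded in an earlier step.
# Token counts for the whole volume, indexed by (section, token, pos)
tl = vol.tokenlist(pages=False)
# Keep every section and token, but only singular (NN) and plural (NNS) common nouns
just_nouns = tl.loc[(slice(None), slice(None), ["NN", "NNS"]), :]
# Order the nouns from most to least frequent
top_nouns = just_nouns.sort_values('count', ascending=False)
top_nouns.head(5)
# OUTPUT:
#                       count
# section token  pos
# body    doctor NN        83
#         time   NN        80
#         day    NN        73
#         eyes   NNS       61
#         way    NN        57
```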
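To try the selection without downloading a volume, the sketch below builds a small stand-in DataFrame with the same (section, token, pos) index and a `count` column. The tokens and counts are invented for illustration and do not come from the volume above. It also shows an equivalent filter that builds a boolean mask from the `pos` index level with `Index.isin`, which some readers may find easier to follow than a tuple of slices.

```python
import pandas as pd

# Hypothetical stand-in for vol.tokenlist(pages=False): counts indexed by
# (section, token, pos). Values are invented for illustration only.
index = pd.MultiIndex.from_tuples(
    [
        ("body", "doctor", "NN"),
        ("body", "said", "VBD"),
        ("body", "eyes", "NNS"),
        ("body", "the", "DT"),
        ("body", "time", "NN"),
        ("header", "chapter", "NN"),
    ],
    names=["section", "token", "pos"],
)
tl = pd.DataFrame({"count": [83, 120, 61, 400, 80, 12]}, index=index)

# Option 1: every section, every token, and two chosen values on the pos level
just_nouns = tl.loc[(slice(None), slice(None), ["NN", "NNS"]), :]

# Option 2: a boolean mask built from the pos level of the index
mask = tl.index.get_level_values("pos").isin(["NN", "NNS"])
just_nouns_alt = tl[mask]

# Most frequent nouns first
top_nouns = just_nouns.sort_values("count", ascending=False)
print(top_nouns.head(3))
```

Both options keep the same four noun rows; only the way the rows are picked differs.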
```python
# NOTE
# Because each step returns a DataFrame, it is possible to chain methods.
# Though inadvisably dense in this case, the solution above could also be written like this:
vol.tokenlist(pages=False).loc[(slice(None), slice(None), ["NN", "NNS"]), :].sort_values('count', ascending=False).head(5)
```
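If you still prefer a single chained expression, wrapping it in parentheses lets each step sit on its own line, which keeps the chain readable. The sketch below uses a small hypothetical stand-in for the token list (invented counts, no HTRC data needed). As one alternative, it swaps `sort_values(...).head(n)` for pandas' `DataFrame.nlargest`, which returns the same rows here because the counts contain no ties.

```python
import pandas as pd

# Hypothetical stand-in for vol.tokenlist(pages=False); counts are invented.
tl = pd.DataFrame(
    {"count": [83, 120, 61, 400, 80, 57]},
    index=pd.MultiIndex.from_tuples(
        [
            ("body", "doctor", "NN"),
            ("body", "said", "VBD"),
            ("body", "eyes", "NNS"),
            ("body", "the", "DT"),
            ("body", "time", "NN"),
            ("body", "way", "NN"),
        ],
        names=["section", "token", "pos"],
    ),
)

# The same chain as the one-liner, one step per line inside parentheses
top_nouns = (
    tl.loc[(slice(None), slice(None), ["NN", "NNS"]), :]
    .sort_values("count", ascending=False)
    .head(3)
)

# Alternative final step: nlargest keeps the n rows with the largest "count"
top_nouns_alt = tl.loc[(slice(None), slice(None), ["NN", "NNS"]), :].nlargest(3, "count")

print(top_nouns)
```

With ties in `count`, the two approaches may order tied rows differently, so the `sort_values` version is the closer match to the original solution.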