Last active
December 5, 2018 18:59
Named Entity Extraction with NLTK in Python
# -*- coding: utf-8 -*-
'''
Extract named entities from text with NLTK.
'''
from nltk import sent_tokenize, word_tokenize, pos_tag, ne_chunk


def extract_entities(text):
    entities = []
    for sentence in sent_tokenize(text):
        # Tokenize, POS-tag, then chunk each sentence into a tree.
        chunks = ne_chunk(pos_tag(word_tokenize(sentence)))
        # Named-entity chunks are subtrees; other tokens are plain (word, tag) tuples.
        # NLTK 3 replaced the Tree.node attribute with Tree.label(), so test for 'label'.
        entities.extend([chunk for chunk in chunks if hasattr(chunk, 'label')])
    return entities


if __name__ == '__main__':
    text = """
A multi-agency manhunt is under way across several states and Mexico after
police say the former Los Angeles police officer suspected in the murders of a
college basketball coach and her fiancé last weekend is following through on
his vow to kill police officers after he opened fire Wednesday night on three
police officers, killing one.
"In this case, we're his target," Sgt. Rudy Lopez from the Corona Police
Department said at a press conference.
The suspect has been identified as Christopher Jordan Dorner, 33, and he is
considered extremely dangerous and armed with multiple weapons, authorities
say. The killings appear to be retribution for his 2009 termination from the
Los Angeles Police Department for making false statements, authorities say.
Dorner posted an online manifesto that warned, "I will bring unconventional
and asymmetrical warfare to those in LAPD uniform whether on or off duty."
"""
    # Parenthesized print works on both Python 2 and Python 3.
    print(extract_entities(text))
@reach2ashish Replace 'node' with 'label' on Line 12 and it will work :)
If you're using Python 3, you will also have to wrap the argument of the print statement in parentheses.
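Building on the 'label' fix, here is a minimal sketch of turning the extracted chunks into readable (label, text) pairs. The helper name `entity_strings` is made up for illustration and is not part of the gist or of NLTK. It assumes each named-entity chunk exposes `label()` and `leaves()` the way an NLTK 3 `Tree` does, and that leaves are (word, tag) tuples as produced by `pos_tag`.

```python
def entity_strings(chunks):
    """Return (label, text) pairs for the named-entity subtrees in `chunks`.

    `chunks` is assumed to be the result of ne_chunk(...), or a list of its
    subtrees such as the one returned by extract_entities().
    """
    pairs = []
    for chunk in chunks:
        # Plain (word, tag) tuples have no label() method; only subtrees do.
        if hasattr(chunk, "label"):
            words = [word for word, _tag in chunk.leaves()]
            pairs.append((chunk.label(), " ".join(words)))
    return pairs
```

With NLTK and its data packages installed, `entity_strings(extract_entities(text))` should give one pair per entity. The exact labels depend on the chunker model, so no sample output is shown here.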
This code doesn't recognize dates?
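One possible workaround, assuming the chunker is not tagging the date strings you care about, is a separate regular-expression pass over the raw text. The sketch below is not part of the gist: `extract_dates` and `DATE_RE` are hypothetical names, and the pattern only covers "Month D, YYYY" and ISO "YYYY-MM-DD" forms. It would not catch a bare year like "2009" or a weekday like "Wednesday".

```python
import re

_MONTHS = ("January|February|March|April|May|June|July|"
           "August|September|October|November|December")

# Matches "December 5, 2018" style dates or ISO "2018-12-05" style dates.
DATE_RE = re.compile(
    r"\b(?:%s)\s+\d{1,2},\s+\d{4}\b|\b\d{4}-\d{2}-\d{2}\b" % _MONTHS
)


def extract_dates(text):
    """Return every substring of `text` that matches DATE_RE, in order."""
    return DATE_RE.findall(text)
```

The results can be merged with the output of `extract_entities` if a single list is needed.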
  File "C:\ProgramData\Anaconda3\lib\site-packages\nltk\tree.py", line 202, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.