@aniruddha27
Last active September 3, 2020 13:30
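import spacy
from spacy.matcher import Matcher

# assumption: any English spaCy pipeline with a POS tagger works here;
# skip this line if nlp is already loaded
nlp = spacy.load('en_core_web_sm')

# function to find sentences containing PMs of India
def find_names(text):
    names = []
    # spacy doc
    doc = nlp(text)
    # pattern: "prime minister", an optional adposition such as "of", then a proper noun
    pattern = [{'LOWER':'prime'},
               {'LOWER':'minister'},
               {'POS':'ADP','OP':'?'},
               {'POS':'PROPN'}]
    # Matcher class object
    matcher = Matcher(nlp.vocab)
    # spaCy v3 takes a list of patterns; the older v2 call was matcher.add("names", None, pattern)
    matcher.add("names", [pattern])
    matches = matcher(doc)
    # finding patterns in the text
    for match_id, start, end in matches:
        # match: id, start, end; slice the matched span out of the doc
        span = doc[start:end]
        # append matched text to list
        names.append(str(span))
    # Only keep sentences containing Indian PMs:
    # drop "Prime Minister of X" matches where X is not India.
    # A new list is built because removing from names while looping over it skips items.
    kept = []
    for name in names:
        words = name.split()
        if len(words) > 3 and words[2] == 'of' and words[3] != "India":
            continue
        kept.append(name)
    return kept

# apply function
df2['PM_Names'] = df2['Sent'].apply(find_names)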
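
The last step of find_names drops "Prime Minister of X" matches where X is not India. Here is a minimal sketch of that step on its own, without spaCy, to show why building a new list is safer than calling remove() inside the loop. The helper names and the sample strings are hypothetical and are not part of the gist.

```python
def drop_non_indian(names):
    """Return a new list without 'Prime Minister of X' entries where X is not India."""
    kept = []
    for name in names:
        words = name.split()
        # "Prime Minister of X" has at least four words; skip it unless X is India
        if len(words) > 3 and words[2] == "of" and words[3] != "India":
            continue
        kept.append(name)
    return kept


def drop_non_indian_in_place(names):
    """Same test, but removing from the list being looped over (shown only as a pitfall)."""
    for name in names:
        words = name.split()
        if len(words) > 3 and words[2] == "of" and words[3] != "India":
            # removing shifts later items left, so the loop skips the next one
            names.remove(name)
    return names


# hypothetical matcher output, for illustration only
matches = [
    "Prime Minister of Japan",
    "Prime Minister of Nepal",
    "Prime Minister of India",
    "Prime Minister Nehru",
]

print(drop_non_indian(matches))
# ['Prime Minister of India', 'Prime Minister Nehru']

print(drop_non_indian_in_place(list(matches)))
# ['Prime Minister of Nepal', 'Prime Minister of India', 'Prime Minister Nehru']
```

With two non-Indian matches in a row, the in-place version leaves the second one behind, because the loop has already moved past its new position.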
@getcontrol

Getting a KeyError: 'Speech_clean' error when I run this, any feedback appreciated.

@aniruddha27
Author

Hi, it should have been df2['Sent'] in the last line. I have made the change, so it should work now.
