@evanmiltenburg
Created April 5, 2020 08:28
Script to find people in Dutch text
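import spacy
# Load the small Dutch pipeline (install it first with: python -m spacy download nl_core_news_sm).
nlp = spacy.load('nl_core_news_sm')
# Read the whole text and run it through the pipeline.
with open('bordewijk.txt') as f:
    doc = nlp(f.read())
# Keep the text of every entity labelled as a person.
people = [ent.text for ent in doc.ents if ent.label_ == 'PERSON']
print(people)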
@evanmiltenburg
Author

With this text, the script gives the following results:

['Regulus', 'Alcor', 'Alcor', '[', 'velgloos wiel', 'Gods duim', '[', 'Eufemia van Tinborn', 'Alcor', '[', 'Sofia Eufemia', 'Sofia Eufemia', 'Eufemia', '[', 'Eufemia', 'kim', 'kim', '[', 'lomp monsterachtig', '[']

"kim" is also a false positive; the word comes from the expression "van kim tot kim" ("from horizon to horizon").
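One possible way to reduce this kind of noise is a simple post-filter on the extracted strings. The sketch below is only a heuristic and not part of the original script or of spaCy: the helper `looks_like_name` is a hypothetical name, and it assumes that a person name starts with a capital letter. It works on the output list quoted above, so it runs without loading a model.

```python
from collections import Counter

# Output copied from the results above.
people = ['Regulus', 'Alcor', 'Alcor', '[', 'velgloos wiel', 'Gods duim', '[',
          'Eufemia van Tinborn', 'Alcor', '[', 'Sofia Eufemia', 'Sofia Eufemia',
          'Eufemia', '[', 'Eufemia', 'kim', 'kim', '[', 'lomp monsterachtig', '[']


def looks_like_name(text):
    """Hypothetical heuristic: treat a string as a name only if it starts with a capital."""
    return text[:1].isupper()


# Drop brackets and lowercase phrases such as "kim" or "velgloos wiel".
filtered = [p for p in people if looks_like_name(p)]

# Count how often each remaining candidate occurs.
counts = Counter(filtered)
print(counts.most_common())
```

The filter removes "[", "kim" and the other lowercase phrases, but a capitalised false positive such as "Gods duim" still gets through, so the result would still need a manual check.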
