@davidlenz
Last active May 25, 2018 09:57
Usage of the spaCy lemmatizer: convert a list of strings to their lemmatized versions.
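import spacy
from types import SimpleNamespace

# `settings` is not defined in the original snippet; a plain namespace stands in for it here.
settings = SimpleNamespace()
settings.LEMMATIZER_BATCH_SIZE = 250
settings.LEMMATIZER_N_THREADS = -1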
nlp = spacy.load('de')
nlp.disable_pipes('tagger', 'ner')
def spacy_lemmatizer(text, nlp):
    """text is a list of strings. nlp is a spaCy nlp object. Use nlp.disable_pipes('tagger', 'ner') to speed up lemmatization."""
    # Stream the texts through the pipeline in batches.
    doclist = nlp.pipe(text, n_threads=settings.LEMMATIZER_N_THREADS, batch_size=settings.LEMMATIZER_BATCH_SIZE)
    docs = []
    for doc in doclist:
        # Join the lemma of every token back into one string per input text.
        docs.append(' '.join([token.lemma_ for token in doc]))
    return docs
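To illustrate the input and output shape of the function without downloading a model, here is a minimal sketch that swaps in a toy stand-in for the `nlp` object. `FakeNLP`, `FakeToken`, and the `LEMMAS` table are hypothetical and not part of spaCy; they only mimic the `pipe()` method and `lemma_` attribute that the function relies on. The function is restated with an explicit `batch_size` argument (an assumption made for this sketch) so that the block runs by itself.

```python
from dataclasses import dataclass

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"Häuser": "Haus", "waren": "sein", "ging": "gehen"}


@dataclass
class FakeToken:
    """Toy token; NOT a spaCy Token. Only mimics the `lemma_` attribute."""
    text: str

    @property
    def lemma_(self):
        # Fall back to the surface form when the word is not in the table.
        return LEMMAS.get(self.text, self.text)


class FakeNLP:
    """Toy pipeline; NOT spaCy. Only mimics `pipe()` yielding token sequences."""

    def pipe(self, texts, batch_size=250):
        for text in texts:
            # Whitespace tokenization is a simplification for the sketch.
            yield [FakeToken(word) for word in text.split()]


def spacy_lemmatizer(text, nlp, batch_size=250):
    """text is a list of strings; returns one lemmatized string per input."""
    docs = []
    for doc in nlp.pipe(text, batch_size=batch_size):
        docs.append(' '.join(token.lemma_ for token in doc))
    return docs


result = spacy_lemmatizer(["Die Häuser waren alt", "Er ging"], FakeNLP())
print(result)  # ['Die Haus sein alt', 'Er gehen']
```

With a real pipeline, the loaded `nlp` object would be passed in place of `FakeNLP()`; the lemmas it returns depend on the model and spaCy version in use.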