jacquesfize / remove_token.py
Last active June 11, 2022 14:59
A function to delete tokens from a spaCy Doc object without losing the associated token information (part-of-speech tags, dependency labels, lemmas, ...)
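The gist's function body is not shown in this excerpt. A common way to achieve what the description promises is a round trip through a token-by-attribute array. You export the attributes with `Doc.to_array`, drop the rows of the deleted tokens with `numpy.delete`, rebuild a `Doc` from the surviving words on the same vocab, and write the rows back with `Doc.from_array`. The sketch below assumes spaCy v3 and NumPy. The name `remove_tokens_sketch` and the `DEFAULT_ATTRS` list are hypothetical illustrations, not the author's code. The sketch leaves out `HEAD`, because spaCy stores heads as relative offsets that become wrong once tokens are removed. It also does not copy custom `._` extensions.

```python
# Hypothetical sketch (assumes spaCy v3 + NumPy); not the gist's own implementation.
import numpy as np
from spacy.attrs import DEP, LEMMA, POS, TAG
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Token-level attributes to carry over. HEAD is deliberately excluded: it holds
# relative offsets that would point at the wrong tokens after deletion.
DEFAULT_ATTRS = [POS, TAG, LEMMA, DEP]


def remove_tokens_sketch(doc, index_to_del, attrs=None):
    """Return a new Doc without the tokens at the (non-negative) positions in index_to_del."""
    attrs = list(DEFAULT_ATTRS if attrs is None else attrs)
    to_del = set(index_to_del)
    keep = [i for i in range(len(doc)) if i not in to_del]

    # Shape (len(doc), len(attrs)): one row per token, one uint64 id/hash per attribute.
    arr = doc.to_array(attrs)
    # Remove the rows of the deleted tokens; surviving rows keep their order.
    arr = np.delete(arr, sorted(to_del), axis=0)

    # Rebuild the text layer on the SAME vocab so lemma/tag/dep hashes still resolve.
    words = [doc[i].text for i in keep]
    spaces = [bool(doc[i].whitespace_) for i in keep]
    new_doc = Doc(doc.vocab, words=words, spaces=spaces)

    # Copy the surviving attribute rows onto the new tokens.
    new_doc.from_array(attrs, arr)
    return new_doc


if __name__ == "__main__":
    doc = Doc(
        Vocab(),
        words=["The", "big", "cat", "sat"],
        pos=["DET", "ADJ", "NOUN", "VERB"],
        lemmas=["the", "big", "cat", "sit"],
    )
    new_doc = remove_tokens_sketch(doc, [1])
    print([(t.text, t.pos_, t.lemma_) for t in new_doc])
```

Lexical flags such as `LOWER`, `IS_PUNCT` or `IS_STOP` live on the vocab's lexemes rather than on individual tokens. Because the new `Doc` reuses the same vocab and words, those flags come back without being copied.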
from spacy.attrs import (DEP, ENT_TYPE, IS_ALPHA, IS_DIGIT, IS_PUNCT,
                         IS_SPACE, IS_STOP, LEMMA, LOWER, POS)

def remove_tokens(doc, index_to_del, list_attr=[LOWER, POS, ENT_TYPE, IS_ALPHA, DEP, LEMMA, IS_PUNCT, IS_DIGIT, IS_SPACE, IS_STOP]):
    """
    Remove tokens from a spaCy *Doc* object without losing
    associated information (part of speech, dependency, lemma, extensions, ...)

    Parameters
    ----------
    doc : spacy.tokens.doc.Doc
        spaCy representation of the text
    index_to_del : list of int