@fuCtor
Created Jun 13, 2019
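import codecs
import os
import re

import pymorphy2

PATH = "."  # assumption: directory with the .txt files; not defined in the original snippet
morph = pymorphy2.MorphAnalyzer()  # assumption: pymorphy2 analyzer, as implied by morph.parse(...)

file_names = os.listdir(PATH)
texts = []
for name in file_names:
    if name.endswith(".txt"):
        with codecs.open(os.path.join(PATH, name), encoding="utf-8") as f:
            print(name)
            text = f.read()
        lst = re.findall(r"\w+", text)  # split the text into word tokens
        words = []
        for word in lst:
            lemma = morph.parse(word)[0]  # run morphological analysis, take the top-ranked parse
            words.append(lemma.normal_form)
        # each document should contain lemmatized words separated by spaces
        texts.append(" ".join(words))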
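
The snippet's closing comment asks for each document to be one string of lemmatized words separated by spaces. A minimal sketch of that single step follows. The helper name `lemmatize_text` and its `normalize` parameter are hypothetical and not part of the original gist. With pymorphy2, `normalize` would presumably be `lambda w: morph.parse(w)[0].normal_form`; the default `str.lower` is only a stand-in so the sketch runs without pymorphy2 installed.

```python
import re
from typing import Callable


def lemmatize_text(text: str, normalize: Callable[[str], str] = str.lower) -> str:
    """Tokenize text on \\w+ and return the normalized tokens joined by spaces.

    `normalize` is a hypothetical hook: pass a real lemmatizer here.
    The default (str.lower) only lowercases and does NOT lemmatize.
    """
    tokens = re.findall(r"\w+", text)  # same tokenization as the snippet
    return " ".join(normalize(tok) for tok in tokens)


doc = lemmatize_text("Hello, World! Hello again.")
print(doc)  # hello world hello again
```

Keeping the normalizer as a parameter makes it possible to check the tokenize-and-join logic separately from the morphological analyzer.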