@fuCtor
Created June 13, 2019 14:19
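import codecs
import os
import re

import pymorphy2  # assumption: `morph` below is a pymorphy2 analyzer

# Assumption: PATH (the directory holding the .txt files) is defined by the caller.
morph = pymorphy2.MorphAnalyzer()

file_names = os.listdir(PATH)
texts = []
for name in file_names:
    if name.endswith(".txt"):
        with codecs.open(os.path.join(PATH, name), encoding="utf-8") as f:
            print(name)
            text = f.read()
        lst = re.findall(r'\w+', text)  # split the text into word tokens
        words = []
        for word in lst:
            lemma = morph.parse(word)[0]  # run the morphological analysis, keep the first parse
            words.append(lemma.normal_form)
        texts.append(words)  # one list of lemmas per document
# each document should contain lemmatized words separated by spaces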
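
The closing comment asks for documents made of space-separated lemmas, whereas the loop collects a list of lemmas per file. A minimal sketch of that last step follows. It assumes `texts` has the shape built above (a list of lists of strings); the helper name `join_lemmas` and the sample data are hypothetical and not part of the original gist.

```python
# Hypothetical stand-in for the `texts` list built by the loop above:
# one list of lemmas per document.
texts = [
    ["cat", "sleep", "sofa"],
    ["dog", "bark"],
]


def join_lemmas(tokenized_docs):
    """Turn each list of lemmas into one space-separated string."""
    return [" ".join(words) for words in tokenized_docs]


documents = join_lemmas(texts)
print(documents)
```

Keeping the join separate from the lemmatization loop leaves the token lists available for tools that expect lists, while the joined strings suit tools that expect whitespace-delimited text.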