Lev Konstantinovskiy tmylk

  • Berlin, Germany
@tmylk
tmylk / comparison.py
Last active September 25, 2022 15:12
Test scripts to compare LDA in gensim and sklearn
from time import time
import logging
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from gensim.matutils import Sparse2Corpus
from gensim.models.ldamodel import LdaModel
tmylk / ldamodel.py
Last active May 2, 2016 22:08
heavily logged versions of LDA in sklearn and gensim to enable comparison
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (C) 2011 Radim Rehurek <radimrehurek@seznam.cz>
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
#
# Parts of the LDA inference code come from Dr. Hoffman's `onlineldavb.py` script,
# (C) 2010 Matthew D. Hoffman, GNU GPL 3.0
tmylk / get_similiarity_you_need.py
Last active February 8, 2017 14:41
code for "how to get the similarity you need" from https://speakerdeck.com/tmylk/wordrank-pydata-5-min-talk
# just run this in the end of 01_pride_and_predjudice.ipynb from https://github.com/cytora/pycon-nlp-in-10-lines
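import multiprocessing
from gensim.models import Word2Vec

# one list of lemmas per sentence
processed_sentences = [sent.lemma_.split() for sent in processed_text.sents]
interchangeable_words_model = Word2Vec(
    sentences=processed_sentences,
    workers=max(1, multiprocessing.cpu_count() - 1),  # use your cores, but at least one worker
    window=2, sg=1)  # narrow context window, skip-gram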
attributes_of_model = Word2Vec(
sentences=processed_sentences,
пушистый (fluffy)
и (and)
пудель (poodle)
котенок (kitten)
громкий (loud)
мяукал (meowed)
лаял (barked)
большой (big)
бегал (ran)
мурлыкал (purred)