Lev Konstantinovskiy tmylk

  • Berlin, Germany
пушистый
и
пудель
котенок
громкий
мяукал
лаял
большой
бегал
мурлыкал
tmylk / get_similiarity_you_need.py
Last active February 8, 2017 14:41
Code for "How to get the similarity you need" from https://speakerdeck.com/tmylk/wordrank-pydata-5-min-talk
# Run this at the end of 01_pride_and_predjudice.ipynb from https://github.com/cytora/pycon-nlp-in-10-lines
# (the notebook is expected to define processed_text and import Word2Vec and multiprocessing)
processed_sentences = [sent.lemma_.split() for sent in processed_text.sents]
# Skip-gram (sg=1) with a narrow context window of 2
interchangeable_words_model = Word2Vec(
    sentences=processed_sentences,
    workers=max(1, multiprocessing.cpu_count() - 1),  # use your cores, but never fewer than one worker
    window=2, sg=1)
attributes_of_model = Word2Vec(
    sentences=processed_sentences,
tmylk / ldamodel.py
Last active May 2, 2016 22:08
Heavily logged versions of LDA in sklearn and gensim, to enable comparison
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (C) 2011 Radim Rehurek <radimrehurek@seznam.cz>
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html
#
# Parts of the LDA inference code come from Dr. Hoffman's `onlineldavb.py` script,
# (C) 2010 Matthew D. Hoffman, GNU GPL 3.0
tmylk / comparison.py
Last active September 25, 2022 15:12
Test scripts to compare LDA in gensim and sklearn
from time import time
import logging
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from gensim.matutils import Sparse2Corpus
from gensim.models.ldamodel import LdaModel