🦄
Unicorn shepherd

Nikita Furin nokados

  • CraftTalk
  • Moscow, Russia
nokados / tqdm_pandas.py
Last active February 1, 2019 11:22
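Initializing pandas' progress_apply in Jupyter Notebook
```python
from tqdm import tqdm_notebook  # newer tqdm releases: from tqdm.notebook import tqdm
# pandas() is a classmethod; calling it on the class avoids drawing an extra empty bar
tqdm_notebook.pandas()  # registers progress_apply and related methods on pandas objects
```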
nokados / clear_punctuation.py
Last active June 15, 2018 06:46
Strip punctuation from Russian texts, keeping ? and -
```python
import re
import string
# Delete ASCII punctuation and «»”“, but keep '?' and '-'
translator = str.maketrans('', '', re.sub(r'[\?-]', '', string.punctuation + '«»”“'))

def clear_punctuation(sentence):
    return sentence.translate(translator)
```
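A small variation for experimenting with which characters to keep. It builds the same kind of deletion table as the gist, but filters characters with a plain membership test instead of a regex, so `keep` can contain characters such as `]` or `^` without escaping. `make_cleaner` and its parameters are hypothetical names, not part of the original gist.

```python
import string

def make_cleaner(extra='«»”“', keep='?-'):
    """Return a function that deletes ASCII punctuation plus `extra`, except chars in `keep`."""
    to_delete = ''.join(ch for ch in string.punctuation + extra if ch not in keep)
    table = str.maketrans('', '', to_delete)
    return lambda text: text.translate(table)

clean = make_cleaner()
print(clean('«Привет», сказал он. Как дела?'))  # Привет сказал он Как дела?
print(make_cleaner(keep='')('Как дела?'))  # Как дела
```

Characters outside `string.punctuation` and `extra`, such as the long dash or `„`, pass through unchanged and would have to be added to `extra`.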
nokados / doc2vec.py
Created May 25, 2018 09:35
Average word vectors in a text
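```python
import numpy as np
from nltk.tokenize import word_tokenize

# new_model (a trained word-vector model) and stopwords_list are defined elsewhere
def calc_embedding(text):
    tokens = word_tokenize(text)
    vec = np.zeros(100)
    num_tokens = 0
    for token in tokens:
        if token in stopwords_list:
            continue  # skip stopwords
        if token in new_model:  # skip out-of-vocabulary tokens
            vec += new_model[token]
            num_tokens += 1
    # ... (the gist preview is cut off here)
```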
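The preview stops before the division that turns the sum into an average. A dependency-free sketch of the complete computation on a toy dictionary of vectors, with a guard for texts where no token has a vector. `average_embedding`, `toy_vectors` and the 3-dimensional size are assumptions for illustration; the gist itself uses 100-dimensional vectors from a trained model.

```python
def average_embedding(tokens, vectors, stopwords=frozenset(), dim=3):
    """Mean of the vectors of in-vocabulary, non-stopword tokens."""
    total = [0.0] * dim
    count = 0
    for token in tokens:
        if token in stopwords or token not in vectors:
            continue  # ignore stopwords and unknown words
        for i, value in enumerate(vectors[token]):
            total[i] += value
        count += 1
    if count == 0:
        return total  # all-zero vector when nothing matched
    return [value / count for value in total]

toy_vectors = {'cat': [1.0, 0.0, 2.0], 'dog': [3.0, 2.0, 0.0]}
print(average_embedding(['the', 'cat', 'and', 'dog'], toy_vectors, stopwords={'the', 'and'}))
# [2.0, 1.0, 1.0]
```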
nokados / wordcloud.py
Created May 25, 2018 09:39
Wordcloud visualization of clusters
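```python
%%time
from nltk.tokenize import word_tokenize
# dbscan (presumably a scikit-learn DBSCAN instance), doc2vec_list and
# dialogs_concatted (a DataFrame with a TEXT column) are defined elsewhere
clusters = dbscan.fit(doc2vec_list)
cl_labels = clusters.labels_.tolist()

def wordcloud_cluster_byIds(cluId):
    texts = []
    for i, label in enumerate(cl_labels):
        if label == cluId:
            for word in word_tokenize(dialogs_concatted.iloc[i].TEXT):
                ...  # the gist preview is cut off here
```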
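The preview stops while collecting the words of one cluster. A minimal stdlib sketch of the same idea: gather the texts whose cluster label equals a given id and count their words, which is the frequency table a word cloud renders. It swaps NLTK's `word_tokenize` for a plain `str.split` to stay dependency-free; `cluster_word_counts` and the sample data are hypothetical. The resulting `Counter` could then be passed to `WordCloud().generate_from_frequencies(...)` from the `wordcloud` package.

```python
from collections import Counter

def cluster_word_counts(texts, labels, cluster_id, stopwords=frozenset()):
    """Count lowercase words across all texts assigned to one cluster."""
    counts = Counter()
    for text, label in zip(texts, labels):
        if label != cluster_id:
            continue
        counts.update(w for w in text.lower().split() if w not in stopwords)
    return counts

texts = ['refund my order', 'order not delivered', 'hello there', 'refund please']
labels = [0, 0, -1, 0]  # -1 is DBSCAN's label for noise points
print(cluster_word_counts(texts, labels, 0).most_common(2))  # [('refund', 2), ('order', 2)]
```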
nokados / embedding.py
Last active August 6, 2018 17:58
Functions for working with word embeddings
```python
import numpy as np
from gensim.models import KeyedVectors, Word2Vec
from gensim.models.fasttext import FastText as FT_gensim
from nltk.tokenize import sent_tokenize, word_tokenize
import json
import pandas as pd
from tqdm import trange
# Pretrained Google News word2vec vectors (300-dimensional)
W2V_PATH = 'data/GoogleNews-vectors-negative300.bin'
```
nokados / keras_scores_class.py
Created May 25, 2018 09:55
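Precision, recall and F1-score metrics for Keras
```python
import keras.backend as K
def recall(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision(y_true, y_pred):
    # The preview is cut off here; body reconstructed by analogy with recall()
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision
```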
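What these backend expressions compute can be checked without Keras. The sketch below re-implements the same formulas on plain Python lists: predictions are clipped to [0, 1] and rounded, so a probability counts as a positive prediction only above 0.5, and `EPSILON` mirrors the default value of `keras.backend.epsilon()` (1e-7). The `f1_score` helper is an assumption, the usual harmonic mean of precision and recall, since the preview does not show the gist's own F1 function. When such functions are passed to Keras as metrics they are evaluated per batch and averaged, so the reported epoch value only approximates the dataset-level score.

```python
EPSILON = 1e-7  # same default as keras.backend.epsilon()

def _clip01(x):
    return min(max(x, 0.0), 1.0)

def confusion_counts(y_true, y_pred):
    """Return (true positives, predicted positives, actual positives)."""
    tp = sum(round(_clip01(t * p)) for t, p in zip(y_true, y_pred))
    predicted = sum(round(_clip01(p)) for p in y_pred)
    actual = sum(round(_clip01(t)) for t in y_true)
    return tp, predicted, actual

def precision(y_true, y_pred):
    tp, predicted, _ = confusion_counts(y_true, y_pred)
    return tp / (predicted + EPSILON)

def recall(y_true, y_pred):
    tp, _, actual = confusion_counts(y_true, y_pred)
    return tp / (actual + EPSILON)

def f1_score(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r + EPSILON)

y_true = [1, 0, 1, 1, 0]
y_pred = [0.9, 0.2, 0.4, 0.8, 0.7]
print(round(precision(y_true, y_pred), 4))  # 0.6667
```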
nokados / keras_word2vec_embedding.py
Created May 25, 2018 09:59
Embedding layer for Keras with weights from gensim Word2Vec (or FastText, why not?)
```python
from keras.layers import Embedding

def word2vec_embedding_layer(embeddings_path='data/weights.npz', max_review_length=150):
    weights = load_weights(embeddings_path)  # (vocab_size, embedding_dim) matrix
    layer = Embedding(input_dim=weights.shape[0],
                      output_dim=weights.shape[1],
                      input_length=max_review_length,
                      weights=[weights])
    return layer

# How to make 'data/weights.npz'? What is load_weights()?
# See https://gist.github.com/nokados/d5cfec00bc194822f89dff556ff62b29
```
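The linked gist with `load_weights()` is not shown here. As a hypothetical sketch, not the author's code, of how such a weight matrix is commonly laid out: each vocabulary word gets a row index, row 0 is reserved for padding, and words missing from the Word2Vec model get a zero row. `build_embedding_matrix` and the toy vectors are illustrative; with NumPy, `np.array(matrix)` could then be stored with `np.savez`, and `len(matrix)` and `dim` play the roles of `weights.shape[0]` and `weights.shape[1]`.

```python
def build_embedding_matrix(vocab, vectors, dim):
    """Map each word to a row index and collect the matching vectors.

    Row 0 is reserved for padding; words without a vector get a zero row.
    """
    word_index = {}
    matrix = [[0.0] * dim]  # row 0: padding
    for word in vocab:
        if word in word_index:
            continue  # keep the first index of duplicated words
        word_index[word] = len(matrix)
        matrix.append(list(vectors.get(word, [0.0] * dim)))
    return word_index, matrix

vectors = {'good': [0.1, 0.2], 'bad': [-0.3, 0.4]}
word_index, matrix = build_embedding_matrix(['good', 'bad', 'meh'], vectors, dim=2)
print(word_index)  # {'good': 1, 'bad': 2, 'meh': 3}
print(len(matrix), len(matrix[0]))  # 4 2
```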
nokados / init_jupyter.py
Last active July 15, 2018 12:05
Snippet for the first cell of a Jupyter notebook: imports core libraries, sets the random seed and enables autoreloading of imported modules
```python
%load_ext autoreload
# Mode 2: reload all modules (except excluded ones) before executing each cell
%autoreload 2
import pandas as pd
from tqdm import tqdm_notebook, tqdm_pandas, tnrange
import time
import numpy as np
from IPython.display import clear_output
import pickle as pkl
import os
```
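The description mentions setting a random seed, but that part is cut off in the preview. A minimal sketch of such a helper, assuming the goal is reproducible runs; `set_seed` and the default seed 42 are hypothetical, and NumPy's global generator is seeded only if NumPy is installed.

```python
import random

def set_seed(seed=42):
    """Seed Python's `random` module and, if installed, NumPy's global generator."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy not available: only `random` is seeded
    np.random.seed(seed)

set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Deep-learning frameworks keep their own generators and need their own seed calls on top of this.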
nokados / has.py
Created June 3, 2018 20:56
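Closure that checks whether a string contains a regex pattern (case-insensitive)
```python
import re
def has(expr):
    # Returns a predicate: True if the regex `expr` occurs anywhere in x (case-insensitive)
    return lambda x: bool(re.search(expr, x, flags=re.IGNORECASE | re.MULTILINE | re.DOTALL))
```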
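A usage sketch with a small tweak: the pattern is compiled once inside the closure instead of being looked up on every call (the `re` module also caches compiled patterns, so the gain is modest). The predicate plugs directly into `filter`, list comprehensions or pandas' `Series.apply`; the sample messages and the `mentions_refund` name are made up for illustration.

```python
import re

def has(expr):
    """Return a predicate that is True when regex `expr` occurs anywhere in a string."""
    pattern = re.compile(expr, flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)
    return lambda text: pattern.search(text) is not None

mentions_refund = has(r'возврат|refund')
messages = ['Хочу ВОЗВРАТ денег', 'Where is my order?', 'Refund, please']
print([m for m in messages if mentions_refund(m)])  # ['Хочу ВОЗВРАТ денег', 'Refund, please']
```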
nokados / translit.py
Last active June 12, 2018 22:20 — forked from ledovsky/translit.py
Transliteration in Python
```python
# name: the string to transliterate
def transliterate(name):
    """
    Author: LarsKort
    Date: 16/07/2011; 1:05 GMT-4;
    I don't claim the dictionary is any "good". It is enough for my case;
    you can always add your own characters and even whole words. Just
    do it in both lists, otherwise you will get an error.
    """
    # Dictionary of replacements
    # ... (the gist preview is cut off here)
```
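The preview stops before the replacement table itself. As an alternative sketch: keeping each Cyrillic letter and its Latin replacement in a single `dict` entry rules out the mismatched-lists error the docstring warns about, and `str.translate` performs all substitutions in one pass. The mapping is a simplified scheme chosen for illustration, not the table from the gist.

```python
# Illustrative lowercase mapping (a simplified scheme, not the one in the gist)
_LOWER = {
    'а': 'a', 'б': 'b', 'в': 'v', 'г': 'g', 'д': 'd', 'е': 'e', 'ё': 'e',
    'ж': 'zh', 'з': 'z', 'и': 'i', 'й': 'y', 'к': 'k', 'л': 'l', 'м': 'm',
    'н': 'n', 'о': 'o', 'п': 'p', 'р': 'r', 'с': 's', 'т': 't', 'у': 'u',
    'ф': 'f', 'х': 'kh', 'ц': 'ts', 'ч': 'ch', 'ш': 'sh', 'щ': 'shch',
    'ъ': '', 'ы': 'y', 'ь': '', 'э': 'e', 'ю': 'yu', 'я': 'ya',
}
# Capital letters map to capitalized replacements, e.g. 'Ж' -> 'Zh'
_MAPPING = dict(_LOWER)
_MAPPING.update({k.upper(): v.capitalize() for k, v in _LOWER.items()})
_TABLE = str.maketrans(_MAPPING)

def transliterate(text):
    """Replace Cyrillic letters with Latin ones; other characters are kept."""
    return text.translate(_TABLE)

print(transliterate('Щука и Жук'))  # Shchuka i Zhuk
```

Because capitals only capitalize the first Latin letter, an all-caps word such as 'ЖУК' becomes 'ZhUK'; a word-level rule would be needed if that matters.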