🦄
Unicorn shepherd

Nikita Furin nokados

  • CraftTalk
  • Moscow, Russia
nokados / keras_scores_class.py
Created May 25, 2018 09:55
Precision, recall, f1_score for Keras
import keras.backend as K

def recall(y_true, y_pred):
    # Round clipped values so each entry counts as a hard 0 or 1.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    # K.epsilon() keeps the division defined when there are no positives.
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision(y_true, y_pred):
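The preview stops at the signature of `precision`, so its body is not shown. The sketch below is one plausible way precision and F1 could follow the same pattern as `recall`; it is an assumption, not the gist's actual code. It uses NumPy stand-ins for the Keras backend calls so it runs without a Keras install, and `EPS` is a hypothetical constant mirroring the default value of `K.epsilon()`.

```python
import numpy as np

EPS = 1e-7  # assumed stand-in for K.epsilon() (Keras default fuzz factor)

def recall(y_true, y_pred):
    # True positives over all actual positives.
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    possible_positives = np.sum(np.round(np.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + EPS)

def precision(y_true, y_pred):
    # True positives over all predicted positives (assumed body).
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    predicted_positives = np.sum(np.round(np.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + EPS)

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall.
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    return 2 * p * r / (p + r + EPS)

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.8, 0.2, 0.7])
scores = (precision(y_true, y_pred), recall(y_true, y_pred), f1_score(y_true, y_pred))
```

With these inputs there are two true positives, three predicted positives, and three actual positives, so all three scores come out close to 2/3. In Keras itself the same functions would use `K.sum`, `K.round`, and `K.clip` in place of the NumPy calls.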
nokados / embedding.py
Last active August 6, 2018 17:58
Functions for dealing with embedding
import json

import numpy as np
import pandas as pd
from gensim.models import KeyedVectors, Word2Vec
from gensim.models.fasttext import FastText as FT_gensim
from nltk.tokenize import sent_tokenize, word_tokenize
from tqdm import trange

W2V_PATH = 'data/GoogleNews-vectors-negative300.bin'
nokados / wordcloud.py
Created May 25, 2018 09:39
Wordcloud visualization of clusters
%%time
clusters = dbscan.fit(doc2vec_list)
cl_labels = clusters.labels_.tolist()

def wordcloud_cluster_byIds(cluId):
    texts = []
    for i in range(0, len(cl_labels)):
        if cl_labels[i] == cluId:
            for word in word_tokenize(dialogs_concatted.iloc[i].TEXT):
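The preview ends inside the inner loop, so how the words are used afterwards is not shown. As a rough sketch of the visible part only, the snippet below gathers the tokens of every text assigned to one cluster and counts them. The helper name `words_for_cluster`, the sample texts, and the whitespace tokenizer (a stand-in for `word_tokenize`) are all hypothetical.

```python
from collections import Counter

def words_for_cluster(texts, labels, cluster_id, tokenize=str.split):
    """Collect tokens from every text whose cluster label equals cluster_id."""
    words = []
    for text, label in zip(texts, labels):
        if label == cluster_id:
            words.extend(tokenize(text))
    return words

# Made-up sample data: three short texts and their cluster labels.
texts = ["reset my password", "password reset link", "delivery is late"]
labels = [0, 0, 1]

# Word frequencies for cluster 0, ready to hand to a renderer.
freqs = Counter(words_for_cluster(texts, labels, 0))
```

A frequency mapping like `freqs` is the kind of input a word cloud renderer can draw from, for example via `WordCloud.generate_from_frequencies` in the `wordcloud` package. If `dbscan` is scikit-learn's `DBSCAN`, label `-1` marks noise points rather than a real cluster, so it is usually worth skipping.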
nokados / doc2vec.py
Created May 25, 2018 09:35
Average word vectors in a text
def calc_embedding(text):
    tokens = word_tokenize(text)
    vec = np.zeros(100)
    num_tokens = 0
    for token in tokens:
        if token in stopwords_list:
            continue
        if token in new_model:
            vec += new_model[token]
            num_tokens += 1
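The preview is cut off before the accumulated vector is turned into an average. The sketch below shows one way the remaining step might look, including a guard for texts with no known tokens. It is an assumption rather than the gist's code: `average_embedding` is a hypothetical name, and a plain dict of NumPy arrays stands in for the word-vector model `new_model`.

```python
import numpy as np

def average_embedding(tokens, vectors, dim, stopwords=frozenset()):
    """Mean of the vectors of all known, non-stopword tokens."""
    vec = np.zeros(dim)
    num_tokens = 0
    for token in tokens:
        if token in stopwords:
            continue
        if token in vectors:
            vec += vectors[token]
            num_tokens += 1
    # Return the zero vector when nothing matched, to avoid dividing by zero.
    if num_tokens == 0:
        return vec
    return vec / num_tokens

# Toy two-dimensional "model" for illustration only.
vectors = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
emb = average_embedding(["the", "cat", "dog", "bird"], vectors, dim=2, stopwords={"the"})
```

Here "the" is skipped as a stopword and "bird" is skipped as unknown, so `emb` is the mean of the "cat" and "dog" vectors. Passing `dim` explicitly avoids hard-coding the vector size as the `100` in the original does.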
nokados / clear_punctuation.py
Last active June 15, 2018 06:46
Remove punctuation from Russian texts, keeping ? and -
import re
import string

# Deletion set: ASCII punctuation plus «»”“, minus "?" and "-", which are kept.
translator = str.maketrans('', '', re.sub(r'[\?-]', '', string.punctuation + '«»”“'))

def clear_punctuation(sentence):
    return sentence.translate(translator)
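A short usage sketch may make the behavior concrete. It repeats the definition with the `re` import it needs so that it runs on its own; the sample sentence is made up for illustration.

```python
import re
import string

# Characters to delete: ASCII punctuation plus the quotes «»”“,
# with "?" and "-" taken out of the deletion set so they survive.
translator = str.maketrans('', '', re.sub(r'[\?-]', '', string.punctuation + '«»”“'))

def clear_punctuation(sentence):
    return sentence.translate(translator)

cleaned = clear_punctuation('Привет, как дела? Что-то «новое»!')
print(cleaned)  # Привет как дела? Что-то новое
```

Note that the character class `[\?-]` matches both the question mark and the hyphen, so hyphenated words such as "Что-то" stay intact along with question marks.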
nokados / tqdm_pandas.py
Last active February 1, 2019 11:22
Initializing pandas' progress_apply in a Jupyter notebook
from tqdm.notebook import tqdm

# Registers progress_apply on pandas objects; tqdm_notebook is deprecated in current tqdm releases.
tqdm.pandas()