Masahiro Kaneko kanekomasahiro

@kanekomasahiro
kanekomasahiro / calculate_matrix_cosine_similarity_numpy.py
Created March 27, 2021 07:28
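Compute the cosine similarity between the rows of two matrices with NumPy.
import numpy as np
def calculate_matrix_cosine_similarity(matrix1, matrix2):
    # Entry [i, j] is divided by ||matrix1[i]|| * ||matrix2[j]||; np.outer builds that (n, m) grid of norms.
    norms = np.outer(np.linalg.norm(matrix1, axis=1), np.linalg.norm(matrix2, axis=1))
    return np.dot(matrix1, matrix2.T) / norms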
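As a quick sanity check of the row-wise cosine similarity, here is a minimal, self-contained sketch. The helper name `pairwise_cosine_similarity` and the sample arrays are made up for illustration; the point is that the denominator has to be an (n, m) grid of norm products so that the two matrices may have different numbers of rows.

```python
import numpy as np


def pairwise_cosine_similarity(a, b):
    """Return an (n, m) matrix whose [i, j] entry is cos(a[i], b[j])."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Outer product of the row norms gives one denominator per (i, j) pair.
    norms = np.outer(np.linalg.norm(a, axis=1), np.linalg.norm(b, axis=1))
    return a @ b.T / norms


a = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # 3 rows
b = np.array([[3.0, 0.0], [0.0, 1.0]])              # 2 rows
sims = pairwise_cosine_similarity(a, b)
print(sims.shape)  # (3, 2)
```

Rows that point in the same direction score 1 regardless of length, and orthogonal rows score 0. A row of all zeros would produce a division by zero, which this sketch does not guard against.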
kanekomasahiro / flat_list_of_lists.py
Created March 27, 2021 06:30
Flatten a list of lists.
import itertools
def flat_list_of_lists(inputs):
    # chain.from_iterable is lazy; list() materializes the flattened result.
    return list(itertools.chain.from_iterable(inputs))
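A short illustration of how `itertools.chain.from_iterable` behaves may help here. The sample data is invented; the sketch shows that the call returns a one-shot iterator and that only one level of nesting is removed.

```python
import itertools

nested = [[1, 2], [3], [], [4, 5]]

# chain.from_iterable yields items lazily; list() materializes them.
flat_iter = itertools.chain.from_iterable(nested)
flat = list(flat_iter)
print(flat)  # [1, 2, 3, 4, 5]

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(flat_iter)
print(second_pass)  # []

# Only one level is flattened; deeper lists are kept as they are.
deep = list(itertools.chain.from_iterable([[1, [2, 3]], [4]]))
print(deep)  # [1, [2, 3], 4]
```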
kanekomasahiro / calculate_mean_vector.py
Last active March 28, 2021 06:42
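Compute the mean vector of a gensim embedding.
import numpy as np
def calculate_mean_vector(embedding):
    # embedding.vocab is the gensim 3.x API; gensim 4 exposes the vocabulary as embedding.key_to_index.
    return np.mean(embedding[list(embedding.vocab)], axis=0)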
kanekomasahiro / calculate_vector_cosine_similarity_numpy.py
Last active March 28, 2021 06:42
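Compute the cosine similarity between two vectors with NumPy.
import numpy as np
def calculate_vector_cosine_similarity(vector1, vector2):
    # Dot product divided by the product of the two Euclidean norms.
    return np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))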
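The formula divides by the product of the two norms, so a zero vector leads to a division by zero. The following is a minimal sketch of one way to guard against that. The helper name `safe_cosine_similarity`, the `eps` threshold, and the choice to return 0.0 for zero vectors are all assumptions made for illustration, not part of the original snippet.

```python
import numpy as np


def safe_cosine_similarity(v1, v2, eps=1e-12):
    """Cosine similarity that returns 0.0 when either vector is (near) zero."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denom < eps:
        # Assumed convention: similarity with a zero vector is 0.0.
        return 0.0
    return float(np.dot(v1, v2) / denom)


same_direction = safe_cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = safe_cosine_similarity([1.0, 0.0], [0.0, 3.0])
with_zero = safe_cosine_similarity([0.0, 0.0], [1.0, 1.0])
```

Whether 0.0 is the right fallback depends on the application; raising an error is an equally reasonable choice.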
kanekomasahiro / load_embedding_with_gensim.py
Last active April 10, 2021 02:51
Load an embedding with gensim.
import linecache
from gensim.models import KeyedVectors
def load_embedding_with_gensim(embedding_name):
    '''
    Load embeddings with gensim.
    '''
    if embedding_name.endswith('bin'):
kanekomasahiro / parser_list.py
Last active March 28, 2021 06:43
Accept a list with argparse.
import argparse
parser = argparse.ArgumentParser()
# Convert a comma-separated string such as "a,b,c" into ['a', 'b', 'c'].
args_list = lambda x: list(map(str, x.split(',')))
parser.add_argument('--inputs', type=args_list)
args = parser.parse_args()
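To see the comma-separated `type=` converter in action without touching the real command line, the argument list can be passed to `parse_args` explicitly. This is a minimal sketch: the named function `comma_separated`, the whitespace stripping, the empty-list default, and the file names are assumptions added for illustration.

```python
import argparse


def comma_separated(value):
    """Split 'a, b,c' into ['a', 'b', 'c'], dropping blanks and empty items."""
    return [item.strip() for item in value.split(',') if item.strip()]


parser = argparse.ArgumentParser()
parser.add_argument('--inputs', type=comma_separated, default=[])

# Passing argv explicitly keeps the example independent of sys.argv.
args = parser.parse_args(['--inputs', 'train.txt, dev.txt,test.txt'])
print(args.inputs)  # ['train.txt', 'dev.txt', 'test.txt']
```

A named function instead of a lambda also gives argparse a readable name to show in its error message when conversion fails.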
kanekomasahiro / template.py
Last active April 22, 2021 03:39
Python template code.
import argparse
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', type=str, required=True)
    args = parser.parse_args()
kanekomasahiro / save_gensim_embedding_from_dict_embedding.py
Last active April 21, 2021 06:13
Save a dict-format embedding as a gensim binary file.
import gensim
import argparse
import numpy as np
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', type=str, required=True)
    parser.add_argument('--output', type=str, required=True)
kanekomasahiro / split_sentence_to_words.py
Last active March 28, 2021 06:45
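Split English text into words using a regular expression.
import regex as re
def split_sentence_to_words(sent):
    # The third-party regex module is needed for the \p{L} (letter) and \p{N} (number) classes.
    pat = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+")
    return re.findall(pat, sent)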
kanekomasahiro / save_word_embedding_text_to_binary.py
Last active April 17, 2021 01:53
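Save gensim word embeddings stored as text in binary format.
import sys
import linecache
from gensim.models import KeyedVectors
def save_word_embedding_text_to_binary(input, output):
    # A word2vec text header is a single line with two fields: "<vocab_size> <dimension>".
    if len(linecache.getline(input, 1).split()) == 2:
        no_header = False
    else: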
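The header check in this snippet relies on the word2vec text format, where an optional first line holds the vocabulary size and the vector dimension. As a minimal standalone sketch of that check, using only the standard library: the helper names `has_word2vec_header` and `_write_tmp` and the tiny sample files are hypothetical, and the extra digit test is an added assumption beyond the field count used above.

```python
import os
import tempfile


def has_word2vec_header(path):
    """Return True if the first line looks like '<vocab_size> <dimension>'."""
    with open(path, encoding='utf-8') as f:
        fields = f.readline().split()
    # Two integer fields are taken to be a header (assumption: a real
    # vector line has a word followed by at least one float).
    return len(fields) == 2 and all(field.isdigit() for field in fields)


def _write_tmp(text):
    """Write text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w', encoding='utf-8') as f:
        f.write(text)
    return path


with_header = _write_tmp('2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n')
without_header = _write_tmp('cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n')

header_found = has_word2vec_header(with_header)
header_missing = has_word2vec_header(without_header)
print(header_found, header_missing)  # True False

os.remove(with_header)
os.remove(without_header)
```

A file of one-dimensional vectors whose first word is itself a number would be misclassified by this heuristic, so treat it as a convenience check rather than a validator.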