@wut0n9
Last active January 29, 2019 07:39
[title] w2v as pretrained embeddings && default OOV vector && word-vector normalization #word2vec #norm_w2v
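import numpy as np
from gensim.models import KeyedVectors


def padding_vector(embedding):
    """
    Append a default vector for out-of-vocabulary (OOV) words.
    :param embedding: (vocab_size, dim) embedding matrix
    :return: (vocab_size + 1, dim) matrix whose last row is the OOV vector
    """
    # Random scale alpha in [-0.5, 0.5); the OOV vector is uniform noise in [-1, 1) times alpha
    alpha = 0.5 * (2.0 * np.random.random() - 1.0)
    curr_embed = (2.0 * np.random.random_sample([embedding.shape[1]]) - 1.0) * alpha
    return np.vstack((embedding, curr_embed))


def w2v_helper(w2v_fpath):
    """
    Load binary word2vec vectors, append the OOV row, and L2-normalize.
    :param w2v_fpath: path to a binary word2vec file
    :return: (vocab_size + 1, dim) normalized embedding matrix
    """
    w2v = KeyedVectors.load_word2vec_format(w2v_fpath, binary=True)
    embedding = w2v.vectors
    embedding_padded = padding_vector(embedding)
    embedding_norm = norm_embedding(embedding_padded)
    return embedding_norm


def norm_embedding(embedding):
    """
    L2-normalize each word vector (row) of the embedding matrix.
    :param embedding: (vocab_size, dim) embedding matrix
    :return: matrix of the same shape with unit-length rows
    """
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows
    return embedding / np.maximum(norms, 1e-12)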
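Because `padding_vector` appends the OOV vector as the last row, its index equals the vocabulary size. A minimal sketch of mapping tokens to row ids with unknown words falling back to that row, assuming a hypothetical `word2index` dict that maps each in-vocabulary word to its row in `w2v.vectors` (the helper `tokens_to_ids` is also hypothetical):

```python
def tokens_to_ids(tokens, word2index):
    """Map tokens to embedding-row ids; unknown tokens get the OOV row.

    Assumes (hypothetically) that word2index maps each in-vocabulary word
    to its row in w2v.vectors, so the OOV row appended by padding_vector
    has index len(word2index).
    """
    oov_id = len(word2index)
    return [word2index.get(tok, oov_id) for tok in tokens]


# Toy vocabulary standing in for the word2vec vocabulary
word2index = {"the": 0, "cat": 1, "sat": 2}
print(tokens_to_ids(["the", "dog", "sat"], word2index))  # [0, 3, 2]
```

With gensim 4.x, `w2v.key_to_index` is one such word-to-row mapping.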
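One practical effect of row-wise L2 normalization is that cosine similarity reduces to a plain dot product. A small sketch on a toy matrix, where `norm_rows` is a hypothetical standalone restatement of the `norm_embedding` logic above:

```python
import numpy as np


def norm_rows(embedding, eps=1e-12):
    """L2-normalize each row; eps guards against all-zero rows."""
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    return embedding / np.maximum(norms, eps)


emb = np.array([[3.0, 4.0],
                [1.0, 0.0],
                [0.0, 2.0]])
unit = norm_rows(emb)

# Every row of `unit` now has length 1, so the matrix of pairwise
# cosine similarities is just the Gram matrix unit @ unit.T
cos = unit @ unit.T
print(cos[0, 1])  # 0.6
```

Since the OOV row is normalized along with the rest, its random scale `alpha` stops mattering after `norm_embedding`; only the direction of the random vector survives.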