@oscar-defelice
Created May 6, 2020 16:52
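# Assumes tf.keras; swap for `keras.layers` if using standalone Keras.
from tensorflow.keras.layers import Concatenate, Dense, Embedding, Input, Reshape


def build_embedding(df, features, emb_dim=10, name='embedding_layer'):
    '''
    Build the embedding sub-network that encodes categorical features
    into an emb_dim-dimensional vector.

    Parameters
    ----------
    df : pandas DataFrame
        dataframe containing input metadata
    features : list of str
        list of categorical features (columns of df)
    emb_dim : int
        vector size, dimension of the embedding space
        Default : 10
    name : str
        name given to the final dense (embedding) layer
        Default : 'embedding_layer'

    Returns
    -------
    x : Keras tensor
        output of the embedding sub-network, of shape (batch, emb_dim)
    inputs : list of Keras Input tensors
        one input per feature, in the order given by features
    '''
    if not features:
        raise ValueError('features must contain at least one column name')
    inputs = []
    concat = []
    cat_sizes = {}
    cat_embsizes = {}
    for cat in features:
        # Number of distinct categories, and a heuristic embedding width capped at 50.
        cat_sizes[cat] = df[cat].nunique()
        cat_embsizes[cat] = min(50, cat_sizes[cat] // 2 + 1)
        x = Input((1,), name=cat)
        inputs.append(x)
        # One extra row, so integer codes 0..nunique are all valid indices.
        x = Embedding(cat_sizes[cat] + 1, cat_embsizes[cat])(x)
        # (batch, 1, width) -> (batch, width)
        x = Reshape((cat_embsizes[cat],))(x)
        concat.append(x)
    # Concatenate needs at least two tensors; with a single feature,
    # x already holds that feature's embedding.
    if len(concat) > 1:
        x = Concatenate()(concat)
    x = Dense(emb_dim, activation='relu', name=name)(x)
    return x, inputs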
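
The sizing rule inside the loop of build_embedding can be checked without Keras. The snippet below is a minimal sketch of that rule alone: `embedding_width`, `embedding_plan` and the toy data are hypothetical names and values introduced here for illustration, and a plain dict of lists stands in for the DataFrame.

```python
def embedding_width(n_categories, cap=50):
    """Width rule from build_embedding: half the cardinality plus one, capped at `cap`."""
    return min(cap, n_categories // 2 + 1)


def embedding_plan(columns):
    """Map each feature name to (input_dim, output_dim) for its Embedding layer.

    `columns` is a hypothetical dict of feature name -> list of raw values,
    standing in for the DataFrame columns.
    """
    plan = {}
    for feature, values in columns.items():
        # Plain-Python stand-in for df[feature].nunique() (no missing values assumed).
        n_categories = len(set(values))
        # input_dim gets one extra row, as in build_embedding.
        plan[feature] = (n_categories + 1, embedding_width(n_categories))
    return plan


# Made-up toy data, for illustration only.
toy = {
    'colour': ['red', 'green', 'blue', 'red'],
    'zip_code': [str(i) for i in range(200)],
}
plan = embedding_plan(toy)
print(plan)
```

Low-cardinality features get narrow embeddings (3 colours give a width of 2), while the cap keeps high-cardinality features such as the 200 zip codes at a width of 50.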