@jreuben11
Last active Mar 17, 2017

Keras QuickRef

Model API

load_weights(filepath, by_name)

Sequential / Functional Model APIs

compile(optimizer, loss, metrics, sample_weight_mode)
fit(x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight)
evaluate(x, y, batch_size, verbose, sample_weight)

predict(x, batch_size, verbose)
predict_classes(x, batch_size, verbose) - Sequential only
predict_proba(x, batch_size, verbose) - Sequential only

train_on_batch(x, y, class_weight, sample_weight)
test_on_batch(x, y, sample_weight)

fit_generator(generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe)
evaluate_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)
predict_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)

get_layer(name, index)
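The *_generator methods consume a Python generator that yields (inputs, targets) tuples and is expected to loop indefinitely; fit_generator ends an epoch once samples_per_epoch samples have been drawn. A minimal NumPy sketch of such a generator, assuming in-memory arrays x and y (the name batch_generator is hypothetical, not a Keras function):

```python
import numpy as np


def batch_generator(x, y, batch_size, shuffle=True, seed=0):
    """Yield (x_batch, y_batch) tuples forever, as fit_generator expects.

    Hypothetical helper: the sample order is reshuffled on every pass when
    shuffle=True; the last batch of a pass may be smaller than batch_size.
    """
    rng = np.random.RandomState(seed)
    n = len(x)
    while True:  # Keras keeps pulling batches, so the generator must never stop
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]
```

With a compiled Keras 1 model this would be passed as `model.fit_generator(batch_generator(x, y, 32), samples_per_epoch=len(x), nb_epoch=10)`. Reshuffling on every pass gives each epoch a different batch order.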



Core Layers

| Layer | Description | IO | Params |
|---|---|---|---|
| Dense | vanilla fully connected NN layer | (nb_samples, input_dim) --> (nb_samples, output_dim) | output_dim/shape, init, activation, weights, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim/shape |
| Activation | applies an activation function to an output | TN --> TN | activation |
| Dropout | randomly sets a fraction p of input units to 0 at each update during training, to reduce overfitting | TN --> TN | p |
| SpatialDropout2D/3D | drops entire 2D/3D feature maps, to counter pixel / voxel proximity correlation | (samples, rows, cols, [stacks,] channels) --> (samples, rows, cols, [stacks,] channels) | p |
| Flatten | flattens each sample to 1D (the batch axis is kept) | (nb_samples, D1, D2, D3) --> (nb_samples, D1xD2xD3) | - |
| Reshape | reshapes an output to a different factorization | eg (None, 3, 4) --> (None, 12) or (None, 2, 6) | target_shape |
| Permute | permutes the dimensions of the input: the output shape is the input shape with its dimensions re-ordered | eg (None, A, B) --> (None, B, A) | dims |
| RepeatVector | repeats the input n times | (nb_samples, features) --> (nb_samples, n, features) | n |
| Merge | merges a list of tensors into a single tensor | [TN] --> TN | layers, mode, concat_axis, dot_axes, output_shape, output_mask, node_indices, tensor_indices, name |
| Lambda | wraps an arbitrary backend (Theano / TensorFlow) expression as a layer | flexible | function, output_shape, arguments |
| ActivityRegularization | adds an L1 / L2 penalty on the activity to the cost function; passes its input through unchanged | TN --> TN | l1, l2 |
| Masking | masks timesteps (along D1) whose features all equal mask_value, so that downstream layers that support masking skip them | TN --> TN | mask_value |
| Highway | gated feed-forward layer with LSTM-like transform / carry gates: y = T(x) * H(x) + (1 - T(x)) * x | (nb_samples, input_dim) --> (nb_samples, input_dim) | same as Dense (without output_dim) + transform_bias |
| MaxoutDense | takes the element-wise maximum of nb_feature linear Dense layers, learning a convex, piecewise linear activation function over the inputs | (nb_samples, input_dim) --> (nb_samples, output_dim) | same as Dense + nb_feature |
| TimeDistributed | wrapper that applies a layer (eg Dense) to every temporal slice (D1) of the input | (nb_sample, time_dimension, input_dim) --> (nb_sample, time_dimension, output_dim) | layer |
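The shape bookkeeping in the IO column can be checked with plain NumPy. The sketch below mimics Flatten, Reshape, Permute and RepeatVector on a small batch (the batch axis is always left alone, and Permute's dims are 1-indexed over the non-batch axes), plus dropout. For dropout it assumes the "inverted" form, where surviving units are rescaled by 1 / (1 - p) at training time so the expected activation is unchanged. These illustrate the semantics and are not the Keras implementations.

```python
import numpy as np

batch = np.arange(2 * 3 * 4).reshape(2, 3, 4)           # (nb_samples=2, D1=3, D2=4)

flat = batch.reshape(batch.shape[0], -1)                 # like Flatten: (2, 12)
reshaped = batch.reshape(batch.shape[0], 2, 6)           # like Reshape((2, 6)): (2, 2, 6)
permuted = batch.transpose(0, 2, 1)                      # like Permute((2, 1)): (2, 4, 3)

vec = np.ones((2, 5))                                    # (nb_samples, features)
repeated = np.repeat(vec[:, np.newaxis, :], 3, axis=1)   # like RepeatVector(3): (2, 3, 5)


def dropout(x, p, rng):
    """Inverted dropout: zero a fraction p of units, rescale survivors by 1 / (1 - p)."""
    keep = rng.uniform(size=x.shape) >= p
    return x * keep / (1.0 - p)


dropped = dropout(np.ones((1000, 10)), 0.5, np.random.RandomState(0))
```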


Convolutional Layers

| Layer | Description | IO | Params |
|---|---|---|---|
| Convolution1D | filters neighborhoods of 1D inputs | (samples, steps, input_dim) --> (samples, new_steps, nb_filter) | nb_filter, filter_length, init, activation, weights, border_mode, subsample_length, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim, input_length |
| Convolution2D | filters neighborhoods of 2D inputs | (samples, rows, cols, channels) --> (samples, new_rows, new_cols, nb_filter) | like Convolution1D, with nb_row, nb_col instead of filter_length and subsample instead of subsample_length, + dim_ordering |
| AtrousConvolution1D/2D | dilated convolution ("convolution with holes") | same as Convolution1D/2D | same as Convolution1D/2D + atrous_rate |
| SeparableConvolution2D | first performs a depthwise spatial convolution on each input channel separately, then a pointwise (1x1) convolution that mixes the resulting output channels | same as Convolution2D | same as Convolution2D + depth_multiplier, depthwise_regularizer, pointwise_regularizer, depthwise_constraint, pointwise_constraint |
| Deconvolution2D | transposed convolution: goes in the opposite direction of a normal convolution, eg for learned upsampling | (samples, rows, cols, channels) --> (samples, new_rows, new_cols, nb_filter) | like Convolution2D + output_shape |
| Convolution3D | filters neighborhoods of 3D inputs (eg volumes, video) | (samples, conv_dim1, conv_dim2, conv_dim3, channels) --> (samples, new_conv_dim1, new_conv_dim2, new_conv_dim3, nb_filter) | like Convolution2D, with kernel_dim1, kernel_dim2, kernel_dim3 instead of nb_row, nb_col |
| Cropping1D/2D/3D | crops along the given dimension(s) | (samples, depth, [axes_to_crop]) --> (samples, depth, [cropped_axes]) | cropping, dim_ordering (2D/3D) |
| UpSampling1D/2D/3D | repeats each step x times along the specified axes | (samples, [dims], channels) --> (samples, [upsampled_dims], channels) | length (1D) / size (2D/3D), dim_ordering (2D/3D) |
| ZeroPadding1D/2D/3D | zero padding | (samples, [dims], channels) --> (samples, [padded_dims], channels) | padding, dim_ordering (2D/3D) |
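For the convolution layers, new_steps / new_rows / new_cols follow from border_mode, the stride (subsample) and, for the atrous layers, the dilation rate. A sketch of that arithmetic for a single axis, assuming only the 'valid' and 'same' border modes (conv_output_length is a hypothetical name here):

```python
def conv_output_length(input_length, filter_size, border_mode, stride=1, dilation=1):
    """Output length along one axis for 'valid' / 'same' convolutions."""
    # an atrous filter of size k with rate r spans k + (k - 1) * (r - 1) inputs
    span = filter_size + (filter_size - 1) * (dilation - 1)
    if border_mode == "same":
        length = input_length               # zero-padded so every position is kept
    elif border_mode == "valid":
        length = input_length - span + 1    # only positions where the filter fits
    else:
        raise ValueError("unsupported border_mode: %r" % border_mode)
    return (length + stride - 1) // stride  # ceil(length / stride)
```

For example, a 28 x 28 input through a 3 x 3 'valid' Convolution2D gives 26 x 26; Convolution2D applies the same rule to rows and cols independently.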

Pooling & Locally Connected Layers

| Layer | Description | IO | Params |
|---|---|---|---|
| MaxPooling1D/2D/3D, AveragePooling1D/2D/3D | downscale by taking the max / average over each pooling window | (samples, [dims], channels) --> (samples, [pooled_dims], channels) | pool_length, stride (1D) / pool_size, strides (2D/3D), border_mode, dim_ordering (2D/3D) |
| GlobalMaxPooling1D/2D, GlobalAveragePooling1D/2D | take the max / average over all steps or spatial positions | (samples, [dims], channels) --> (samples, channels) | dim_ordering (2D) |
| LocallyConnected1D/2D | like ConvolutionxD, but weights are unshared: a different filter is applied at each patch | like ConvolutionxD | like ConvolutionxD, incl. subsample |
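The difference between windowed and global pooling is easiest to see on a 1D example. A NumPy sketch for (samples, steps, channels) inputs, assuming 'valid' borders and a stride that defaults to the pool length; pool1d and global_pool1d are hypothetical names:

```python
import numpy as np


def pool1d(x, pool_length, stride=None, mode="max"):
    """(samples, steps, channels) -> (samples, pooled_steps, channels), 'valid' borders."""
    stride = stride or pool_length  # assumed default: non-overlapping windows
    op = np.max if mode == "max" else np.mean
    n_out = (x.shape[1] - pool_length) // stride + 1
    windows = [op(x[:, i * stride:i * stride + pool_length, :], axis=1) for i in range(n_out)]
    return np.stack(windows, axis=1)


def global_pool1d(x, mode="max"):
    """(samples, steps, channels) -> (samples, channels): one value per channel."""
    return x.max(axis=1) if mode == "max" else x.mean(axis=1)
```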


Recurrent Layers

| Layer | Description | IO | Params |
|---|---|---|---|
| Recurrent | abstract base class | (nb_samples, timesteps, input_dim) --> (nb_samples, timesteps, output_dim) if return_sequences, else (nb_samples, output_dim) | weights, return_sequences, go_backwards, stateful, unroll, consume_less, input_dim, input_length |
| SimpleRNN | fully-connected RNN where the output is fed back as input | like Recurrent | like Recurrent + output_dim, init, inner_init, activation, W_regularizer, U_regularizer, b_regularizer, dropout_W, dropout_U |
| GRU | Gated Recurrent Unit | like Recurrent | like SimpleRNN + inner_activation |
| LSTM | Long Short-Term Memory unit | like Recurrent | like SimpleRNN + inner_activation, forget_bias_init |
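SimpleRNN's recurrence fits in a few lines of NumPy: at each timestep the new output is activation(x_t · W + h_(t-1) · U + b). The sketch assumes tanh as the activation and a zero initial state, and shows how return_sequences switches between the two output shapes in the IO column. GRU and LSTM add gates on top of this loop and are not shown; simple_rnn is a hypothetical name.

```python
import numpy as np


def simple_rnn(x, W, U, b, return_sequences=False):
    """x: (nb_samples, timesteps, input_dim); W: (input_dim, output_dim);
    U: (output_dim, output_dim); b: (output_dim,)."""
    nb_samples, timesteps, _ = x.shape
    h = np.zeros((nb_samples, W.shape[1]))  # initial state: zeros
    outputs = []
    for t in range(timesteps):
        # the previous output h is fed back in through the recurrent weights U
        h = np.tanh(x[:, t, :].dot(W) + h.dot(U) + b)
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h
```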


Embedding & Normalization Layers

| Layer | Description | IO | Params |
|---|---|---|---|
| Embedding | turns positive integers (indexes) into dense vectors of fixed size; can only be used as the first layer of a model | (nb_samples, sequence_length) --> (nb_samples, sequence_length, output_dim) | input_dim, output_dim, init, input_length, W_regularizer, activity_regularizer, W_constraint, mask_zero, weights, dropout |
| BatchNormalization | normalizes the activations of the previous layer at each batch (mean 0, standard deviation 1) | TN --> TN | epsilon, mode, axis, momentum, weights, beta_init, gamma_init, gamma_regularizer, beta_regularizer |
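Both layers reduce to short array operations. An Embedding is a row lookup into a learned (input_dim, output_dim) weight matrix; BatchNormalization at training time standardizes each feature over the batch axis and then applies the learned scale gamma and shift beta. A NumPy sketch, with epsilon=1e-3 chosen for illustration; the running averages (controlled by momentum) used at inference time are omitted, and both function names are hypothetical.

```python
import numpy as np


def embedding(indices, weights):
    """indices: (nb_samples, sequence_length) ints; weights: (input_dim, output_dim)."""
    return weights[indices]  # -> (nb_samples, sequence_length, output_dim)


def batch_norm(x, gamma, beta, epsilon=1e-3):
    """Training-time batch normalization over axis 0 (the batch)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + epsilon)  # mean 0, std ~1 per feature
    return gamma * x_hat + beta
```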


Advanced Activation Layers

| Layer | Description | IO | Params |
|---|---|---|---|
| LeakyReLU | ReLU that allows a small gradient when the unit is inactive: f(x) = alpha * x for x < 0, f(x) = x for x >= 0 | TN --> TN | alpha |
| PReLU | Parametric ReLU, where the negative-side slopes are a learned array: f(x) = alphas * x for x < 0, f(x) = x for x >= 0 | TN --> TN | init, weights |
| ELU | Exponential Linear Unit: f(x) = alpha * (exp(x) - 1) for x < 0, f(x) = x for x >= 0 | TN --> TN | alpha |
| ParametricSoftplus | f(x) = alpha * log(1 + exp(beta * x)), with alpha and beta learned | TN --> TN | alpha, beta |
| ThresholdedReLU | f(x) = x for x > theta, f(x) = 0 otherwise | TN --> TN | theta |
| SReLU | S-shaped ReLU | TN --> TN | t_left_init, a_left_init, t_right_init, a_right_init |
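The formulas in the table translate directly to NumPy. A sketch with illustrative default values for alpha and theta; in PReLU and ParametricSoftplus the same formulas apply, but alpha (and beta) are learned weights rather than constants.

```python
import numpy as np


def leaky_relu(x, alpha=0.3):
    return np.where(x >= 0, x, alpha * x)


def elu(x, alpha=1.0):
    # np.minimum keeps exp() from overflowing on the (discarded) positive branch
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def parametric_softplus(x, alpha, beta):
    # logaddexp(0, z) == log(1 + exp(z)), computed without overflow
    return alpha * np.logaddexp(0.0, beta * x)


def thresholded_relu(x, theta=1.0):
    return np.where(x > theta, x, 0.0)
```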


Noise Layers

| Layer | Description | IO | Params |
|---|---|---|---|
| GaussianNoise | mitigates overfitting by smoothing: adds 0-centered Gaussian noise with standard deviation sigma (training time only) | TN --> TN | sigma |
| GaussianDropout | mitigates overfitting by smoothing: multiplies by 1-centered Gaussian noise with standard deviation sqrt(p / (1 - p)) (training time only) | TN --> TN | p |
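The two noise layers differ in how the noise is applied: GaussianNoise adds it, GaussianDropout multiplies by it. A NumPy sketch of the training-time behaviour only (both layers pass inputs through unchanged at test time); the function names are hypothetical.

```python
import numpy as np


def gaussian_noise(x, sigma, rng):
    """Additive, 0-centered noise."""
    return x + rng.normal(0.0, sigma, size=x.shape)


def gaussian_dropout(x, p, rng):
    """Multiplicative, 1-centered noise with standard deviation sqrt(p / (1 - p))."""
    stddev = np.sqrt(p / (1.0 - p))
    return x * rng.normal(1.0, stddev, size=x.shape)
```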


Preprocessing

| Type | Name | Transform | Params |
|---|---|---|---|
| sequence | pad_sequences | list of nb_samples sequences of scalars --> 2D array of shape (nb_samples, nb_timesteps) | sequences, maxlen, dtype, padding, truncating, value |
| sequence | skipgrams | sequence of word indexes (list of int) --> (couples, labels): list of (word, word) pairs and their 0/1 labels | sequence, vocabulary_size, window_size, negative_samples, shuffle, categorical, sampling_table |
| sequence | make_sampling_table | generates an array of shape (size,) of word sampling probabilities, for the sampling_table argument of skipgrams | size, sampling_factor |
| text | text_to_word_sequence | sentence --> list of words | text, filters, lower, split |
| text | one_hot | text --> list of word indexes, hashed into a vocabulary of size n | text, n, filters, lower, split |
| text | Tokenizer | texts --> lists of word indexes | nb_words, filters, lower, split |
| image | ImageDataGenerator | generates batches of image tensors with real-time data augmentation | featurewise_center, samplewise_center, featurewise_std_normalization, samplewise_std_normalization, zca_whitening, rotation_range, width_shift_range, height_shift_range, shear_range, zoom_range, channel_shift_range, fill_mode, cval, horizontal_flip, vertical_flip, rescale, dim_ordering |
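pad_sequences is the usual bridge from variable-length token lists (eg Tokenizer output) to the fixed-shape 2D array an Embedding layer expects. A NumPy reimplementation sketch of its behaviour, assuming the defaults padding='pre', truncating='pre' and value=0; it illustrates the semantics and is not the library's source.

```python
import numpy as np


def pad_sequences(sequences, maxlen=None, dtype="int32", padding="pre", truncating="pre", value=0):
    """Sketch: pad / truncate each sequence to maxlen and stack into a 2D array."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    out = np.full((len(sequences), maxlen), value, dtype=dtype)
    for i, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # the row stays all `value`
        # drop values from the start ('pre') or the end ('post') of over-long sequences
        if truncating == "pre":
            trunc = seq[max(len(seq) - maxlen, 0):]
        else:
            trunc = seq[:maxlen]
        # place the kept values at the end ('pre' padding) or the start ('post' padding)
        if padding == "pre":
            out[i, maxlen - len(trunc):] = trunc
        else:
            out[i, :len(trunc)] = trunc
    return out
```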

Objectives (Loss Functions)

  • mean_squared_error / mse
  • mean_absolute_error / mae
  • mean_absolute_percentage_error / mape
  • mean_squared_logarithmic_error / msle
  • squared_hinge
  • hinge
  • binary_crossentropy (logloss)
  • categorical_crossentropy (multiclass logloss) - requires targets to be one-hot (binary) arrays of shape (nb_samples, nb_classes)
  • sparse_categorical_crossentropy - as above, but accepts integer (sparse) class labels
  • kullback_leibler_divergence / kld - information gain from a predicted probability distribution Q to a true probability distribution P
  • poisson - mean of (predictions - targets * log(predictions))
  • cosine_proximity - negative mean cosine proximity between predictions and targets
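A few of the objectives written out in NumPy, each returning one loss value per sample. The clipping constant EPS is an assumed value that keeps log() finite; the categorical losses assume y_pred rows already sum to 1, and hinge assumes targets in {-1, 1}.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant, keeps log() finite


def mse(y_true, y_pred):
    return np.mean(np.square(y_pred - y_true), axis=-1)


def binary_crossentropy(y_true, y_pred):
    p = np.clip(y_pred, EPS, 1 - EPS)
    return np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)), axis=-1)


def categorical_crossentropy(y_true, y_pred):
    """y_true: one-hot (nb_samples, nb_classes); y_pred: rows of class probabilities."""
    p = np.clip(y_pred, EPS, 1 - EPS)
    return -np.sum(y_true * np.log(p), axis=-1)


def sparse_categorical_crossentropy(labels, y_pred):
    """labels: integer class indexes (nb_samples,) instead of one-hot rows."""
    return categorical_crossentropy(np.eye(y_pred.shape[-1])[labels], y_pred)


def hinge(y_true, y_pred):
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0), axis=-1)
```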


Metrics

  • binary_accuracy - for binary classification
  • categorical_accuracy - for multiclass classification
  • sparse_categorical_accuracy - as above, with integer (sparse) targets
  • top_k_categorical_accuracy - fraction of samples whose target class is within the top-k predictions
  • mean_squared_error (mse) - for regression
  • mean_absolute_error (mae)
  • mean_absolute_percentage_error (mape)
  • mean_squared_logarithmic_error (msle)
  • hinge - hinge loss: `max(1 - y_true * y_pred, 0)`
  • squared_hinge - hinge loss squared
  • categorical_crossentropy - for multiclass classification
  • sparse_categorical_crossentropy
  • binary_crossentropy - for binary classification
  • kullback_leibler_divergence
  • poisson
  • cosine_proximity
  • matthews_correlation - quality of binary classification
  • fbeta_score - weighted harmonic mean of precision and recall in multi-label classification
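The accuracy metrics compare predicted and true classes rather than probabilities. A NumPy sketch, assuming binary predictions are thresholded by rounding and one-hot targets for the categorical variants; k=5 is an illustrative default.

```python
import numpy as np


def binary_accuracy(y_true, y_pred):
    # a prediction counts as class 1 when it rounds to 1
    return np.mean(y_true == np.round(y_pred))


def categorical_accuracy(y_true, y_pred):
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))


def top_k_categorical_accuracy(y_true, y_pred, k=5):
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]          # indexes of the k highest scores
    targets = np.argmax(y_true, axis=-1)[:, np.newaxis]  # true class index per sample
    return np.mean(np.any(top_k == targets, axis=-1))
```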


Optimizers

  • SGD - stochastic gradient descent, with support for momentum, learning rate decay, and Nesterov momentum
  • RMSprop - usually a good choice for RNNs
  • Adagrad
  • Adadelta
  • Adamax
  • Adam
  • Nadam
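The momentum and Nesterov options of SGD change only the update rule. A pure-Python sketch of one update step plus a time-based learning rate decay lr / (1 + decay * iterations); sgd_step and decayed_lr are hypothetical names, and the usage at the end minimizes f(w) = w², whose gradient is 2w.

```python
def sgd_step(param, grad, velocity, lr=0.01, momentum=0.9, nesterov=False):
    """One SGD update with (optionally Nesterov) momentum; returns (param, velocity)."""
    velocity = momentum * velocity - lr * grad
    if nesterov:
        # look ahead along the new velocity, then apply the gradient step
        param = param + momentum * velocity - lr * grad
    else:
        param = param + velocity
    return param, velocity


def decayed_lr(lr, decay, iterations):
    """Time-based decay: the learning rate shrinks as updates accumulate."""
    return lr / (1.0 + decay * iterations)


# usage: minimize f(w) = w**2 starting from w = 5
w, v = 5.0, 0.0
for _ in range(500):
    w, v = sgd_step(w, 2 * w, v, nesterov=True)
```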

Activation Functions

  • softmax
  • softplus
  • softsign
  • relu
  • tanh
  • sigmoid
  • hard_sigmoid
  • linear
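Most of these activations are one-liners. A NumPy sketch of the less familiar ones, assuming hard_sigmoid is the piecewise-linear form clip(0.2 * x + 0.5, 0, 1):

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / np.sum(e, axis=-1, keepdims=True)


def softplus(x):
    return np.logaddexp(0.0, x)  # log(1 + exp(x)) without overflow


def softsign(x):
    return x / (1.0 + np.abs(x))


def hard_sigmoid(x):
    # cheap piecewise-linear approximation of the sigmoid
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)
```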


Callbacks

| Name | Description | Params |
|---|---|---|
| Callback | abstract base class; hooks: on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin, on_train_end | - |
| BaseLogger | accumulates epoch averages of the metrics being monitored (applied automatically) | - |
| ProgbarLogger | prints metrics to stdout | - |
| History | records events into a History object (applied automatically; returned by fit) | - |
| ModelCheckpoint | saves the model after every epoch; with save_best_only, only when the monitored quantity improves | filepath, monitor, verbose, save_best_only, save_weights_only, mode |
| EarlyStopping | stops training when a monitored quantity has stopped improving for patience epochs | monitor, min_delta, patience, verbose, mode |
| RemoteMonitor | streams events to a server | root, path, field |
| LearningRateScheduler | sets the learning rate at the start of each epoch to schedule(epoch_index) | schedule |
| TensorBoard | writes a log for TensorBoard to visualize | log_dir, histogram_freq, write_graph, write_images |
| ReduceLROnPlateau | reduces the learning rate when a metric has stopped improving | monitor, factor, patience, verbose, mode, epsilon, cooldown, min_lr |
| CSVLogger | streams epoch results to a csv file | filename, separator, append |
| LambdaCallback | builds a custom callback from simple functions | on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin, on_train_end |
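The patience logic behind EarlyStopping (and, similarly, ReduceLROnPlateau) is a small state machine driven from on_epoch_end. A pure-Python sketch for a monitored quantity that should decrease (mode='min'); the class EarlyStoppingSketch is hypothetical, and its exact epoch counting may differ from the Keras callback.

```python
class EarlyStoppingSketch:
    """Hypothetical stand-in for EarlyStopping(monitor, min_delta, patience, mode='min')."""

    def __init__(self, monitor="val_loss", min_delta=0.0, patience=0):
        self.monitor = monitor
        self.min_delta = min_delta
        self.patience = patience
        self.best = float("inf")
        self.wait = 0               # epochs since the last improvement
        self.stop_training = False

    def on_epoch_end(self, epoch, logs):
        current = logs[self.monitor]
        if current < self.best - self.min_delta:  # improved by more than min_delta
            self.best = current
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stop_training = True
```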

Init Functions

  • uniform
  • lecun_uniform
  • identity
  • orthogonal
  • zero
  • glorot_normal - Gaussian initialization scaled by fan_in + fan_out
  • glorot_uniform
  • he_uniform
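The Glorot, He and LeCun schemes differ only in how the scale is derived from fan_in and fan_out (for a Dense weight matrix, input_dim and output_dim). A NumPy sketch of the scaling rules, assuming a 2D weight matrix:

```python
import numpy as np


def glorot_uniform(fan_in, fan_out, rng):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def glorot_normal(fan_in, fan_out, rng):
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))


def he_uniform(fan_in, fan_out, rng):
    limit = np.sqrt(6.0 / fan_in)  # scaled by fan_in only, suited to ReLU units
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def lecun_uniform(fan_in, fan_out, rng):
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```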



Regularizers

  • W_regularizer, b_regularizer (WeightRegularizer)
  • activity_regularizer (ActivityRegularizer)

Available penalties:

  • l1 - L1 penalty (LASSO)
  • l2 - L2 penalty (weight decay, Ridge)
  • l1l2 - combined L1 + L2 penalty (ElasticNet)
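A weight regularizer adds a penalty on the weights to the training loss. A sketch of the combined L1/L2 penalty, assuming the plain sum form l1 * sum(|w|) + l2 * sum(w²) with no 1/2 factor on the L2 term; l1 and l2 alone are the special cases with the other coefficient at 0. The function name is hypothetical.

```python
import numpy as np


def l1l2_penalty(weights, l1=0.0, l2=0.0):
    """Penalty added to the training loss: l1 * sum(|w|) + l2 * sum(w ** 2)."""
    return l1 * np.sum(np.abs(weights)) + l2 * np.sum(np.square(weights))
```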



Constraints

  • W_constraint - for the main weights matrix
  • b_constraint - for the bias

Available constraints:

  • maxnorm - maximum-norm
  • nonneg - non-negativity
  • unitnorm - unit-norm
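Constraints are projections applied to the weights after each update step. A NumPy sketch of the three listed, assuming norms are taken along axis 0 (for a Dense weight matrix, each unit's incoming weight vector) and a small EPS to avoid division by zero:

```python
import numpy as np

EPS = 1e-7  # assumed, avoids division by zero


def maxnorm(W, m=2.0, axis=0):
    norms = np.sqrt(np.sum(np.square(W), axis=axis, keepdims=True))
    desired = np.clip(norms, 0, m)  # only vectors longer than m are shrunk
    return W * (desired / (EPS + norms))


def nonneg(W):
    return W * (W >= 0)  # negative weights are set to 0


def unitnorm(W, axis=0):
    return W / (EPS + np.sqrt(np.sum(np.square(W), axis=axis, keepdims=True)))
```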

Tuning Hyper-Parameters:

  • batch size
  • number of epochs
  • training optimization algorithm
  • learning rate
  • momentum
  • network weight initialization
  • activation function
  • dropout regularization
  • number of neurons in a hidden layer
  • number of hidden layers (depth)
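A common way to tune several of these at once is an exhaustive grid search. A pure-Python sketch, assuming a hypothetical score_fn(config) that would build, fit and evaluate a model for one configuration (Keras also ships a scikit-learn wrapper, keras.wrappers.scikit_learn, for use with scikit-learn's search tools). The grid values are illustrative only, not recommendations.

```python
import itertools

# hypothetical search space; values are illustrative only
param_grid = {
    "batch_size": [16, 32],
    "nb_epoch": [10, 50],
    "optimizer": ["sgd", "adam"],
}


def grid(params):
    """Yield one config dict per combination of the listed values."""
    keys = sorted(params)
    for values in itertools.product(*(params[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(params, score_fn):
    """Return (best_score, best_config), maximizing score_fn(config)."""
    return max(((score_fn(cfg), cfg) for cfg in grid(params)), key=lambda pair: pair[0])
```

The cost grows multiplicatively with every hyper-parameter added, so coarse grids (or random sampling of configurations) are usually tried first.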