summary()
get_config()
from_config(config)
get_weights()
set_weights(weights)
to_json()
to_yaml()
save_weights(filepath)
load_weights(filepath, by_name)
layers
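A minimal sketch of the serialization round trip above (Keras 1.x imports assumed; the layer sizes are arbitrary):

```python
from keras.models import Sequential, model_from_json
from keras.layers import Dense

# toy model to serialize
model = Sequential([Dense(8, input_dim=4, activation='relu')])

json_string = model.to_json()                     # architecture only, as a JSON string
model.save_weights('weights.h5', overwrite=True)  # weights only, as HDF5 (requires h5py)

# round trip: rebuild the architecture, then reload the weights
restored = model_from_json(json_string)
restored.load_weights('weights.h5')
```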
Model: Sequential / Functional APIs
add(layer)
compile(optimizer, loss, metrics, sample_weight_mode)
fit(x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight)
evaluate(x, y, batch_size, verbose, sample_weight)
predict(x, batch_size, verbose)
predict_classes(x, batch_size, verbose)
predict_proba(x, batch_size, verbose)
train_on_batch(x, y, class_weight, sample_weight)
test_on_batch(x, y, class_weight)
predict_on_batch(x)
fit_generator(generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe)
evaluate_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)
predict_generator(generator, val_samples, max_q_size, nb_worker, pickle_safe)
get_layer(name, index)
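A minimal end-to-end sketch of the Sequential workflow above, assuming the Keras 1.x argument names (`nb_epoch` rather than the later `epochs`):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# toy binary-classification data
X = np.random.random((100, 20))
y = np.random.randint(2, size=(100, 1))

model = Sequential()
model.add(Dense(16, input_dim=20, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, batch_size=32, nb_epoch=5, validation_split=0.1, verbose=1)

loss, acc = model.evaluate(X, y, batch_size=32, verbose=0)
probs = model.predict(X, batch_size=32)            # class probabilities
classes = model.predict_classes(X, batch_size=32)  # hard 0/1 labels
```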
Core Layers
Layer
description
IO
params
Dense
vanilla fully connected NN layer
(nb_samples, input_dim) --> (nb_samples, output_dim)
output_dim/shape, init, activation, weights, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim/shape
Activation
Applies an activation function to an output
TN --> TN
activation
Dropout
randomly sets a fraction p of input units to 0 at each update during training, to reduce overfitting
TN --> TN
p
SpatialDropout2D/3D
dropout of entire 2D/3D feature maps to counter pixel / voxel proximity correlation
(samples, rows, cols, [stacks,] channels) --> (samples, rows, cols, [stacks,] channels)
p
Flatten
Flattens the input to 1D
(nb_samples, D1, D2, D3) --> (nb_samples, D1xD2xD3)
-
Reshape
Reshapes the input to a different shape with the same total number of elements
eg (None, 3, 4) --> (None, 12) or (None, 2, 6)
target_shape
Permute
Permutes dimensions of input - output_shape is same as the input shape, but with the dimensions re-ordered
eg (None, A, B) --> (None, B, A)
dims
RepeatVector
Repeats the input n times
(nb_samples, features) --> (nb_samples, n, features)
n
Merge
merge a list of tensors into a single tensor
[TN] --> TN
layers, mode, concat_axis, dot_axes, output_shape, output_mask, node_indices, tensor_indices, name
Lambda
wraps an arbitrary backend (Theano/TensorFlow) expression as a layer
flexible
function, output_shape, arguments
ActivityRegularization
passes its input through unchanged, but adds an activity-based penalty (l1/l2) to the cost function
TN --> TN
l1, l2
Masking
masks timesteps whose features all equal mask_value, so that downstream layers skip them
TN --> TN
mask_value
Highway
gated variant of Dense (LSTM-style gating for feed-forward nets): learns to mix its transformation of the input with the untransformed input
(nb_samples, input_dim) --> (nb_samples, output_dim)
same as Dense + transform_bias
MaxoutDense
takes the element-wise maximum of nb_feature linear Dense layers, to learn a convex, piecewise-linear activation function over the inputs
(nb_samples, input_dim) --> (nb_samples, output_dim)
same as Dense + nb_feature
TimeDistributed
Applies a layer to every temporal slice (time_dimension) of the input
(nb_sample, time_dimension, input_dim) --> (nb_sample, time_dimension, output_dim)
layer (e.g. Dense)
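A sketch of the shape bookkeeping for the core layers above (shapes shown as comments; `None` is the batch dimension):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Reshape, Permute, RepeatVector, Lambda

model = Sequential()
model.add(Dense(12, input_dim=8))       # (None, 8)    --> (None, 12)
model.add(Reshape((3, 4)))              # (None, 12)   --> (None, 3, 4)
model.add(Permute((2, 1)))              # (None, 3, 4) --> (None, 4, 3)
model.add(Flatten())                    # (None, 4, 3) --> (None, 12)
model.add(Lambda(lambda x: x * 2.0))    # element-wise; shape unchanged
model.add(RepeatVector(5))              # (None, 12)   --> (None, 5, 12)
model.add(Dropout(0.5))                 # shape unchanged; active only during training
```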
Convolutional Layers
Layer
description
IO
params
Convolution1D
filter neighborhoods of 1D inputs
(samples, steps, input_dim) --> (samples, new_steps, nb_filter)
nb_filter, filter_length, init, activation, weights, border_mode, subsample_length, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, bias, input_dim, input_length
Convolution2D
filter neighborhoods of 2D inputs
(samples, rows, cols, channels) --> (samples, new_rows, new_cols, nb_filter)
like Convolution1D, but with nb_row, nb_col instead of filter_length; + subsample, dim_ordering
AtrousConvolution1/2D
dilated convolution ('à trous', i.e. with holes)
same as Convolution1/2D
same as Convolution1/2D + atrous_rate
SeparableConvolution2D
first performs a depthwise spatial convolution on each input channel separately, then a pointwise (1x1) convolution that mixes the resulting output channels
same as Convolution2D
same as Convolution2D + depth_multiplier, depthwise_regularizer, pointwise_regularizer, depthwise_constraint, pointwise_constraint
Deconvolution2D
transposed convolution: goes in the opposite spatial direction of a normal convolution, e.g. for upsampling
same as Convolution2D, with spatial dims growing instead of shrinking
same as Convolution2D + output_shape
Convolution3D
filter neighborhoods of 3D inputs
(samples, conv_dim1, conv_dim2, conv_dim3, channels) --> (samples, new_conv_dim1, new_conv_dim2, new_conv_dim3, nb_filter)
like Convolution2D, but with kernel_dim1, kernel_dim2, kernel_dim3 instead of nb_row, nb_col
Cropping1D/2D/3D
crops along the dimension(s)
(samples, depth, [axes_to_crop]) -->(samples, depth, [cropped_axes])
cropping, dim_ordering
UpSampling1D/2D/3D
Repeats each step size times along the specified axes
(samples, [dims], channels) --> (samples, [upsampled_dims], channels)
size, dim_ordering
ZeroPadding1/2/3D
pads the input with zeros along the spatial dimension(s)
(samples, [dims], channels) --> (samples, [padded_dims], channels)
padding, dim_ordering
Pooling && Locally Connected
Layer
description
IO
params
Max/AveragePooling1/2/3D
downscales by taking the max / average over pooling windows
(samples, [len_pool_dimN], channels) -->(samples, [pooled_dimN], channels)
pool_size, strides, border_mode, dim_ordering
GlobalMax/GlobalAveragePooling1/2D
downscales each feature map to a single max / average value
(samples, [dims], channels) --> (samples, channels)
dim_ordering
LocallyConnected1D/2D
like ConvolutionxD, but weights are unshared - a different filter is applied at each patch
like ConvolutionxD
like ConvolutionxD + subsample
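A small CNN sketch combining the convolution and pooling layers above (assumes `image_dim_ordering` is set to 'tf', i.e. channels-last, to match the IO shapes in this table):

```python
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Convolution2D(nb_filter, nb_row, nb_col, ...)
model.add(Convolution2D(32, 3, 3, border_mode='same', activation='relu',
                        input_shape=(32, 32, 3)))      # (None, 32, 32, 3) --> (None, 32, 32, 32)
model.add(MaxPooling2D(pool_size=(2, 2)))              # --> (None, 16, 16, 32)
model.add(Convolution2D(64, 3, 3, activation='relu'))  # --> (None, 14, 14, 64)
model.add(MaxPooling2D(pool_size=(2, 2)))              # --> (None, 7, 7, 64)
model.add(Flatten())                                   # --> (None, 3136)
model.add(Dense(10, activation='softmax'))
```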
Recurrent Layers
Layer
description
IO
params
Recurrent
abstract base class
(nb_samples, timesteps, input_dim) --> (return_sequences)?(nb_samples, timesteps, output_dim):(nb_samples, output_dim)
weights, return_sequences, go_backwards, stateful, unroll, consume_less, input_dim, input_length
SimpleRNN
Fully-connected RNN where output is fed back as input
like Recurrent
Recurrent + output_dim, init, inner_init, activation, W_regularizer, U_regularizer, b_regularizer, dropout_W, dropout_U
GRU
Gated Recurrent Unit
like Recurrent
like SimpleRNN
LSTM
Long Short-Term Memory unit
like Recurrent
like SimpleRNN
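A recurrent sketch showing the `return_sequences` distinction from the IO column above:

```python
from keras.models import Sequential
from keras.layers import LSTM, GRU, Dense

# input: sequences of 10 timesteps with 16 features each
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(10, 16)))   # (None, 10, 16) --> (None, 10, 32)
model.add(GRU(32))                      # (None, 10, 32) --> (None, 32), last output only
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
```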
Embedding && Normalization
Layer
description
IO
params
Embedding
Turns positive integers (indexes) into dense vectors of fixed size
(nb_samples, sequence_length) --> (nb_samples, sequence_length, output_dim)
input_dim, output_dim, init, input_length, W_regularizer, activity_regularizer, W_constraint, mask_zero, weights, dropout
BatchNormalization
at each batch, normalizes the activations of the previous layer (mean: 0, std: 1)
TN --> TN
epsilon, mode, axis, momentum, weights, beta_init, gamma_init, gamma_regularizer, beta_regularizer
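A sketch combining the two layers above in a text-classification stack (the vocabulary size and sequence length are arbitrary):

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, BatchNormalization

model = Sequential()
# 10000-word vocabulary, 128-dim embeddings, inputs padded to length 50
model.add(Embedding(10000, 128, input_length=50))  # (None, 50) --> (None, 50, 128)
model.add(LSTM(64))
model.add(BatchNormalization())   # normalize the LSTM activations per batch
model.add(Dense(1, activation='sigmoid'))
```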
Advanced Activations
Layer
description
IO
params
LeakyReLU
ReLU that allows a small gradient when unit is inactive: f(x) = alpha * x for x < 0, f(x) = x for x >= 0
TN --> TN
alpha
PReLU
Parametric ReLU - gradient is a learned array: f(x) = alphas * x for x < 0, f(x) = x for x >= 0
TN --> TN
init, weights
ELU
Exponential Linear Unit: f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0
TN --> TN
alpha
ParametricSoftplus
Parametric Softplus: f(x) = alpha * log(1 + exp(beta * x))
TN --> TN
alpha, beta
ThresholdedReLU
f(x) = x for x > theta, f(x) = 0 otherwise
TN --> TN
theta
SReLU
S-shaped ReLU
TN --> TN
t_left_init, a_left_init, t_right_init, a_right_init
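Advanced activations are standalone layers rather than strings passed to `activation`; a minimal sketch:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU, PReLU

model = Sequential()
model.add(Dense(64, input_dim=20))   # note: no activation argument here
model.add(LeakyReLU(alpha=0.3))      # fixed leak slope
model.add(Dense(64))
model.add(PReLU())                   # leak slopes are learned
model.add(Dense(1, activation='sigmoid'))
```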
Noise Layers
Layer
description
IO
params
GaussianNoise
adds additive, 0-centered Gaussian noise with standard deviation sigma; mitigates overfitting (a form of random data augmentation)
TN --> TN
sigma
GaussianDropout
multiplies by 1-centered Gaussian noise with standard deviation sqrt(p/(1-p)); mitigates overfitting
TN --> TN
p
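A sketch of the noise layers above; both pass input through unchanged at test time and only perturb activations during training:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.noise import GaussianNoise, GaussianDropout

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
model.add(GaussianNoise(sigma=0.1))   # additive 0-centered noise
model.add(Dense(64, activation='relu'))
model.add(GaussianDropout(p=0.2))     # multiplicative 1-centered noise
model.add(Dense(1, activation='sigmoid'))
```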
Preprocessing
type
name
transform
params
sequence
pad_sequences
list of nb_samples scalar sequences --> 2D array of shape (nb_samples, nb_timesteps)
sequences, maxlen, dtype
skipgrams
list of word indexes (int) --> list of (word, word) couples, with 1/0 labels
sequence, vocabulary_size, window_size, negative_samples, shuffle, categorical, sampling_table
make_sampling_table
generates a word-rank sampling table of shape (size,), for use as skipgrams' sampling_table
size, sampling_factor
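A sketch of the sequence utilities above (toy values throughout):

```python
from keras.preprocessing.sequence import pad_sequences, skipgrams, make_sampling_table

seqs = [[1, 2, 3], [4, 5], [6]]
padded = pad_sequences(seqs, maxlen=4)   # (3, 4) array, left-padded with zeros

vocab_size = 10
table = make_sampling_table(vocab_size)  # per-rank sampling probabilities
couples, labels = skipgrams([1, 2, 3, 4, 5], vocab_size,
                            window_size=2, sampling_table=table)
# couples: (word, word) pairs; labels: 1 = real context pair, 0 = negative sample
```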
text
text_to_word_sequence
sentence --> list of words
text, filters, lower, split
one_hot
text --> list of n word indexes
text, n, filters, lower, split
Tokenizer
texts --> lists of word indexes (fit_on_texts, then texts_to_sequences)
nb_words, filters, lower, split
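A sketch of the text utilities above; note that `Tokenizer` must be fit before transforming:

```python
from keras.preprocessing.text import text_to_word_sequence, one_hot, Tokenizer

docs = ['the cat sat', 'the dog sat', 'the cat ran']

words = text_to_word_sequence(docs[0])  # ['the', 'cat', 'sat']
hashed = one_hot(docs[0], n=50)         # word indexes obtained by hashing

tok = Tokenizer(nb_words=100)
tok.fit_on_texts(docs)                  # build the word index
seqs = tok.texts_to_sequences(docs)     # lists of word indexes, ready for pad_sequences
```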
image
ImageDataGenerator
generates batches of augmented / normalized image tensors
featurewise_center, samplewise_center, featurewise_std_normalization, samplewise_std_normalization, zca_whitening, rotation_range, width_shift_range, height_shift_range, shear_range, zoom_range, channel_shift_range, fill_mode, cval, horizontal_flip, vertical_flip, rescale, dim_ordering
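An `ImageDataGenerator` sketch wired to `fit_generator` (assumes `model` is a compiled CNN and `X_train`/`y_train` are image tensors matching its input shape):

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
datagen.fit(X_train)  # only needed for the featurewise centering / std / ZCA options

model.fit_generator(datagen.flow(X_train, y_train, batch_size=32),
                    samples_per_epoch=len(X_train), nb_epoch=10)
```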
Objectives (Loss Functions)
mean_squared_error / mse
mean_absolute_error / mae
mean_absolute_percentage_error / mape
mean_squared_logarithmic_error / msle
squared_hinge
hinge
binary_crossentropy (logloss)
categorical_crossentropy (multiclass logloss) - requires labels be binary arrays of shape (nb_samples, nb_classes)
sparse_categorical_crossentropy - as above, but accepts sparse (integer) labels
kullback_leibler_divergence / kld - information gain from a predicted probability distribution Q to a true probability distribution P
poisson - mean of (predictions - targets * log(predictions))
cosine_proximity - negative mean cosine proximity between predictions and targets
Metrics
binary_accuracy - for binary classification
categorical_accuracy - for multiclass classification
sparse_categorical_accuracy
top_k_categorical_accuracy - when the target class is within the top-k predictions provided
mean_squared_error (mse) - for regression
mean_absolute_error (mae)
mean_absolute_percentage_error (mape)
mean_squared_logarithmic_error (msle)
hinge - hinge loss: max(1 - y_true * y_pred, 0)
squared_hinge - squared hinge loss: hinge^2
categorical_crossentropy - for multiclass classification
sparse_categorical_crossentropy
binary_crossentropy - for binary classification
kullback_leibler_divergence
poisson
cosine_proximity
matthews_correlation - for quality of binary classification
fbeta_score - weighted harmonic mean of precision and recall in multi-label classification
Optimizers
SGD - Stochastic gradient descent, with support for momentum, learning rate decay, and Nesterov momentum
RMSprop - a good choice for RNNs
Adagrad
Adadelta
Adamax
Adam
Nadam
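Optimizers can be passed to `compile` by name with default settings, or configured explicitly; a sketch with SGD:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

model = Sequential([Dense(10, input_dim=20, activation='softmax')])

# explicit configuration instead of the string shortcut optimizer='sgd'
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
```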
Activations
softmax
softplus
softsign
relu
tanh
sigmoid
hard_sigmoid
linear
Callbacks
name
description
params
Callback
abstract base class - hooks: on_epoch_end, on_batch_begin, on_batch_end
BaseLogger
accumulates epoch averages of metrics being monitored
ProgbarLogger
writes per-epoch metric progress to stdout
History
records events into a History object (automatic)
ModelCheckpoint
Save model after every epoch, according to monitored quantity
filepath, monitor, verbose, save_best_only, save_weights_only, mode
EarlyStopping
stops training when a monitored quantity has not improved for patience epochs
monitor, min_delta, patience, verbose, mode
RemoteMonitor
stream events to a server
root, path, field
LearningRateScheduler
sets the learning rate at the start of each epoch via a schedule function (epoch index --> learning rate)
schedule
TensorBoard
writes a log for TensorBoard to visualize
log_dir, histogram_freq, write_graph, write_images
ReduceLROnPlateau
Reduce learning rate when a metric has stopped improving
monitor, factor, patience, verbose, mode, epsilon, cooldown, min_lr
CSVLogger
streams epoch results to a CSV file
filename, separator, append
LambdaCallback
custom callback
on_epoch_begin, on_epoch_end, on_batch_begin, on_batch_end, on_train_begin, on_train_end
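A sketch wiring several of the callbacks above into `fit` (assumes `model`, `X`, `y` as in the earlier examples):

```python
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

callbacks = [
    ModelCheckpoint('best.h5', monitor='val_loss', save_best_only=True),
    EarlyStopping(monitor='val_loss', patience=3, verbose=1),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=1e-5),
]

model.fit(X, y, batch_size=32, nb_epoch=50,
          validation_split=0.1, callbacks=callbacks)
```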
Initializations
uniform
lecun_uniform
identity
orthogonal
zero
glorot_normal - Gaussian initialization scaled by fan_in + fan_out
glorot_uniform
he_uniform
Regularizers
W_regularizer, b_regularizer (WeightRegularizer)
activity_regularizer (ActivityRegularizer)
l1 - LASSO
l2 - weight decay, Ridge
l1l2 - ElasticNet
Constraints
W_constraint - for the main weights matrix
b_constraint - for the bias
maxnorm - maximum-norm
nonneg - non-negativity
unitnorm - unit-norm
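Regularizers and constraints attach per layer; a sketch combining the options above on a single Dense layer:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2, activity_l2
from keras.constraints import maxnorm, nonneg

model = Sequential()
model.add(Dense(64, input_dim=20,
                W_regularizer=l2(0.01),                  # weight decay on W
                activity_regularizer=activity_l2(0.01),  # penalize large activations
                W_constraint=maxnorm(2.0),               # cap the norm of W's rows
                b_constraint=nonneg()))                  # keep biases non-negative
```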
Hyperparameters to tune
batch size
number of epochs
training optimization algorithm
learning rate
momentum
network weight initialization
activation function
dropout regularization
number of neurons in a hidden layer
depth of hidden layers
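These knobs can be searched systematically; a sketch using the scikit-learn wrapper (assumes Keras 1.x with `nb_epoch`, and an older scikit-learn where GridSearchCV lives in `sklearn.grid_search`; in newer versions it is in `sklearn.model_selection`):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.grid_search import GridSearchCV

def build_model(dropout_rate=0.5, neurons=32):
    model = Sequential()
    model.add(Dense(neurons, input_dim=20, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# toy data standing in for a real dataset
X = np.random.random((200, 20))
y = np.random.randint(2, size=(200,))

clf = KerasClassifier(build_fn=build_model, nb_epoch=10, batch_size=32, verbose=0)
grid = GridSearchCV(clf, param_grid={'dropout_rate': [0.2, 0.5],
                                     'neurons': [32, 64],
                                     'batch_size': [16, 32]})
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```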