15. Processing Sequences Using RNNs and CNNs

  • cell or memory cell: a recurrent neuron, or a whole layer of recurrent neurons, that preserves some state across time steps.
  • cell's state: h(t) = f(h(t-1), x(t)) vs. cell's output: y(t) (see the numpy sketch after this list).
  • an RNN takes a sequence (or the same input vector repeated at each time step) and produces a sequence -- all outputs except the last one may be ignored (sequence-to-vector).
    • [encoder-decoder] a sequence-to-vector encoder followed by a vector-to-sequence decoder works much better than translating on the fly with a single sequence-to-sequence model.
    • backpropagation through time
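
A minimal numpy sketch of that recurrence (the weight names, sizes, and tanh activation are assumptions matching a basic SimpleRNN-style cell):

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # h(t) = tanh(x(t) @ W_x + h(t-1) @ W_h + b); for a simple cell, y(t) == h(t)
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

n_inputs, n_units = 3, 5
W_x = np.random.randn(n_inputs, n_units)   # input-to-hidden weights
W_h = np.random.randn(n_units, n_units)    # hidden-to-hidden weights
b = np.zeros(n_units)

h = np.zeros(n_units)                      # initial state h(0) = 0
for x_t in np.random.randn(4, n_inputs):   # unroll over 4 time steps
    h = rnn_step(x_t, h, W_x, W_h, b)      # the final h is the sequence-to-vector output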

Impl: [simple rnn]

model = keras.models.Sequential([
  keras.layers.SimpleRNN(1, input_shape=[None, 1])  # returns the last output.
])

Impl: deep rnn

model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.SimpleRNN(1)  # returns the last output.
])

# variant: a Dense output layer instead of a single-unit recurrent output layer
model = keras.models.Sequential([
  keras.layers.SimpleRNN(units=20, return_sequences=True, input_shape=[None, 1]),
  keras.layers.SimpleRNN(units=20),
  keras.layers.Dense(1)
])

# forecasting several steps ahead: each target is the next 10 values of the series
series = generate_time_series(10000, n_steps + 10)
X_train, Y_train = series[:7000, :n_steps], series[:7000, -10:, 0]
X_valid, Y_valid = series[7000:9000, :n_steps], series[7000:9000, -10:, 0]
X_test, Y_test = series[9000:, :n_steps], series[9000:, -10:, 0]
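
The split above relies on generate_time_series() and n_steps, which aren't defined in these notes. A minimal sketch of a compatible generator (a noisy sum of two random sine waves, shaped [batch_size, n_steps, 1]; the exact coefficients and the window length are assumptions):

import numpy as np

n_steps = 50  # assumed window length

def generate_time_series(batch_size, n_steps):
    # each series: two random sine waves plus a little noise, shape [batch_size, n_steps, 1]
    freq1, freq2, offsets1, offsets2 = np.random.rand(4, batch_size, 1)
    time = np.linspace(0, 1, n_steps)
    series = 0.5 * np.sin((time - offsets1) * (freq1 * 10 + 10))
    series += 0.2 * np.sin((time - offsets2) * (freq2 * 20 + 20))
    series += 0.1 * (np.random.rand(batch_size, n_steps) - 0.5)
    return series[..., np.newaxis].astype(np.float32)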

# predict all 10 future values at once, but only at the last time step
model = keras.models.Sequential([
  keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
  keras.layers.SimpleRNN(20),
  keras.layers.Dense(10)
])

# sequence-to-sequence variant: predict the next 10 values at every time step
model = keras.models.Sequential([
  keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None, 1]),
  keras.layers.SimpleRNN(20, return_sequences=True),
  keras.layers.TimeDistributed(keras.layers.Dense(10))
])

def last_time_step_mse(Y_true, Y_pred):
    return keras.metrics.mean_squared_error(Y_true[:, -1], Y_pred[:, -1])

optimizer = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss="mse", optimizer=optimizer, metrics=[last_time_step_mse])

Fighting Unstable Gradients

  • Still helpful: good parameter initialization, faster optimizers, dropout, and so on.
  • Keep the hyperbolic tangent (a saturating activation) as the default activation, since the outputs and gradients explode easily otherwise.
  • Use gradient clipping, and keep monitoring the size of the gradients with TensorBoard.
  • Layer normalization inside the recurrent cell also helps -- see the custom cell below and the usage sketch after it.

class LNSimpleRNNCell(keras.layers.Layer):
    def __init__(self, units, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        self.state_size = units
        self.output_size = units
        # linear part of a SimpleRNN cell; layer norm and activation are applied after it
        self.simple_rnn_cell = keras.layers.SimpleRNNCell(units,
                                                          activation=None)
        self.layer_norm = keras.layers.LayerNormalization()
        self.activation = keras.activations.get(activation)

    def call(self, inputs, states):
        outputs, new_states = self.simple_rnn_cell(inputs, states)
        norm_outputs = self.activation(self.layer_norm(outputs))
        return norm_outputs, [norm_outputs]
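
A usage sketch for the custom cell above, wrapped in the generic keras.layers.RNN layer and compiled with a clipped optimizer (the clipvalue of 1.0 and the 10-value targets are assumptions carried over from the toy forecasting setup):

model = keras.models.Sequential([
    keras.layers.RNN(LNSimpleRNNCell(20), return_sequences=True,
                     input_shape=[None, 1]),
    keras.layers.RNN(LNSimpleRNNCell(20), return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])
model.compile(loss="mse",
              optimizer=keras.optimizers.SGD(clipvalue=1.0),  # clip each gradient component to [-1, 1]
              metrics=[last_time_step_mse])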

# the same sequence-to-sequence model built with LSTM layers
model = keras.models.Sequential([
  keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1]),
  keras.layers.LSTM(20, return_sequences=True),
  keras.layers.TimeDistributed(keras.layers.Dense(10))
])

# equivalent model using the generic RNN layer wrapped around LSTM cells
model = keras.models.Sequential([
  keras.layers.RNN(keras.layers.LSTMCell(20), return_sequences=True,
                   input_shape=[None, 1]),
  keras.layers.RNN(keras.layers.LSTMCell(20), return_sequences=True),
  keras.layers.TimeDistributed(keras.layers.Dense(10))
])

# a stride-2 Conv1D layer shortens the input sequence before the GRU layers
model = keras.models.Sequential([
    keras.layers.Conv1D(filters=20, kernel_size=4, strides=2, padding="valid",
                        input_shape=[None, 1]),
    keras.layers.GRU(20, return_sequences=True),
    keras.layers.GRU(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(10))
])

model.compile(loss="mse", optimizer="adam", metrics=[last_time_step_mse])
history = model.fit(X_train, Y_train[:, 3::2], epochs=20,
                    validation_data=(X_valid, Y_valid[:, 3::2]))

LSTM cell
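
The cell diagram that belonged under this heading isn't reproduced; for reference, the standard LSTM computations in the same notation as above (σ = logistic function, ⊗ = element-wise product, c(t) = long-term state, h(t) = short-term state):

i(t) = σ(Wxi x(t) + Whi h(t-1) + bi)      -- input gate
f(t) = σ(Wxf x(t) + Whf h(t-1) + bf)      -- forget gate
o(t) = σ(Wxo x(t) + Who h(t-1) + bo)      -- output gate
g(t) = tanh(Wxg x(t) + Whg h(t-1) + bg)   -- candidate state
c(t) = f(t) ⊗ c(t-1) + i(t) ⊗ g(t)
y(t) = h(t) = o(t) ⊗ tanh(c(t))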


10. Introduction to Artificial Neural Networks with Keras

  • threshold logic unit / TLU: an artificial neuron which computes a weighted sum of its inputs then applies a step function.
  • perceptron with two input neurons, one bias neuron, and three output neurons.
  • multilayer perceptron / MLP with two inputs, one hidden layer of four neurons, and three output neurons (the bias neurons are shown here, but usually they are implicit).
  • feedforward neural network / FNN where the signal flows only in one direction (from the inputs to the outputs).
  • automatic differentiation / autodiff: computing gradients automatically -- backpropagation uses fast and accurate reverse-mode autodiff.
  • let's initialize all the hidden layers’ connection weights randomly to allow backpropagation to train a diverse team of neurons.
  • activation functions / ϕ
    • hyperbolic tangent function: tanh(z) = 2σ(2z) – 1.
    • rectified linear unit function: ReLU(z) = max(0, z).
  • learning curves: mean training and validation loss & accuracy measured at the end of each epoch.
  • Use keras.utils.to_categorical() to get one-hot vector labels from sparse labels (i.e., class indices), and np.argmax(axis=1) to go the other way around (see the snippet below).
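
A quick round-trip illustrating that last bullet (the 3-class labels below are made up):

import numpy as np
from tensorflow import keras

sparse_labels = np.array([0, 2, 1, 2])                   # class indices
onehot = keras.utils.to_categorical(sparse_labels)       # shape (4, 3), one one-hot row per label
back = np.argmax(onehot, axis=1)                         # back to class indices: [0 2 1 2]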

Hyperparameter     | Regression                                                   | Binary classification | Multilabel binary classification | Multiclass classification
Hidden activation  | ReLU, SELU, ...                                              | same                  | same                             | same
Output activation  | None, or ReLU/softplus (positive) or logistic/tanh (bounded) | Logistic              | Logistic                         | Softmax
Loss               | MSE, or MAE/Huber (if outliers)                              | Cross entropy         | Cross entropy                    | Cross entropy

Classification MLPs with Keras

model = keras.models.Sequential([
  keras.layers.Flatten(input_shape=[28, 28]),
  keras.layers.Dense(300, activation="relu"),
  keras.layers.Dense(100, activation="relu"),
  keras.layers.Dense(10, activation="softmax")
])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
dense (Dense)                (None, 300)               235500
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.show()

Regression MLPs with Keras

model = keras.models.Sequential([
  keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
  keras.layers.Dense(1)
])
model.compile(loss="mean_squared_error", optimizer="sgd")
history = model.fit(X_train, y_train, epochs=20,
                    validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)
X_new = X_test[:3]  # pretend these are new instances
y_pred = model.predict(X_new)

# Wide & Deep model built with the functional API
input_a = keras.layers.Input(shape=[5], name="wide_input")
input_b = keras.layers.Input(shape=[6], name="deep_input")
hidden1 = keras.layers.Dense(30, activation="relu")(input_b)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.concatenate([input_a, hidden2])
output = keras.layers.Dense(1, name="output")(concat)
model = keras.Model(inputs=[input_a, input_b], outputs=[output])

# variant with an auxiliary output (e.g., for regularization)
output = keras.layers.Dense(1, name="main_output")(concat)
aux_output = keras.layers.Dense(1, name="aux_output")(hidden2)
model = keras.Model(inputs=[input_a, input_b], outputs=[output, aux_output])
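
A hedged sketch of compiling and training the two-output version: each output gets its own loss, weighted so the auxiliary output acts only as a regularizer, and fit() takes both input sets with the same targets for both outputs (the 0.9/0.1 weights and the X_*_A / X_*_B input splits are assumptions):

model.compile(loss=["mse", "mse"], loss_weights=[0.9, 0.1], optimizer="sgd")
history = model.fit([X_train_A, X_train_B], [y_train, y_train], epochs=20,
                    validation_data=([X_valid_A, X_valid_B], [y_valid, y_valid]))
total_loss, main_loss, aux_loss = model.evaluate([X_test_A, X_test_B],
                                                 [y_test, y_test])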

11. Training Deep Neural Networks

BN layer after Activation

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(300, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax")
])
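
Each BN layer keeps four parameter vectors: γ (scale) and β (offset) are trained by backprop, while the moving mean and variance are updated from batch statistics and are not trainable. A quick way to check (the exact variable names Keras reports may differ slightly):

[(var.name, var.trainable) for var in model.layers[1].variables]
# [('batch_normalization/gamma:0', True),
#  ('batch_normalization/beta:0', True),
#  ('batch_normalization/moving_mean:0', False),
#  ('batch_normalization/moving_variance:0', False)]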

BN layer before Activation

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(300, kernel_initializer="he_normal", use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation("elu"),
    keras.layers.Dense(100, kernel_initializer="he_normal", use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation("elu"),
    keras.layers.Dense(10, activation="softmax")
])

Gradient Clipping

  • Let's track the size of the gradients with TensorBoard and try different thresholds for clipnorm or clipvalue (a logging sketch follows the snippet below).
keras.optimizers.SGD(clipvalue=hparams.clipvalue or 1.0)  # or clipnorm= to clip the whole gradient vector by its norm
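
One way to actually watch the gradients in TensorBoard is a custom training step that logs the global gradient norm; this is a sketch under the assumption of an already-built model, an MSE loss, and a "logs/grad_norms" log directory:

import tensorflow as tf
from tensorflow import keras

writer = tf.summary.create_file_writer("logs/grad_norms")  # hypothetical log directory
optimizer = keras.optimizers.SGD(clipvalue=1.0)            # or clipnorm=1.0
loss_fn = keras.losses.MeanSquaredError()

def train_step(model, x_batch, y_batch, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(x_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    with writer.as_default():
        # log the global L2 norm of the raw gradients for this step
        tf.summary.scalar("global_grad_norm", tf.linalg.global_norm(grads), step=step)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# call once per batch, e.g.:
# for step, (x_batch, y_batch) in enumerate(train_set):
#     train_step(model, x_batch, y_batch, step)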

MC Dropout

In short, MC Dropout is a fantastic technique that boosts the performance of dropout models and provides better uncertainty estimates. And since it is just regular dropout during training, it also acts like a regularizer (see the usage sketch after the snippets below).

# 100 stochastic forward passes with dropout kept active, averaged into one estimate
y_probas = np.stack([model(X_test_scaled, training=True)
                     for _ in range(100)])
y_proba = y_probas.mean(axis=0)

class MCDropout(keras.layers.Dropout):
    def call(self, inputs):
        # keep dropout active even at inference time
        return super().call(inputs, training=True)
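
A sketch of dropping the subclass above into a model in place of regular Dropout (the layer sizes and the 20% rate are assumptions); at inference time, averaging many stochastic forward passes as in the y_probas snippet gives the MC estimate:

mc_model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    MCDropout(rate=0.2),
    keras.layers.Dense(300, activation="relu"),
    MCDropout(rate=0.2),
    keras.layers.Dense(10, activation="softmax")
])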