NVIDIA End-to-End Deep Learning for Self-Driving Cars
@husmen · Last active December 13, 2018

Importing Keras

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Lambda
from keras.layers import Conv2D, MaxPooling2D, Cropping2D
from keras.callbacks import ModelCheckpoint
from keras.optimizers import SGD
from keras import backend as K

Configure Session

import tensorflow as tf

tf_config = tf.ConfigProto(allow_soft_placement=False)
tf_config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
tf_config.log_device_placement = True      # log which device each op is placed on
s = tf.Session(config=tf_config)

K.clear_session()
K.set_session(s)
K.floatx() # check current default float type
K.set_floatx('float16') # set default float type to FP16
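
With the default float type switched to 'float16', Keras creates subsequent weights and activations in half precision. A quick sanity check (a minimal sketch, not part of the original gist):

# Assumes the session cell above has already run.
print(K.floatx())              # -> 'float16'
w = K.variable([[1.0, 2.0]])   # new variables now default to half precision
print(K.dtype(w))              # -> 'float16'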

Model based on NVIDIA's "End to End Learning for Self-Driving Cars" architecture

model = Sequential()
model.add(Cropping2D(cropping=((144,76),(0,672)), input_shape=(376,1344,3)))
model.add(Lambda(lambda x: (2*x / 255.0) - 1.0))
model.add(Conv2D(24, (5, 5), activation="relu", strides=(2, 2)))
model.add(Conv2D(36, (5, 5), activation="relu", strides=(2, 2)))
model.add(Conv2D(48, (5, 5), activation="relu", strides=(2, 2)))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(Flatten())
model.add(Dense(100))
model.add(Dense(50))
model.add(Dense(10))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adamax') # tried adam, adamax, sgd
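
Note that the 672-pixel crop on the right removes exactly half of the 1344-pixel width, which would be consistent with a side-by-side stereo frame (e.g. a ZED camera at 376x1344) where only the left image is fed to the network; that reading is an assumption, not stated in the gist. The cropped input shape can be checked directly:

print(model.layers[0].output_shape)   # (None, 156, 672, 3) after cropping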

Model summary

model.summary()

Output of model summary

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
cropping2d_1 (Cropping2D)    (None, 156, 672, 3)       0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 156, 672, 3)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 76, 334, 24)       1824      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 36, 165, 36)       21636     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 81, 48)        43248     
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 14, 79, 64)        27712     
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 12, 77, 64)        36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 59136)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 100)               5913700   
_________________________________________________________________
dense_2 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_3 (Dense)              (None, 10)                510       
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 11        
=================================================================
Total params: 6,050,619
Trainable params: 6,050,619
Non-trainable params: 0
_________________________________________________________________

Train

model_checkpoint = ModelCheckpoint('weights.h5', monitor='val_loss', save_best_only=True)
hs = model.fit_generator(generate_data(), steps_per_epoch=int(splitpoint / bsize),
                         validation_data=generate_data_val(),
                         validation_steps=int((dlen - splitpoint) / bsize),
                         epochs=10, callbacks=[model_checkpoint])
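
The gist does not include the definitions of generate_data, generate_data_val, bsize, splitpoint, or dlen. A minimal sketch of what such generators might look like, assuming frames X and steering angles y are already loaded as NumPy arrays (all values and names below are hypothetical stand-ins):

# Assumption: X (camera frames) and y (steering angles) are pre-loaded NumPy arrays.
dlen = len(X)                 # total number of samples
bsize = 32                    # batch size (value not given in the gist)
splitpoint = int(0.8 * dlen)  # 80/20 train/validation split (assumed)

def generate_data():
    # Yield training batches forever, as fit_generator expects.
    while True:
        for i in range(0, splitpoint, bsize):
            yield X[i:i + bsize], y[i:i + bsize]

def generate_data_val():
    # Yield validation batches forever.
    while True:
        for i in range(splitpoint, dlen, bsize):
            yield X[i:i + bsize], y[i:i + bsize]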

Sample output with FP16 and Adam

Epoch 1/10
1381/1381 [==============================] - 1033s 748ms/step - loss: nan
...
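
The NaN loss with FP16 and Adam is consistent with a known numerical issue: Adam divides by sqrt(v_hat) + epsilon, and Keras's default epsilon (about 1e-7) sits at the edge of float16's subnormal range, so the update can blow up. A commonly suggested workaround (an assumption, not something tested in this gist) is to raise epsilon:

from keras.optimizers import Adam

# A larger epsilon keeps the denominator representable in half precision.
model.compile(loss='mse', optimizer=Adam(epsilon=1e-4))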

Sample output with FP32 and Adam, or with FP16 and other optimizers

Epoch 1/10
1381/1381 [==============================] - 1033s 748ms/step - loss: 0.7524 - val_loss: 0.0926
...
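
Since ModelCheckpoint saved the best weights by val_loss, inference would reload them. A minimal sketch, where frame is a hypothetical raw (376, 1344, 3) camera image; cropping and normalization happen inside the model:

model.load_weights('weights.h5')
steering = model.predict(frame[None, ...])[0, 0]  # add a batch dimension, get the scalar angle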