Creating a TF.data.Dataset object and training a model
Hi Ore,
I think I need to do a Medium write-up on this, because most of the tutorials require manipulating TensorFlow's data classes directly. But here's a simple, intuitive example I used.
First, you load your CSV into pandas.
Next, you add another column to your dataframe containing the paths to your images.
Then you do your train_test_split(), but here you pass in the (paths, labels) along with the other usual parameters such as random_state and test_size. A minimal sketch of these three steps is shown below.
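Something like this, assuming your CSV has a filename column called filename and a binary label column, and your images live in an images/ folder (adjust the names to your data):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('train.csv')               # hypothetical CSV name
df['path'] = 'images/' + df['filename']     # full path to each image on disk

x_train, x_test, y_train, y_test = train_test_split(
    df['path'].values,    # image paths, not the images themselves
    df['label'].values,   # corresponding labels
    test_size=0.2,
    random_state=42,
)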
After doing that, you should add and run the following functions in the next cell:
def decode_image(path, label=None):
    bits = tf.io.read_file(path)
    image = tf.image.decode_jpeg(bits, channels=3)  # read the JPEG bytes into a uint8 tensor
    image = tf.cast(image, tf.float32) / 255.0      # scale pixel values to the 0-1 range
    image = tf.image.resize(image, image_size)      # image_size is a (height, width) tuple you define earlier, e.g. (224, 224)
    if label is None:
        return image
    else:
        return image, label
The decode function reads the image file from disk and turns it into a tensor that TensorFlow can work with.
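If you want to sanity-check it on a single image first (assuming image_size and your x_train paths are already defined), you can call it eagerly:

img = decode_image(x_train[0])   # x_train[0] is just a file path string
print(img.shape, img.dtype)      # e.g. (224, 224, 3) float32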
Then optionally, you can add the augmentation function below:
def data_augment(image, label=None):
    image = tf.image.random_flip_up_down(image)
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.5247078)
    image = tf.image.random_saturation(image, 0.3824261, 1.4029386)
    image = tf.image.random_hue(image, 0.1267652)
    image = tf.image.random_contrast(image, 0.3493415, 1.3461331)
    image = tf.clip_by_value(image, 0.0, 1.0)   # keep pixel values in the 0-1 range after augmentation
    if label is None:
        return image
    else:
        return image, label
After doing that, you then create the tf.data.Dataset object like this:
AUTO = tf.data.AUTOTUNE   # lets tf.data pick the level of parallelism for you

train_dataset = (
    tf.data.Dataset
    .from_tensor_slices((x_train, y_train))
    .shuffle(10000)
    .map(decode_image, num_parallel_calls=AUTO)
    .map(data_augment, num_parallel_calls=AUTO)
    .repeat()
    .batch(batch_size, drop_remainder=True)   # batch_size is whatever you chose earlier, e.g. 32
    .prefetch(AUTO)
)
val_dataset = (
    tf.data.Dataset
    .from_tensor_slices((x_test, y_test))
    .map(decode_image, num_parallel_calls=AUTO)
    .batch(batch_size)
    .cache()
)
# Then you build your model here
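# For example, a minimal transfer-learning model for binary classification could look
# like this (ResNet50 is just an illustration; use whatever architecture you prefer):
base = tf.keras.applications.ResNet50(include_top=False, pooling='avg',
                                       input_shape=(*image_size, 3), weights='imagenet')
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation='sigmoid'),   # single sigmoid unit to match BinaryCrossentropy
])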
# Compile it here
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.BinaryAccuracy()],
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
)
Then you fit the model. That's where you pass in the train_dataset and val_dataset objects you created earlier, like this:
STEPS_PER_EPOCH = len(x_train) // batch_size   # needed because the training dataset repeats forever
valid_step = len(x_test) // batch_size

history = model.fit(
    train_dataset,
    epochs=50,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=val_dataset,
    validation_steps=valid_step,
)
Let me know if you have any issues making it work.