Data augmentation is the process of increasing the diversity of a dataset through synthetic data generation. Normally, images drawn from the dataset are augmented sequentially, which leaves the unaltered image with only a small chance of appearing during training.
For this reason, the data augmentation process is modified here in two ways.
The first approach modifies the data augmentation process (Figure 1) so that there is
- a 50% chance of getting the unmodified image, and
- a 50% chance of getting an augmented image.
Within that 50%, each data augmentation technique is chosen with equal probability, as shown in Figure 2.
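This selection scheme can be sketched in plain Python; the six augmentation names below are placeholders for illustration:

```python
import random
from collections import Counter

AUGMENTATIONS = ["flip", "shift_crop", "blur", "brightness_contrast", "noise", "scaling"]

def pick_augmentation(rng=random):
    """Return one of the six augmentations with probability 0.5 / 6 each,
    or None (the unmodified image) with probability 0.5."""
    if rng.random() < 0.5:
        return rng.choice(AUGMENTATIONS)
    return None  # unmodified image

random.seed(0)
counts = Counter(pick_augmentation() for _ in range(60000))
# Roughly half the draws return None; each augmentation gets roughly 1/12.
```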
The code for the one-step data augmentation is shown below.
The first part imports the necessary libraries.
import random

import albumentations as A
from torchvision import transforms
import numpy as np
from PIL import Image
The next part defines the transformation variables, separated into three variables.
The first variable holds the image-intensity data augmentations, written with albumentations.
albu_transformations = [A.GaussianBlur(p=1.0),
                        A.RandomBrightnessContrast(p=1.0),
                        A.GaussNoise(p=1.0)]
albu_transformations = [A.Compose([x]) for x in albu_transformations]
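Conceptually, GaussNoise adds zero-mean Gaussian noise to the pixel values, GaussianBlur convolves the image with a Gaussian kernel, and RandomBrightnessContrast rescales the intensities. The noise case can be sketched in plain numpy (sigma is an assumed example value, not a library default):

```python
import numpy as np

rng = np.random.default_rng(0)
# A dummy grayscale image with constant intensity 128.
img = np.full((8, 8), 128.0)

sigma = 10.0  # assumed example noise standard deviation
noisy = img + rng.normal(0.0, sigma, size=img.shape)
# Clip back to the valid 8-bit range, as image pipelines do.
noisy = np.clip(noisy, 0, 255)
```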
The second variable holds the geometric transformation for random scaling (target_size is the desired image size).
scaling_transformations = transforms.Compose([
    transforms.Resize((target_size, target_size)),
    transforms.RandomResizedCrop(size=target_size, scale=(0.8, 1))
])
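With scale=(0.8, 1), RandomResizedCrop samples a crop whose area is between 80% and 100% of the resized image and then scales it back up to target_size, which zooms in slightly. A numpy sketch of that arithmetic (target_size=224 is an assumed example value):

```python
import numpy as np

target_size = 224  # assumed example value
rng = np.random.default_rng(0)

# Sample an area fraction in [0.8, 1.0], as scale=(0.8, 1) does.
area_frac = rng.uniform(0.8, 1.0)
# For a square crop, the side length shrinks by the square root
# of the sampled area fraction.
crop_side = int(round(target_size * np.sqrt(area_frac)))
# The crop is then resized back to (target_size, target_size),
# so the output size is unchanged while the content is zoomed in.
```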
The third variable converts the PIL image to a PyTorch tensor, resizing it to the defined image size (target_size) and normalizing it with the specified mean and standard deviation values.
resize_compose = transforms.Compose([
    transforms.Resize((target_size, target_size)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
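transforms.Normalize applies (x - mean) / std per channel after ToTensor has scaled the pixels to [0, 1]; the values above are the standard ImageNet statistics. The same arithmetic in plain numpy:

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

# A dummy 3-channel 2x2 image with all pixels already scaled to [0, 1].
img = np.full((3, 2, 2), 0.5)

# Per-channel normalization, exactly what transforms.Normalize computes.
normalized = (img - mean) / std
```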
After creating the three variables, put the code below inside the class constructor (i.e., __init__).
self.transform = [
    *albu_transformations,
    scaling_transformations,
    resize_compose
]
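self.transform now holds five entries: indices 0-2 are the albumentations pipelines, index 3 is the random scaling, and index -1 (i.e., index 4) is the final resize/normalize. A plain-Python sketch with no-op string placeholders standing in for the real transform objects shows how the indices used later (self.transform[rand - 3], self.transform[3], self.transform[-1]) resolve:

```python
# String placeholders standing in for the real transform objects.
albu_transformations = ["blur", "brightness_contrast", "noise"]
scaling_transformations = "scaling"
resize_compose = "resize_normalize"

transform = [
    *albu_transformations,    # indices 0, 1, 2
    scaling_transformations,  # index 3
    resize_compose            # index 4 (also index -1)
]

# rand values 3, 4, 5 map onto the three intensity transforms.
intensity_for_rand = {rand: transform[rand - 3] for rand in (3, 4, 5)}
```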
After that, put the code below into the __getitem__ function of the torch.utils.data.dataset.Dataset class, after loading the image and acquiring the facial bounding box (defined as x_min, y_min, x_max, and y_max) and the head pose angles (yaw, pitch, and roll).
# 50% chance of an unaltered image; otherwise one data augmentation,
# chosen with equal probability.
augment_or_not = np.random.random_sample()
if augment_or_not < 0.5:
    # Equally distributed data augmentation
    rand = random.randint(1, 6)
    # Flip
    if rand == 1:
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        yaw = -yaw
        roll = -roll
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    # Random Shifting and Cropping
    elif rand == 2:
        mid_x = int((x_max + x_min) / 2)
        mid_y = int((y_max + y_min) / 2)
        width = x_max - x_min
        height = y_max - y_min
        kx = np.random.random_sample() * 0.2 - 0.1
        ky = np.random.random_sample() * 0.2 - 0.1
        shiftx = mid_x + width * kx
        shifty = mid_y + height * ky
        x_min = shiftx - width / 2
        x_max = shiftx + width / 2
        y_min = shifty - height / 2
        y_max = shifty + height / 2
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
    # rand between 3 and 5 is intensity-based data augmentation
    elif 3 <= rand <= 5:
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        img = np.array(img)
        img = self.transform[rand - 3](image=img)['image']
        img = Image.fromarray(img)
    # Random Scaling
    elif rand == 6:
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        img = self.transform[3](img)
else:
    # No Data Augmentation
    img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
# Finalize Transformation
img = self.transform[-1](img)
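The random shifting branch above moves the box center by up to ±10% of the box width and height while keeping the box size fixed. A numpy sketch of just that arithmetic, using an assumed example bounding box:

```python
import numpy as np

# Assumed example bounding box.
x_min, y_min, x_max, y_max = 100.0, 120.0, 300.0, 360.0

rng = np.random.default_rng(0)
mid_x = (x_max + x_min) / 2
mid_y = (y_max + y_min) / 2
width = x_max - x_min
height = y_max - y_min

# kx, ky are uniform in [-0.1, 0.1), i.e. at most a 10% shift.
kx = rng.random() * 0.2 - 0.1
ky = rng.random() * 0.2 - 0.1
shiftx = mid_x + width * kx
shifty = mid_y + height * ky
new_x_min = shiftx - width / 2
new_x_max = shiftx + width / 2
new_y_min = shifty - height / 2
new_y_max = shifty + height / 2
```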
The second approach is similar to the first one.
The difference is that a geometric transformation (random shifting + cropping, random flipping, or random scaling) is applied as the first step, and an intensity transformation as the second step.
The code is shown below. Put it inside the __getitem__ function of the torch.utils.data.dataset.Dataset class, after loading the image and acquiring the facial bounding box (defined as x_min, y_min, x_max, and y_max) and the head pose angles (yaw, pitch, and roll).
# 50% chance of an unaltered image; otherwise one geometric transformation
# followed by one intensity transformation, each chosen with equal
# probability (each stage includes a "none" option).
augment_or_not = np.random.random_sample()
if augment_or_not < 0.5:
    # Geometric Transformation
    rand = random.randint(1, 4)
    if rand == 1:  # Flip
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        yaw = -yaw
        roll = -roll
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    elif rand == 2:  # Random Shifting and Cropping
        mid_x = int((x_max + x_min) / 2)
        mid_y = int((y_max + y_min) / 2)
        width = x_max - x_min
        height = y_max - y_min
        kx = np.random.random_sample() * 0.2 - 0.1
        ky = np.random.random_sample() * 0.2 - 0.1
        shiftx = mid_x + width * kx
        shifty = mid_y + height * ky
        x_min = shiftx - width / 2
        x_max = shiftx + width / 2
        y_min = shifty - height / 2
        y_max = shifty + height / 2
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
    elif rand == 3:  # Random Scaling
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        img = self.transform[3](img)
    else:  # No geometric transformation
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
    # Intensity-based Augmentation
    rand = random.randint(1, 4)
    if 1 <= rand <= 3:
        img = np.array(img)
        img = self.transform[rand - 1](image=img)['image']
        img = Image.fromarray(img)
    # rand == 4: no intensity-based augmentation (the image is already cropped)
else:
    # No Data Augmentation
    img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
# Finalize Transformation
img = self.transform[-1](img)
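In both approaches, the flip branch negates yaw and roll because mirroring an image left-to-right reverses those two rotations while leaving pitch unchanged. A numpy sketch of the convention, using assumed example angles in degrees:

```python
import numpy as np

# Assumed example head pose, in degrees.
yaw, pitch, roll = 30.0, -10.0, 5.0

# A tiny dummy image: one white pixel near the left edge.
img = np.zeros((4, 4))
img[1, 0] = 1.0

# Horizontal flip mirrors the columns; the pixel moves to the right edge.
flipped = np.fliplr(img)

# Mirroring reverses left/right, so yaw and roll flip sign; pitch does not.
yaw, roll = -yaw, -roll

# Flipping twice restores the original image.
restored = np.fliplr(flipped)
```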