Created September 9, 2023 17:53
This file contains the approaches to modify data augmentation processes.

Modification of Data Augmentation to be one step or two steps

Data augmentation increases the diversity of a dataset by generating synthetic variations of its samples. Normally, the augmentations are applied to each image sequentially, one after another. However, this approach leaves only a small (single-digit percent) chance of getting the unaltered image.
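To see why the unaltered image becomes rare under sequential augmentation, here is a rough sketch. It assumes (this is not stated in the original) that each of k augmentations is applied independently with probability p; the values are illustrative only.

```python
def unaltered_probability(p: float, k: int) -> float:
    """Chance that none of k independent augmentations (each applied with probability p) fires."""
    return (1.0 - p) ** k

# Illustrative values, not taken from the original pipeline.
for k in (1, 3, 5):
    print(k, unaltered_probability(0.5, k))
```

Under these assumptions, with p = 0.5 and five augmentations the unaltered image appears only about 3% of the time.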

For this reason, the data augmentation process is modified in two ways.

One step data augmentation

This approach modifies the data augmentation process (Figure 1) so that there is

  1. a 50% chance of getting the unmodified image, and
  2. a 50% chance of getting an augmented image, split equally among the augmentations.

Figure 1

Within the remaining 50%, each data augmentation is equally likely, as shown in Figure 2.

Figure 2
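The split described above can be written out as exact probabilities. This is a minimal sketch; the function name is hypothetical, and the default of six augmentations is taken from the `random.randint(1, 6)` draw in the code later in this document.

```python
from fractions import Fraction

def one_step_distribution(num_augmentations: int = 6) -> dict:
    """Probability of each outcome when half of the samples stay unmodified."""
    dist = {"unmodified": Fraction(1, 2)}
    share = Fraction(1, 2) / num_augmentations  # equal share of the augmented half
    for i in range(1, num_augmentations + 1):
        dist[f"augmentation_{i}"] = share
    return dist

dist = one_step_distribution()
assert sum(dist.values()) == 1
print(dist["augmentation_1"])  # 1/12
```

Each individual augmentation is therefore picked for roughly 8.3% of the samples.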

The code for one step data augmentation is shown below.

The first part imports the necessary libraries.

import random

import albumentations as A
import numpy as np
from PIL import Image
from torchvision import transforms

The next part defines the transformations, which are split into three variables.

The first variable holds the intensity-based data augmentations, built with albumentations.

albu_transformations = [A.GaussianBlur(p=1.0),
albu_transformations = [A.Compose([x]) for x in albu_transformations]

The second variable holds the geometric transformation for random scaling (target_size is the desired image size).

scaling_transformations = transforms.Compose([
    transforms.Resize((target_size, target_size)), 

The third variable converts the PIL image to a PyTorch tensor: it resizes the image to the defined size (target_size) and normalizes it with the specified mean and standard deviation values.

resize_compose = transforms.Compose([
    transforms.Resize((target_size, target_size)),
    transforms.ToTensor(),  # Normalize expects a tensor, not a PIL image
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

After creating the three variables, put the code below inside the class constructor (__init__).

self.transform = [
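The indexing used in `__getitem__` (`self.transform[rand - 3]` for `rand` from 3 to 5, `self.transform[3]` for scaling, and `self.transform[-1]` for the final conversion) suggests how the list is ordered. The sketch below checks that ordering with plain string stand-ins instead of the real transform objects; it assumes exactly three intensity transformations, which is inferred from the indices rather than stated in the original.

```python
# Stand-ins for the real transform objects; only the ordering matters here.
albu_transformations = ["intensity_0", "intensity_1", "intensity_2"]
scaling_transformations = "random_scaling"
resize_compose = "resize_to_tensor_normalize"

# Assumed layout: intensity transforms first, then scaling, then the final conversion.
transform = albu_transformations + [scaling_transformations, resize_compose]

assert [transform[rand - 3] for rand in (3, 4, 5)] == albu_transformations
assert transform[3] == scaling_transformations  # used by the random-scaling branch
assert transform[-1] == resize_compose          # used by the final conversion
```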

After that, put the code below into the __getitem__ function of the class, after loading the image and acquiring the facial bounding box (x_min, y_min, x_max, and y_max) and the head pose (yaw, pitch, and roll).

# 50% chance to apply one equally likely data augmentation, and 50% chance to keep the image unaltered.
augment_or_not = np.random.random_sample()
if augment_or_not < 0.5:
    # Equally distributed data augmentation
    rand = random.randint(1, 6)

    # Flip
    if rand == 1: 
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        yaw = -yaw
        roll = -roll
        img = img.transpose(Image.FLIP_LEFT_RIGHT)

    # Random Shifting and Cropping
    elif rand == 2:
        mid_x = int((x_max + x_min) / 2)
        mid_y = int((y_max + y_min) / 2)
        width = x_max - x_min
        height = y_max - y_min
        kx = np.random.random_sample() * 0.2 - 0.1
        ky = np.random.random_sample() * 0.2 - 0.1
        shiftx = mid_x + width * kx
        shifty = mid_y + height * ky
        x_min = shiftx - width/2
        x_max = shiftx + width/2
        y_min = shifty - height/2
        y_max = shifty + height/2
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))

    # rand from 3 to 5 selects an intensity-based data augmentation
    elif 3 <= rand <= 5:
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        img = np.array(img)
        img = self.transform[rand - 3](image = img)['image']
        img = Image.fromarray(img)

    # Random Scaling
    elif rand == 6:
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        img = self.transform[3](img)
else:
    # No Data Augmentation
    img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))

# Finalize Transformation
img = self.transform[-1](img)
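The random shifting and cropping branch can be isolated into a small helper. The sketch below is a hypothetical refactoring (the name `shift_box` and its parameters are not from the original code). It moves the box center by up to 10% of the box width and height while keeping the box size, as the branch above does. Unlike the original, it does not round the center to an integer before shifting.

```python
import random

def shift_box(x_min, y_min, x_max, y_max, max_shift=0.1, rng=random):
    """Shift a bounding box by up to +/- max_shift of its width and height, keeping its size."""
    width = x_max - x_min
    height = y_max - y_min
    kx = rng.uniform(-max_shift, max_shift)  # horizontal shift factor
    ky = rng.uniform(-max_shift, max_shift)  # vertical shift factor
    center_x = (x_min + x_max) / 2 + width * kx
    center_y = (y_min + y_max) / 2 + height * ky
    return (center_x - width / 2, center_y - height / 2,
            center_x + width / 2, center_y + height / 2)
```

The returned box can be converted to integers and passed to `img.crop(...)` in the same way as the other branches.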

Two step data augmentation

This approach is similar to the first one.

The difference is that the geometric transformations (random shifting and cropping, random flipping, and random scaling) form the first step, and the intensity transformations form the second step.

The code is shown below. Put it inside the __getitem__ function of the class, after loading the image and acquiring the facial bounding box (x_min, y_min, x_max, and y_max) and the head pose (yaw, pitch, and roll).

# 50% chance to apply the two augmentation steps, and 50% chance to keep the image unaltered.
augment_or_not = np.random.random_sample()

if augment_or_not < 0.5:
    # Geometric Transformation
    rand = random.randint(1, 4)
    if rand == 1: # Flip
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        yaw = -yaw
        roll = -roll
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    elif rand == 2: # Random Shifting
        mid_x = int((x_max + x_min) / 2)
        mid_y = int((y_max + y_min) / 2)
        width = x_max - x_min
        height = y_max - y_min
        kx = np.random.random_sample() * 0.2 - 0.1
        ky = np.random.random_sample() * 0.2 - 0.1
        shiftx = mid_x + width * kx
        shifty = mid_y + height * ky
        x_min = shiftx - width/2
        x_max = shiftx + width/2
        y_min = shifty - height/2
        y_max = shifty + height/2
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
    elif rand == 3: # Random Scaling
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
        img = self.transform[3](img)
    else: # No geometric transformation
        img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))

    # Intensity-based Augmentation
    rand = random.randint(1, 4)
    if rand >= 1 and rand <= 3:
        img = np.array(img)
        img = self.transform[rand-1](image = img)['image']
        img = Image.fromarray(img)
else:
    # No Data Augmentation
    img = img.crop((int(x_min), int(y_min), int(x_max), int(y_max)))
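The resulting probabilities of the two step approach can be enumerated exactly. This sketch assumes that the fourth outcome of each `random.randint(1, 4)` draw means "no transformation in this step", which is how the code above reads; the outcome labels are illustrative.

```python
from fractions import Fraction
from itertools import product

geometric = ["flip", "shift", "scale", "none"]                     # first draw
intensity = ["intensity_0", "intensity_1", "intensity_2", "none"]  # second draw

augment_branch = Fraction(1, 2)  # chance of entering the augmentation branch
joint = {(g, i): augment_branch * Fraction(1, 4) * Fraction(1, 4)
         for g, i in product(geometric, intensity)}

# Unaltered image: the augmentation branch is skipped, or both steps draw "none".
unaltered = (1 - augment_branch) + joint[("none", "none")]
print(unaltered)  # 17/32
```

Under this reading, slightly more than half of the samples (17/32, about 53%) stay unaltered, and each specific geometric and intensity combination occurs with probability 1/32.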