@ernestum
Forked from fmder/elastic_transform.py
Last active November 2, 2023 10:26
Elastic transformation of an image in Python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates


def elastic_transform(image, alpha, sigma, random_state=None):
    """Elastic deformation of images as described in [Simard2003]_.

    .. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
       Convolutional Neural Networks applied to Visual Document Analysis", in
       Proc. of the International Conference on Document Analysis and
       Recognition, 2003.
    """
    if random_state is None:
        random_state = np.random.RandomState(None)

    shape = image.shape
    # Uniform noise in [-1, 1), smoothed with a Gaussian of width sigma and scaled by alpha.
    dx = gaussian_filter((random_state.rand(*shape) * 2 - 1), sigma, mode="constant", cval=0) * alpha
    dy = gaussian_filter((random_state.rand(*shape) * 2 - 1), sigma, mode="constant", cval=0) * alpha
    dz = np.zeros_like(dx)  # never used below

    # With the default 'xy' indexing the grids have shape (shape[1], shape[0], shape[2]),
    # so they only match dx and dy when the image is square.
    x, y, z = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]))
    indices = np.reshape(y+dy, (-1, 1)), np.reshape(x+dx, (-1, 1)), np.reshape(z, (-1, 1))

    distorted_image = map_coordinates(image, indices, order=1, mode='reflect')
    return distorted_image.reshape(image.shape)
@ernestum
Author

This version also supports color images (3 RGB channels).

@jdelange

jdelange commented Sep 2, 2016

Nice! What are good values for alpha and sigma? I assume the alpha from the original paper (a=8) cannot be directly translated to this implementation?

@mamrehn

mamrehn commented Nov 24, 2016

Thanks!
A note: dz, defined as dz = np.zeros_like(dx), is never actually used when the indices tuple is built.
For RGB images it makes sense not to mix channels. In that case, you can just delete the line with dz = np.zeros_like(dx).

@lgy1425

lgy1425 commented Apr 17, 2017

Thank you. But does the input image have to be square?

@iliya-hajjar

How can I save this distorted image?
I tried with PIL and scipy, but the output is entirely black.
I can't even show the image with matplotlib (TypeError: Invalid dimensions for image data).
The error is clear, but how can I create the appropriate shape for images?
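
The thread does not say what caused this, so the following is only a sketch of two plausible causes: a trailing single-channel axis, which can trigger the "Invalid dimensions" error in matplotlib, and float data being cast to 8-bit integers without rescaling. The helper name `to_uint8_image` is made up for this example, and the min-max rescaling changes the contrast of the image.

```python
import numpy as np


def to_uint8_image(array):
    """Hypothetical helper: make a transformed array safe to save or plot."""
    array = np.asarray(array)
    # Drop a trailing channel axis of length 1: (H, W, 1) -> (H, W).
    if array.ndim == 3 and array.shape[2] == 1:
        array = array[:, :, 0]
    # Stretch non-uint8 data to the full 0..255 range before casting.
    if array.dtype != np.uint8:
        array = array.astype(np.float64)
        lo, hi = array.min(), array.max()
        scale = 255.0 / (hi - lo) if hi > lo else 0.0
        array = ((array - lo) * scale).round().astype(np.uint8)
    return array


# Stand-in for the output of elastic_transform on a float image.
distorted = np.random.RandomState(0).rand(32, 32, 1)
ready = to_uint8_image(distorted)
print(ready.shape, ready.dtype)
# With Pillow installed, the result could then be saved with
# PIL.Image.fromarray(ready).save("distorted.png")
```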

@l770943527

l770943527 commented Feb 26, 2018

Hi, I have loaded an RGB image whose shape is (400, 248, 3), but I got an error:

ValueError: operands could not be broadcast together with shapes (248,400,3) (400,248,3)
in this line:
indices = np.reshape(y + dy, (-1, 1)), np.reshape(x + dx, (-1, 1)), np.reshape(z, (-1, 1))

Can anyone help? Thanks!

@iver56

iver56 commented Oct 15, 2018

> Nice! What are good values for alpha and sigma?

Input: [image: checkers]
Example output (with alpha=991, sigma=8): [image: transformed_checkers_991_8]

@bigfred76

> Hi, I have loaded an RGB image whose shape is (400, 248, 3), but I got an error:
>
> ValueError: operands could not be broadcast together with shapes (248,400,3) (400,248,3)
> in this line:
> indices = np.reshape(y + dy, (-1, 1)), np.reshape(x + dx, (-1, 1)), np.reshape(z, (-1, 1))
>
> Can anyone help? Thanks!

You need to swap shape[0] and shape[1] in the meshgrid call that builds x, y, z:
x, y, z = np.meshgrid(np.arange(shape[1]), np.arange(shape[0]), np.arange(shape[2]))
instead of
x, y, z = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]))

@bigfred76

bigfred76 commented Jan 17, 2019

With the correction I gave in my previous post, the algorithm works perfectly! Thanks.
Results on the OCR digit recognition dataset I'm currently building:
[image]

As for the alpha and sigma values, I built the dataset with three (alpha, sigma) pairs:
ELASTIC_ALPHA_SIGMA = ((1201, 10), (1501, 12), (991, 8))
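
To get a feel for what such pairs mean in pixels, a rough sketch like the one below measures the largest displacement that one random field produces for each pair. The image shape, the seed and the helper name `max_displacement` are assumptions made for this example. Because the gist passes a scalar sigma, the noise is smoothed along the channel axis as well, so the numbers depend on the number of channels.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

ELASTIC_ALPHA_SIGMA = ((1201, 10), (1501, 12), (991, 8))


def max_displacement(shape, alpha, sigma, seed=0):
    """Largest absolute displacement (in pixels) of one random field."""
    rng = np.random.RandomState(seed)
    noise = rng.rand(*shape) * 2 - 1  # uniform noise in [-1, 1)
    field = gaussian_filter(noise, sigma, mode="constant", cval=0) * alpha
    return float(np.abs(field).max())


shape = (128, 128, 3)  # assumed image shape, not taken from the thread
results = {pair: max_displacement(shape, *pair) for pair in ELASTIC_ALPHA_SIGMA}
for (alpha, sigma), pixels in results.items():
    print(f"alpha={alpha}, sigma={sigma}: max |displacement| = {pixels:.2f} px")
```

For a fixed sigma and seed the displacement grows linearly with alpha, while a larger sigma makes the field smoother and, for the same alpha, smaller in magnitude.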

@Rsalganik1123

Hello, I get an error: tuple index out of range
on the line: x, y, z = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]))
Does anyone have any advice?

@noorulhasan06

Try printing the shape; you will probably find that it is something like [x, y] rather than [x, y, z].
This can happen if you are using a grayscale image. Try reshaping the image to [x, y, 1].
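
A small sketch of that reshape, assuming a 2-D grayscale NumPy array (the variable names are illustrative):

```python
import numpy as np

gray = np.zeros((400, 400), dtype=np.uint8)  # stand-in for a grayscale image
print(gray.shape)     # (400, 400): there is no shape[2], hence the IndexError

gray_3d = gray[:, :, np.newaxis]             # add a trailing channel axis
print(gray_3d.shape)  # (400, 400, 1)

# After the transform, the channel axis can be dropped again, e.g.
# distorted = elastic_transform(gray_3d, alpha, sigma)[:, :, 0]
restored = gray_3d[:, :, 0]
print(restored.shape)  # (400, 400)
```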

@myriam23

Hi!
How do you apply the same transformation to the mask? I have a set of satellite images, each with its corresponding road segmentation (black and white, (400, 400)).

@nianxiongdi

Where can I find the corresponding mathematical formula?
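
As a rough answer, the formula below is read off the code in the gist rather than quoted from the paper, and the symbol names are chosen here for illustration. It describes the intended behaviour: two noise fields are smoothed, scaled by alpha, and used as per-pixel offsets into the input image.

```latex
% U_1, U_2: independent uniform noise fields, one value per array element
U_1, U_2 \sim \mathcal{U}(-1, 1)

% G_\sigma: Gaussian kernel with standard deviation \sigma; * denotes convolution
\Delta_1 = \alpha \, (G_\sigma * U_1), \qquad
\Delta_2 = \alpha \, (G_\sigma * U_2)

% I: input image, I': distorted image, (i, j): pixel position, c: channel
I'(i, j, c) = I\bigl(i + \Delta_1(i, j, c),\; j + \Delta_2(i, j, c),\; c\bigr)
```

Values at non-integer positions come from linear interpolation (order=1), and positions outside the image are reflected back in (mode='reflect').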

@yliu7366

> Hi!
> How do you apply the same transformation to the mask? I have a set of satellite images, each with its corresponding road segmentation (black and white, (400, 400)).

Same question here. The implementation is randomized on every run, which makes it impossible to apply exactly the same transform to both the original image and the mask image.

@frederikfaye

> > Hi!
> > How do you apply the same transformation to the mask? I have a set of satellite images, each with its corresponding road segmentation (black and white, (400, 400)).
>
> Same question here. The implementation is randomized on every run, which makes it impossible to apply exactly the same transform to both the original image and the mask image.

Just fix the random_state for both calls. :)
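
A minimal sketch of that suggestion, under two assumptions: the function below is a compact variant of the gist that uses indexing='ij', and the image and the mask have exactly the same shape. The second assumption matters because the noise is drawn with the array's full shape, so the same seed yields a different field for, say, (400, 400, 3) and (400, 400, 1).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates


def elastic_transform(image, alpha, sigma, random_state=None):
    """Compact variant of the gist's function for (H, W, C) arrays."""
    if random_state is None:
        random_state = np.random.RandomState(None)
    shape = image.shape
    dx = gaussian_filter(random_state.rand(*shape) * 2 - 1, sigma, mode="constant", cval=0) * alpha
    dy = gaussian_filter(random_state.rand(*shape) * 2 - 1, sigma, mode="constant", cval=0) * alpha
    # 'ij' indexing: x follows axis 0, y follows axis 1, z the channels.
    x, y, z = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]), indexing="ij")
    indices = np.reshape(x + dx, (-1, 1)), np.reshape(y + dy, (-1, 1)), np.reshape(z, (-1, 1))
    return map_coordinates(image, indices, order=1, mode="reflect").reshape(shape)


seed = 42  # any fixed value works, as long as both calls use the same one
image = np.random.RandomState(0).rand(64, 64, 1)  # stand-in grayscale image
mask = (image > 0.5).astype(np.float64)           # stand-in mask, same shape

# A fresh RandomState with the same seed reproduces the same displacement field.
warped_image = elastic_transform(image, 34, 4, np.random.RandomState(seed))
warped_mask = elastic_transform(mask, 34, 4, np.random.RandomState(seed))
```

Because the mask is interpolated linearly here, its edges become soft; thresholding the result (for example `warped_mask > 0.5`) would make it binary again.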

@anamikajnu

Can I apply it for a multi-class dataset for the segmentation task?
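
One way this could work is sketched below. It is not the gist's function but a variant that draws a single 2-D displacement field and reuses it for every image channel and for the label map; the names `random_displacement` and `warp` are made up for this example. The labels are resampled with nearest-neighbour interpolation (order=0) so that no in-between class values appear.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates


def random_displacement(shape_2d, alpha, sigma, random_state):
    """Sampling coordinates, shape (2, H, W), for one random 2-D elastic field."""
    d_row = gaussian_filter(random_state.rand(*shape_2d) * 2 - 1, sigma, mode="constant", cval=0) * alpha
    d_col = gaussian_filter(random_state.rand(*shape_2d) * 2 - 1, sigma, mode="constant", cval=0) * alpha
    rows, cols = np.meshgrid(np.arange(shape_2d[0]), np.arange(shape_2d[1]), indexing="ij")
    return np.stack([rows + d_row, cols + d_col])


def warp(array_2d, coords, order):
    """Resample a 2-D array at the given coordinates."""
    return map_coordinates(array_2d, coords, order=order, mode="reflect")


rng = np.random.RandomState(0)
image = rng.rand(48, 32, 3)                # stand-in RGB image
labels = rng.randint(0, 5, size=(48, 32))  # stand-in label map with 5 classes

coords = random_displacement(labels.shape, alpha=40, sigma=5, random_state=rng)
# Linear interpolation for the image, one channel at a time.
warped_image = np.stack(
    [warp(image[:, :, c], coords, order=1) for c in range(image.shape[2])], axis=2
)
# Nearest-neighbour interpolation keeps the label values intact.
warped_labels = warp(labels, coords, order=0)
```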

@roryw10

roryw10 commented Mar 1, 2020

Can you elaborate a bit more, @jepperaskdk? I would love to make use of this, but I'm unsure what you mean by stacking the two images. Do you pull in the original from one folder (X_img) and the corresponding mask (Y-Img) and separate them after processing? That is, literally run them all through and then separate the output images manually or with a script?

@jepperaskdk

> Can you elaborate a bit more, @jepperaskdk? I would love to make use of this, but I'm unsure what you mean by stacking the two images. Do you pull in the original from one folder (X_img) and the corresponding mask (Y-Img) and separate them after processing? That is, literally run them all through and then separate the output images manually or with a script?

On second thought, I'm not sure if it works.

@ellisdg

ellisdg commented Jun 29, 2020

> > Hi, I have loaded an RGB image whose shape is (400, 248, 3), but I got an error:
> > ValueError: operands could not be broadcast together with shapes (248,400,3) (400,248,3)
> > in this line:
> > indices = np.reshape(y + dy, (-1, 1)), np.reshape(x + dx, (-1, 1)), np.reshape(z, (-1, 1))
> > Can anyone help? Thanks!
>
> You need to swap shape[0] and shape[1] in the meshgrid call that builds x, y, z:
> x, y, z = np.meshgrid(np.arange(shape[1]), np.arange(shape[0]), np.arange(shape[2]))
> instead of
> x, y, z = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]))

Inverting the shapes flipped the image for me. Setting the indexing of the meshgrid to 'ij' instead fixed this issue:
x, y, z = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]), indexing='ij')
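
A minimal sketch of how a full function with indexing='ij' could look for non-square images. With 'ij' indexing, x runs along axis 0 and y along axis 1, so this sketch passes the coordinates to map_coordinates in the order (x + dx, y + dy, z); whether the comment above also reordered them is not stated. The function name `elastic_transform_ij` is made up here, and the unused dz is left out.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates


def elastic_transform_ij(image, alpha, sigma, random_state=None):
    """Elastic deformation of an (H, W, C) array; hypothetical 'ij' variant."""
    if random_state is None:
        random_state = np.random.RandomState(None)
    shape = image.shape
    # Smoothed random displacement fields, scaled by alpha.
    dx = gaussian_filter(random_state.rand(*shape) * 2 - 1, sigma, mode="constant", cval=0) * alpha
    dy = gaussian_filter(random_state.rand(*shape) * 2 - 1, sigma, mode="constant", cval=0) * alpha
    # With indexing='ij' all three grids have exactly image.shape.
    x, y, z = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]), indexing="ij")
    indices = np.reshape(x + dx, (-1, 1)), np.reshape(y + dy, (-1, 1)), np.reshape(z, (-1, 1))
    distorted = map_coordinates(image, indices, order=1, mode="reflect")
    return distorted.reshape(shape)


image = np.random.RandomState(0).rand(40, 24, 3)  # non-square stand-in image
out = elastic_transform_ij(image, alpha=30, sigma=4, random_state=np.random.RandomState(1))
# With alpha=0 there is no displacement, so the image should come back unchanged.
identity = elastic_transform_ij(image, alpha=0, sigma=4)
print(out.shape, np.allclose(identity, image))
```

The alpha=0 call is a cheap sanity check: if the axes were swapped or the image were flipped, the output would no longer match the input.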

@mvoelk

mvoelk commented Jun 30, 2020

Since the interpolation is also done over the channel dimension, I got these interpolation artifacts:
[image: interpolation_artifacts]
So I decided to do the interpolation channel-wise.
https://gist.github.com/mvoelk/0880f5de7c101c093165e1e46ce3f6e5

@phamthephuc

[image]
Which augmentation did you apply to the dataset, @bigfred76?
