@TianyiFranklinWang
Created May 17, 2023 06:43

Reshape + Transpose Trick by iafoss Explained

Hi all, I would first like to thank Kaggle and the organizers for this lovely competition, although I was really shocked to see just 13 files across test and train when I first entered it. After two days of studying the competition, it is now fairly clear what is going on.

Thanks to @iafoss and his trick, I learned how to tile the big tiff files into something trainable. It's a fairly simple and elegant trick, but at first sight it can look overwhelming and complicated. Since I know how difficult it can be to decipher, I thought I would share an explanation to make it easier for others. I will walk you through the code step by step.

Step 1 : Reading the big tiff file

img = tiff.imread(os.path.join(DATA,index+'.tiff'))

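Step 2 : Check whether the image was loaded with five dimensions instead of three (extra singleton axes, channels first). If so, squeeze the image and rearrange the axes to get (H,W,C). Then decode the mask at the same height and width.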

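# Some files load with extra singleton axes and channels first
if len(img.shape) == 5:
  img = np.transpose(img.squeeze(), (1,2,0))  # (C,H,W) -> (H,W,C)
# Decode the mask for every image, not only the 5-D ones
mask = enc2mask(encs,(img.shape[1],img.shape[0]))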

Step 3 : Add padding on both sides so that the image divides evenly into tiles. It's simple maths. If the height of the image is H (shape[0]) and we want the block size (reduce*sz) to divide it exactly, the part left over is the remainder shape[0] % (reduce*sz). Subtracting that remainder from reduce*sz gives the pad value for the height, and the final % (reduce*sz) turns the result into 0 when H is already a multiple. The pad value for the width is found the same way.

shape = img.shape 
pad0 = (reduce*sz - shape[0]%(reduce*sz))%(reduce*sz) 
pad1 = (reduce*sz - shape[1]%(reduce*sz))%(reduce*sz)
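
To make the arithmetic concrete, here is a small sketch in plain Python. The helper name pad_amount and the sizes (sz=256, reduce=4, heights of 2500 and 2048) are made up for illustration and are not taken from the competition data.

```python
def pad_amount(length, sz, reduce):
    """Padding needed so that `length` becomes a multiple of reduce*sz."""
    block = reduce * sz
    return (block - length % block) % block

# Hypothetical sizes, chosen only for illustration.
sz, reduce = 256, 4                     # block size is 1024
pad0 = pad_amount(2500, sz, reduce)     # 2500 % 1024 = 452, so pad by 572
pad1 = pad_amount(2048, sz, reduce)     # already a multiple, so pad by 0
print(pad0, pad1)                       # 572 0
```

The outer modulo is what keeps an already divisible side from being padded by a whole extra block.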

After that we use np.pad to add zeros on both sides of the image and the mask, splitting each pad value as evenly as possible between the two sides

img = np.pad(img,[[pad0//2,pad0-pad0//2],[pad1//2,pad1-pad1//2],[0,0]], constant_values=0) 
mask = np.pad(mask,[[pad0//2,pad0-pad0//2],[pad1//2,pad1-pad1//2]], constant_values=0)
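
As a quick check of the padding call, the sketch below runs the same np.pad pattern on a toy array. The 5x7 image and the block size of 4 (standing in for reduce*sz) are assumptions picked so the numbers are easy to follow.

```python
import numpy as np

img = np.ones((5, 7, 3), dtype=np.uint8)        # toy (H, W, C) image
block = 4                                       # stands in for reduce*sz
pad0 = (block - img.shape[0] % block) % block   # 3 rows to add
pad1 = (block - img.shape[1] % block) % block   # 1 column to add

# Split each pad between the two sides; the channel axis is left alone.
padded = np.pad(img,
                [[pad0 // 2, pad0 - pad0 // 2],
                 [pad1 // 2, pad1 - pad1 // 2],
                 [0, 0]],
                constant_values=0)
print(padded.shape)   # (8, 8, 3)
```

When the pad value is odd, the extra row or column goes to the bottom or right side, because pad - pad // 2 rounds up.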

Step 4 : Resize the image to shrink each side by a factor of reduce (4 here)

img = cv2.resize(img,(img.shape[1]//reduce,img.shape[0]//reduce), interpolation = cv2.INTER_AREA)

Step 5 : Now comes the tricky part which will need some explanation

The trick is based on how numpy arrays are stored in memory. A numpy array of any dimension is normally stored internally as a 1D array. When we define an array of shape, say, (512,512,3), a contiguous block of memory is allotted to it and the values are laid down one after another as a single sequence, row by row: [ row 0: pixel 0 (3 channels), pixel 1, … pixel 511, row 1: pixel 0, … ]. What we see on our end is just a different view of that same 1D array. If we reshape the 3D array, we only split the 1D array at different places and get a different view. So if we reshape the original image (H,W,C) into (number of tiles, sz, sz, C), our job is done, right? That is the trick; the only thing we need to take care of is that we split at the right places.

Here is how iafoss explains the trick : "To better understand what is going on, you should recall how the tensor is placed in the memory: everything is represented by a 1d array (without going into other complications), and the dimensionality is just an additional description for interpretation of this 1d array. So, for example let's consider a split of 1d array into tiles. The original array [0,1,2,3,4,5,6,7,8,9] after reshaping to (2,5) would look like [0,1,2,3,4|5,6,7,8,9]. Nothing happened with the way how the data is stored, you just put a separator showing that right now you interpret the data as a (2,5) shape array. Similar things are done for the images: u add separators to divide the image into tiles. The only problem right now is that the individual tiles are not contiguous in the memory. Therefore, you need to permute the dimensions, merge the dimensions for x and y indexes of tiles, and make everything contiguous. The last two steps are performed with reshape, and finally u have a tensor containing a list of tiles."
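
The 1D example from the quote can be tried directly. This is only a small sketch to show that reshaping a contiguous array gives a new view of the same memory, not a copy.

```python
import numpy as np

a = np.arange(10)        # [0 1 2 3 4 5 6 7 8 9]
b = a.reshape(2, 5)      # same data, read as 2 rows of 5
print(b)

# Both names point at the same buffer.
print(np.shares_memory(a, b))   # True

# Writing through the 2D view changes the 1D array too.
b[1, 0] = 99
print(a[5])                     # 99
```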

How is the splitting done? The underlying 1D array is laid out in (H,W,C) order. We split the height axis into (H/sz, sz) and the width axis into (W/sz, sz), where sz is the tile size, and thus the array is reshaped to

(H/sz,sz,W/sz,sz,3)
img = img.reshape(img.shape[0]//sz,sz,img.shape[1]//sz,sz,3)

Now we have created the tiles, but the pixels of each tile are not contiguous in memory (the axes are not in the right order), so we transpose the array to bring the two tile-index axes together and get

(H/sz,W/sz,sz,sz,3)
img = img.transpose(0,2,1,3,4)
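
To see why the transpose is needed, here is a sketch on a toy 4x4 single-channel image with sz=2 (both assumptions for illustration). It compares a direct reshape into tiles with the split-then-transpose version.

```python
import numpy as np

sz = 2
img = np.arange(16).reshape(4, 4)   # toy image, values equal their position
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]]

# Naive: reshape straight into tiles. This just cuts the 1D buffer every
# 4 values, so the first "tile" is 0,1,2,3, which is not a 2x2 image patch.
naive = img.reshape(-1, sz, sz)
print(naive[0])            # [[0 1] [2 3]]

# Split both axes, then move the two tile-index axes next to each other.
split = img.reshape(4 // sz, sz, 4 // sz, sz)   # (tile_row, y, tile_col, x)
grid = split.transpose(0, 2, 1, 3)              # (tile_row, tile_col, y, x)
print(grid[0, 0])          # [[0 1] [4 5]], the real top-left patch
print(grid[0, 1])          # [[2 3] [6 7]], the patch to its right

# The transposed view is no longer contiguous in memory.
print(grid.flags['C_CONTIGUOUS'])   # False
```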

Now all we need to do is to merge axis 0 and axis 1 to get to our tiles

(number of tiles , sz,sz,3)
Number of tiles = (H/sz) * (W/sz)
img = img.reshape(-1,sz,sz,3)
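
Putting the three lines together, a minimal sketch of the whole trick could look like the function below. The name tile_image and the random 6x8 test image are my own assumptions, and the check against plain slicing is only there to confirm the tile order.

```python
import numpy as np

def tile_image(img, sz):
    """Split an (H, W, C) array into (num_tiles, sz, sz, C) tiles.

    Assumes H and W are already multiples of sz (e.g. after padding).
    """
    h, w, c = img.shape
    if h % sz or w % sz:
        raise ValueError("pad the image so H and W are multiples of sz")
    tiles = img.reshape(h // sz, sz, w // sz, sz, c)   # split both axes
    tiles = tiles.transpose(0, 2, 1, 3, 4)             # group tile indices
    return tiles.reshape(-1, sz, sz, c)                # merge them into one

# Hypothetical toy image, used only to check the result.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(6, 8, 3), dtype=np.uint8)
tiles = tile_image(img, sz=2)
print(tiles.shape)   # (12, 2, 2, 3)

# Tile k sits at tile-row k // (W/sz) and tile-column k % (W/sz).
k = 5
r, c = divmod(k, 8 // 2)
print(np.array_equal(tiles[k], img[r*2:(r+1)*2, c*2:(c+1)*2]))   # True
```

Tiles come out in row-major order, so the last reshape has to copy the data because the transposed view is not contiguous.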

I hope this explanation helps people understand the trick easily and lets them apply the code to any custom image data they want in the future.

Thanks for reading
