Skip to content

Instantly share code, notes, and snippets.

What would you like to do?
Bilinear interpolation in PyTorch, and benchmarking vs. numpy

Here's a simple implementation of bilinear interpolation on tensors using PyTorch.

I wrote this up since I ended up learning a lot about options for interpolation in both the numpy and PyTorch ecosystems. More generally than just interpolation, too, it's also a nice case study in how PyTorch magically can put very numpy-like code on the GPU (and by the way, do autodiff for you too).

For interpolation in PyTorch, this open issue calls for more interpolation features. There is now a nn.functional.grid_sample() feature but at least at first this didn't look like what I needed (but we'll come back to this later).

In particular I wanted to take an image, W x H x C, and sample it many times at different random locations. Note also that this is different than upsampling which exhaustively samples and also doesn't give us flexibility with the precision of sampling.

The implementations: numpy and PyTorch

First let's look at a comparable implementation in numpy which is slightly modified from here.

import numpy as np

def bilinear_interpolate_numpy(im, x, y):
    x0 = np.floor(x).astype(int)
    x1 = x0 + 1
    y0 = np.floor(y).astype(int)
    y1 = y0 + 1

    x0 = np.clip(x0, 0, im.shape[1]-1)
    x1 = np.clip(x1, 0, im.shape[1]-1)
    y0 = np.clip(y0, 0, im.shape[0]-1)
    y1 = np.clip(y1, 0, im.shape[0]-1)

    Ia = im[ y0, x0 ]
    Ib = im[ y1, x0 ]
    Ic = im[ y0, x1 ]
    Id = im[ y1, x1 ]

    wa = (x1-x) * (y1-y)
    wb = (x1-x) * (y-y0)
    wc = (x-x0) * (y1-y)
    wd = (x-x0) * (y-y0)

    return (Ia.T*wa).T + (Ib.T*wb).T + (Ic.T*wc).T + (Id.T*wd).T

And now here I've converted this implementation to PyTorch:

import torch
dtype = torch.cuda.FloatTensor
dtype_long = torch.cuda.LongTensor

def bilinear_interpolate_torch(im, x, y):
    x0 = torch.floor(x).type(dtype_long)
    x1 = x0 + 1
    y0 = torch.floor(y).type(dtype_long)
    y1 = y0 + 1

    x0 = torch.clamp(x0, 0, im.shape[1]-1)
    x1 = torch.clamp(x1, 0, im.shape[1]-1)
    y0 = torch.clamp(y0, 0, im.shape[0]-1)
    y1 = torch.clamp(y1, 0, im.shape[0]-1)
    Ia = im[ y0, x0 ][0]
    Ib = im[ y1, x0 ][0]
    Ic = im[ y0, x1 ][0]
    Id = im[ y1, x1 ][0]
    wa = (x1.type(dtype)-x) * (y1.type(dtype)-y)
    wb = (x1.type(dtype)-x) * (y-y0.type(dtype))
    wc = (x-x0.type(dtype)) * (y1.type(dtype)-y)
    wd = (x-x0.type(dtype)) * (y-y0.type(dtype))

    return torch.t((torch.t(Ia)*wa)) + torch.t(torch.t(Ib)*wb) + torch.t(torch.t(Ic)*wc) + torch.t(torch.t(Id)*wd)

Testing for correctness

Bilinear interpolation is very simple but there are a few things that can be easily messed up.

I did a quick comparison for correctness with SciPy's interp2d.

  • Side note: there are actually a ton of interpolation options in SciPy but none I tested met my critera of (a) doing bilinear interpolation for high-dimensional spaces and (b) efficiently use gridded data. The ones I tested that were built for many dimensions were requiring me to specify sample points for all of those dimensions (and doing trilinear, or other) interpolation. I could get LinearNDInterpolator to do bilinear interpolation for high dimensional vectors but this does not meet criteria (b). There's probably a better option but, at any rate, I gave up and went back to my numpy and PyTorch options.
# Also use scipy to check for correctness
import scipy.interpolate
def bilinear_interpolate_scipy(image, x, y):
    x_indices = np.arange(image.shape[0])
    y_indices = np.arange(image.shape[1])
    interp_func = scipy.interpolate.interp2d(x_indices, y_indices, image, kind='linear')
    return interp_func(x,y)

# Make small sample data that's easy to interpret
image = np.ones((5,5))
image[3,3] = 4
image[3,4] = 3

sample_x, sample_y = np.asarray([3.2]), np.asarray([3.4])

print "numpy result:", bilinear_interpolate_numpy(image, sample_x, sample_y)
print "scipy result:", bilinear_interpolate_scipy(image, sample_x, sample_y)

image = torch.unsqueeze(torch.FloatTensor(image).type(dtype),2)
sample_x = torch.FloatTensor([sample_x]).type(dtype)
sample_y = torch.FloatTensor([sample_y]).type(dtype)

print "torch result:", bilinear_interpolate_torch(image, sample_x, sample_y)

The above gives:

numpy result: [2.68]
scipy result: [2.68]
torch result: 
[torch.cuda.FloatTensor of size 1x1 (GPU 0)]

High dimensional bilinear interpolation

For the correctness test comparing with scipy, we couldn't do W x H x C interpolation for anything but C=1. Now though, we can do bilinear interpolation in either numpy or torch for arbitrary C:

# Do high dimensional bilinear interpolation in numpy and PyTorch
W, H, C = 25, 25, 7
image = np.random.randn(W, H, C)

num_samples = 4
samples_x, samples_y = np.random.rand(num_samples)*(W-1), np.random.rand(num_samples)*(H-1)

print bilinear_interpolate_numpy(image, samples_x, samples_y)

image = torch.from_numpy(image).type(dtype)
samples_x = torch.FloatTensor([samples_x]).type(dtype)
samples_y = torch.FloatTensor([samples_y]).type(dtype)

print bilinear_interpolate_torch(image, samples_x, samples_y)

You'll find that the above numpy and torch versions give the same result.

Bechmarking: numpy (CPU) vs. pytorch (CPU) vs. pytorch (GPU)

Now we do some simple benchmarking:

# Timing comparison for WxHxC (where C is large for a high dimensional descriptor)
W, H, C = 640, 480, 32
image = np.random.randn(W, H, C)

num_samples = 10000
samples_x, samples_y = np.random.rand(num_samples)*(W-1), np.random.rand(num_samples)*(H-1)

import time

start = time.time()
bilinear_interpolate_numpy(image, samples_x, samples_y)
print "numpy took       ", time.time() - start

dtype = torch.FloatTensor
dtype_long = torch.LongTensor
image = torch.FloatTensor(image).type(dtype)
samples_x = torch.FloatTensor([samples_x]).type(dtype)
samples_y = torch.FloatTensor([samples_y]).type(dtype)

start = time.time()
bilinear_interpolate_torch(image, samples_x, samples_y)
print "torch on CPU took", time.time() - start 

dtype = torch.cuda.FloatTensor
dtype_long = torch.cuda.LongTensor
image = image.type(dtype)
samples_x = samples_x.type(dtype)
samples_y = samples_y.type(dtype)

start = time.time()
bilinear_interpolate_torch(image, samples_x, samples_y)
print "torch on GPU took", time.time() - start

On my machine (CPU: 10-core i7-6950X, GPU: GTX 1080) I get the following times (in seconds):

numpy took        0.00756597518921
torch on CPU took 0.12672996521
torch on GPU took 0.000642061233521

Interestingly we have torch on the GPU beating numpy (CPU-only) by about 10x. I'm not sure why torch on the CPU is that slow for this test case. Note that the ratios between these change quite drastically for different W, H, C, num_samples.

Using the available nn.functional.grid_sample()

I ended up figuring out how to use nn.functional.grid_sample() although it was a little odd of a fit for my needs. (Data needs to be in N x C x W x H tensor input, and samples need to be as normalized between [-1,1], and AFAIK the WxH ordering of the samples do not have any meaning other -- they are completely separate samples.)

It was good practice in using permute, multiple unsqueezes, cat.

import torch.nn.functional
dtype = torch.cuda.FloatTensor
dtype_long = torch.cuda.LongTensor

def bilinear_interpolate_torch_gridsample(image, samples_x, samples_y):
                                                # input image is: W x H x C
    image = image.permute(2,0,1)                # change to:      C x W x H
    image = image.unsqueeze(0)                  # change to:  1 x C x W x H
    samples_x = samples_x.unsqueeze(2)
    samples_x = samples_x.unsqueeze(3)
    samples_y = samples_y.unsqueeze(2)
    samples_y = samples_y.unsqueeze(3)
    samples =[samples_x, samples_y],3)
    samples[:,:,:,0] = (samples[:,:,:,0]/(W-1)) # normalize to between  0 and 1
    samples[:,:,:,1] = (samples[:,:,:,1]/(H-1)) # normalize to between  0 and 1
    samples = samples*2-1                       # normalize to between -1 and 1
    return torch.nn.functional.grid_sample(image, samples)

# Correctness test
W, H, C = 5, 5, 1
test_image = torch.ones(W,H,C).type(dtype)
test_image[3,3,:] = 4
test_image[3,4,:] = 3

test_samples_x = torch.FloatTensor([[3.2]]).type(dtype)
test_samples_y = torch.FloatTensor([[3.4]]).type(dtype)

print bilinear_interpolate_torch_gridsample(test_image, test_samples_x, test_samples_y)

# Benchmark
start = time.time()
bilinear_interpolate_torch_gridsample(image, samples_x, samples_y)
print "torch gridsample took ", time.time() - start

My wrapping of grid_sample produces the same bilinear interpolation results and at speeds comparable to our bilinear_interpolate_torch() function:

Variable containing:
(0 ,0 ,.,.) = 
[torch.cuda.FloatTensor of size 1x1x1x1 (GPU 0)]

torch gridsample took  0.000624895095825

Another note about the nn.functional.grid_sample() interface is that it forces the sampled interpolations into a Variable() wrapper even though this seems best left to the programmer to decide.


  • It's surprisingly easy to convert powerful vectorized numpy code into more-powerful vectorized PyTorch code
  • PyTorch is very fast on the GPU
  • Some of the higher-level feature (like nn.function.grid_sample) are nice but so too is writing your own tensor manipulations (and can be comparably fast)

This comment has been minimized.

Copy link

@Erotemic Erotemic commented Dec 20, 2018

Nice write-up; really great work!


This comment has been minimized.

Copy link

@rbrigden rbrigden commented Jan 29, 2019

This is some great work.


This comment has been minimized.

Copy link

@MyChocer MyChocer commented Jun 18, 2019



This comment has been minimized.

Copy link

@stomachacheGE stomachacheGE commented Jul 2, 2019

Thanks for sharing. But I think there is a small issue when you wrote:

W, H, C = 640, 480, 32
image = np.random.randn(W, H, C)

num_samples = 10000
samples_x, samples_y = np.random.rand(num_samples)*(W-1), np.random.rand(num_samples)*(H-1)

I think your implementation of bilinear_interpolate_torch assumes that img obeys numpy's coordinate convention, which is (H,W,C). In fact, if your img input has dimension of (W,H,C), you implementation will clamp the sample_x to [0,H-1]. Although no error raised, but it is not correct.


This comment has been minimized.

Copy link

@Eurus-Holmes Eurus-Holmes commented Aug 6, 2019

How do you implement padding mode?


This comment has been minimized.

Copy link

@XiwuChen XiwuChen commented Sep 12, 2019

Awesome! Thanks for your excellent work.


This comment has been minimized.

Copy link

@KruskalLin KruskalLin commented Nov 19, 2019

Thanks for sharing!


This comment has been minimized.

Copy link

@v-prgmr v-prgmr commented Dec 9, 2019


Ia = im[ y0, x0 ][0]
Ib = im[ y1, x0 ][0]
Ic = im[ y0, x1 ][0]
Id = im[ y1, x1 ][0]

Aren't the above lines in the torch implementation, only taking the intensity values at the (0,0) , (0,1), (1,0) , (1,1) for the first coordinate of samples_x and samples_y ?


This comment has been minimized.

Copy link

@sbarratt sbarratt commented Jul 16, 2020

Some in this thread might be interested in the torch_interpolations package:


This comment has been minimized.

Copy link

@chenerg chenerg commented Nov 20, 2020

dude, this is amazing


This comment has been minimized.

Copy link

@thanhmvu thanhmvu commented Dec 3, 2020

Kudos to this! Very nice write-up indeed.

One minor detail worth pointing out is inbilinear_interpolate_numpy, points at or beyond the boundaries will map to interp(im, w-1, h-1) = 0.0, even though I believe the desired result is interp(im, w-1, h-1) = im[w-1, h-1]. For example, I tried sample_x, sample_y = np.asarray([4]), np.asarray([3]) and got numpy result: [0.] scipy result: [3.]


This comment has been minimized.

Copy link

@OrkhanHI OrkhanHI commented Jan 4, 2021

Thanks for sharing. I have added python code which includes batch size with zero padding and can pass align_corners argument.

This comment has been minimized.

Copy link

@thomasaarholt thomasaarholt commented Jan 12, 2021

Thanks for sharing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment