Instantly share code, notes, and snippets.

# sbarratt/torch_jacobian.py

Created May 9, 2019 19:40
Star You must be signed in to star a gist
Get the jacobian of a vector-valued function that takes batch inputs, in pytorch.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 def get_jacobian(net, x, noutputs): x = x.squeeze() n = x.size() x = x.repeat(noutputs, 1) x.requires_grad_(True) y = net(x) y.backward(torch.eye(noutputs)) return x.grad.data

### a-z-e-r-i-l-a commented May 31, 2019

Great, I wonder if there might be even a faster way. Please let me know if you had any solution about Hessian as well.

### RuohanZhan commented Jun 9, 2019

could you please specify the definition of noutputs? is that the dimension of output vector? Thanks!

### rjeli commented Jun 15, 2019

Still stay awake at night trying to find a faster way to do this... maybe something crazy with hooks so the forward pass only has to be run once, but the backward pass is copied for every output

### gelijergensen commented Jun 27, 2019

@nimaous Here is some code which will compute the jacobian in a differentiable way, which would allow you to get the hessian. I will warn that if your `outputs` tensor is very large, then this may take quite a while

```def jacobian(outputs, inputs, create_graph=False):
"""Computes the jacobian of outputs with respect to inputs

:param outputs: tensor for the output of some function
:param inputs: tensor for the input of some function (probably a vector)
:param create_graph: set True for the resulting jacobian to be differentible
:returns: a tensor of size (outputs.size() + inputs.size()) containing the
jacobian of outputs with respect to inputs
"""
jac = outputs.new_zeros(outputs.size() + inputs.size()
).view((-1,) + inputs.size())
for i, out in enumerate(outputs.view(-1)):
create_graph=create_graph, allow_unused=True)
if col_i is None:
# this element of output doesn't depend on the inputs, so leave gradient 0
continue
else:
jac[i] = col_i

if create_graph:

return jac.view(outputs.size() + inputs.size())```

### yiyuezhuo commented Aug 2, 2019

I try to extend the original one to implement multiple dimention support:

``````def compute_jacobian(f, x, output_dims):
'''
Normal:
f: input_dims -> output_dims
Jacobian mode:
f: output_dims x input_dims -> output_dims x output_dims
'''
repeat_dims = tuple(output_dims) + (1,) * len(x.shape)
jac_x = x.detach().repeat(*repeat_dims)
jac_y = f(jac_x)

ml = torch.meshgrid([torch.arange(dim) for dim in output_dims])
index = [m.flatten() for m in ml]

``````

Usage:

```w = torch.randn(4,3)
f = lambda x: x @ w
x = torch.randn(2,3,4)
jac = compute_jacobian(f, x, [2,3,3])

'''
>>> w
tensor([[-0.2295,  1.4252,  2.2714],
[ 0.5877, -2.4398,  0.0136],
[ 0.3254, -0.3380,  0.1785],
[ 0.5455,  0.9089, -0.3134]])

>>> jac[1,1,1]
tensor([[[ 0.0000,  0.0000,  0.0000,  0.0000],
[ 0.0000,  0.0000,  0.0000,  0.0000],
[ 0.0000,  0.0000,  0.0000,  0.0000]],

[[ 0.0000,  0.0000,  0.0000,  0.0000],
[ 1.4252, -2.4398, -0.3380,  0.9089],
[ 0.0000,  0.0000,  0.0000,  0.0000]]])
'''```

### nimaous commented Aug 2, 2019

Here is my way to calculate Hessian for a batch.

`Continuing the discussion from Pytorch most efficient Jacobian / Hessian calculation:

This is the fastest way I can tell
suppose you have a loss function of x

``````        loss = f(x)
first_drv = torch.zeros(batch_size, x_dim)
hessian = torch.zeros(batch_size, x_dim, x_dim)
for n in range(batch_size):
create_graph=True, retain_graph=True)[n]
for i in range(x_dim):
create_graph=True, retain_graph=True)[n]
``````

### dohmatob commented Oct 19, 2019

Minibatch version of original `get_jacobian` code:

``````def get_jacobian(net, x, num_outputs, batch_size=None, verbose=0):
"""
Compute jacobian matrix of network outputs w.r.t input x.

Parameters
----------
net: A pytorch callable (e.g a network instance)

num_outputs: int
Number of outputs produced by net (per input instance)

batch_size: int, optional
If None, then do run in full-back mode. Else run in minibatch mode
with mini-batches of size `batch_size`
"""
from sklearn.utils import gen_batches
if batch_size is None:
batch_size = num_outputs
num_batches = num_outputs / float(batch_size) + (num_outputs % batch_size != 0)
x = x.squeeze(0)
shape = list(x.shape)
ones =  * len(shape)
jacs = torch.zeros([num_outputs] + shape)

for b, batch in enumerate(gen_batches(num_outputs, batch_size)):
this_batch_size = len(jacs[batch])
output = net(x_)
assert (len(output.shape) == 2 and len(output) == this_batch_size)
output.backward(torch.eye(num_outputs)[batch, :])
if verbose and num_batches > 1:
print("Batch %02i / %02i" % (b + 1, num_batches))

return jacs.data

# Worked example
num_features = 2
num_outputs = 10
x = torch.ones(num_features)
W = torch.randn(num_features, num_outputs)
y = lambda z: z @ W + 2019
batch_size = 3
jacs = get_jacobian(y, x, num_outputs, batch_size, verbose=1)
print("dy/dx:\n%s" % jacs)
print("W.T:\n%s" % W.T)
``````

### tsauri commented Nov 4, 2019 • edited

I need to get jacobian for weight (nn.Parameter of a nn.Module).
Is there way for weight.grad to 'expand' according to number of dim of y

### RylanSchaeffer commented Dec 2, 2019

Can you clarify what the appropriate input shapes are? I'm currently trying to use the gist, but on line `x = x.repeat(noutputs, 1)`, I receive the following error:

`RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor`

Genius!

### RylanSchaeffer commented Dec 4, 2019

where is `n` used?

### harpone commented Dec 14, 2019 • edited

Here's a differentiable batch version that doesn't loop over the batch dim (there's still one for loop left over the input dimension though... not sure if it's possible to get rid of that):

``````def jacobian(f, x):
"""Computes the Jacobian of f w.r.t x.

This is according to the reverse mode autodiff rule,

sum_i v^b_i dy^b_i / dx^b_j = sum_i x^b_j R_ji v^b_i,

where:
- b is the batch index from 0 to B - 1
- i, j are the vector indices from 0 to N-1
- v^b_i is a "test vector", which is set to 1 column-wise to obtain the correct
column vectors out ot the above expression.

:param f: function R^N -> R^N
:param x: torch.tensor of shape [B, N]
:return: Jacobian matrix (torch.tensor) of shape [B, N, N]
"""

B, N = x.shape
y = f(x)
jacobian = list()
for i in range(N):
v = torch.zeros_like(y)
v[:, i] = 1.
x,
retain_graph=True,
create_graph=True,
allow_unused=True)  # shape [B, N]
jacobian.append(dy_i_dx)

return jacobian
``````

EDIT: `x` needs to have `requires_grad=True`.

### MasanoriYamada commented Dec 27, 2019 • edited

get_jacobian input x has batch case without loop.

``````import torch

def get_batch_jacobian(net, x, noutputs):
x = x.unsqueeze(1) # b, 1 ,in_dim
n = x.size()
x = x.repeat(1, noutputs, 1) # b, out_dim, in_dim
y = net(x)
input_val = torch.eye(noutputs).reshape(1,noutputs, noutputs).repeat(n, 1, 1)
y.backward(input_val)
``````

usage

``````class Net(torch.nn.Module):
def __init__(self, in_dim, out_dim):
super(Net, self).__init__()
self.fc1 = torch.nn.Linear(in_dim, out_dim)

def forward(self, x):

batch = 2
num_features = 3
num_outputs = 5
x = torch.randn(batch, num_features)
print(x.shape)
net = Net(num_features, num_outputs)
print(net(x).shape)
result = get_batch_jacobian(net, x, num_outputs)
print(result.shape)
``````

Here my code tensor shape

input x: torch.Size([2, 3])
output y: torch.Size([2, 5])
jacobian dy/dx: torch.Size([2, 5, 3])

ret0 = get_batch_jacobian(net, x, num_outputs) # my code
ret1 = get_jacobian(net, x, num_outputs) # sbarratt code
ret2 = get_jacobian(net, x, num_outputs) # sbarratt code

ret0 == ret1
ret0 == ret2

### RylanSchaeffer commented Jan 12, 2020

@MasanoriYamada , can you explain how this works? What does `torch.eye(noutputs).reshape(1,noutputs, noutputs).repeat(n, 1, 1)` do?

### RylanSchaeffer commented Jan 12, 2020

@MasanoriYamada , how can I make this work when my inputs are 4d Tensors? When I try calling the model's forward method, I receive the following error:

`RuntimeError: Expected 4-dimensional input for 4-dimensional weight 6 3 5 5, but got 5-dimensional input of size [batch_size, output_dimension, num_channels, image_width, image_height] instead`

### MasanoriYamada commented Jan 16, 2020 • edited

@RylanSchaeffer, My code only supports flatten tensor with batch. Please, show me that input-output tensor and network in your case (simple network is the best)

### MasanoriYamada commented Jan 16, 2020

Disclaimer: Value not tested for correctness

### RylanSchaeffer commented Jan 20, 2020

Question for anyone: why do we need to tile the input before passing it through the graph (`net`, in sbarratt's original code)? Why can't we tile the input and the output after the forward pass?

### RylanSchaeffer commented Jan 20, 2020 • edited

I'm trying to do this currently, but I'm receiving the error: `One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.`

Here's what I'm doing. Let x be the input to the graph with shape `(batch size, input dimension)` and let y be the output of the graph with shape `(batch size, output dimension)`. I then select a subset of `N` random unit vectors. I stack x with itself and y with itself as follows:

`x = torch.cat([x for _ in range(N)], dim=0)`

and

`y = torch.cat([y for _ in range(N)], dim=0)`

x then has shape `(N * batch size, input dim)` and y has shape `(N * batch size, output dim)`. But then, when I try to use autograd, I receive the aforementioned error .

``````        jacobian = torch.autograd.grad(
outputs=y,
inputs=y,
retain_graph=True,
only_inputs=True)

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

``````

Does anyone know why this is, and is there a way to make this post-forward pass tiling work?

### Jeff1995 commented Apr 29, 2020

@RylanSchaeffer
I was trying the same thing with `unsqueeze().expand()`, but it leads to the same autograd error. I suppose it's because the newly created x and y nodes are just hanging in the computation graph, and do not really have a dependency, so autograd would no longer work.

### ChenAo-Phys commented May 4, 2020

I met this page about 1 year ago. This is really a nice trick, but it's a pity that it needs to forward pass a large batch and becomes a huge challenge to my GPU room. Recently I found an interesting way to bypass this problem. It's really interesting to solve a problem I encountered 1 year ago. https://github.com/ChenAo-Phys/pytorch-Jacobian

### justinblaber commented May 31, 2020 • edited

If I'm understanding this correctly, this code will forward pass `noutputs` times just to compute the jacobian once (but do it in a vectorized way)... The `1.5.0` autograd jacobian computation seems to compute the output once but then forloops over it and call `backward` one by one (@rjeli first comment) which will for sure be slow... Both tradeoffs seem sub optimal.

Anyone know if there's an update on this? Or is pytorch really not meant to compute jacobians?

### RylanSchaeffer commented Jun 1, 2020

@justinblaber , autodiff either computes matrix-vector products or vector-matrix products (depending on forward mode / reverse mode). The Jacobian is a matrix - there's no easy way to recover this by itself. Either you perform multiple backwards passes, using different elementary basis vector on each pass, or you blow the batch size up and do one massive backwards pass. There's no way around this.

is it good?

### justinblaber commented Jun 1, 2020

``````for j in range(out.nelement()):
And actually, to be honest I wanted the jacobian earlier to do some gauss newton type optimization, but I've actually since discovered that the `optim.LBFGS` optimizer (now built into pytorch) might work well for my problem. I think it even has some backtracking type stuff built into it. So for now I don't think I even need the jacobian anymore.