@apaszke
Last active April 3, 2024 03:40
import torch

def jacobian(y, x, create_graph=False):
    # Build the Jacobian dy/dx row by row: for each element of y, backprop a
    # one-hot grad_outputs vector to obtain the corresponding row of derivatives.
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)

def hessian(y, x):
    # Hessian as the Jacobian of the Jacobian; the inner call needs
    # create_graph=True so the first derivatives stay differentiable.
    return jacobian(jacobian(y, x, create_graph=True), x)

def f(x):
    return x * x * torch.arange(4, dtype=torch.float)

x = torch.ones(4, requires_grad=True)
print(jacobian(f(x), x))
print(hessian(f(x), x))
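For reference, the example can be checked by hand: with $f(x)_i = c_i x_i^2$ and $c = (0, 1, 2, 3)$,

$$\frac{\partial f_i}{\partial x_j} = 2 c_i x_i \delta_{ij}, \qquad \frac{\partial^2 f_i}{\partial x_j \partial x_k} = 2 c_i \delta_{ij} \delta_{ik},$$

so at $x = (1, 1, 1, 1)$ the first print gives $\mathrm{diag}(0, 2, 4, 6)$ and the second a $4 \times 4 \times 4$ tensor whose only non-zero entries are $2 c_i$ at positions $(i, i, i)$.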
@KhalilElkhalil

Dear Adam,

Is there a way to compute the Laplacian of a function f w.r.t. a tensor x of shape b×D (b: batch size, D: data dimension)? We need to compute $\sum_{i=1}^D \partial^2 f(x) / \partial x_i^2$ efficiently. Computing the full Hessian and taking its trace seems wasteful, since the off-diagonal entries are irrelevant to the Laplacian.

Thanks a lot!

@apaszke
Author

apaszke commented Jan 25, 2020

I don't think there's any other way with the current AD methods. You don't have to keep the whole Hessian in memory of course (you can throw away a row of the Hessian once you've picked out the element you're interested in), but you'll still need to compute each row, just like the hessian function does.
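A minimal sketch of that row-by-row idea for a scalar-valued f of a 1-D input (the laplacian helper below is mine, not part of the gist); for a batched b×D input you would loop over the D feature dimensions instead:

import torch

def laplacian(y, x):
    # Sum of second derivatives d^2 y / dx_i^2, one Hessian row at a time;
    # each row is discarded after its diagonal entry has been read off.
    grad_x, = torch.autograd.grad(y, x, create_graph=True)
    lap = 0.
    for i in range(x.numel()):
        row_i, = torch.autograd.grad(grad_x[i], x, retain_graph=True)
        lap = lap + row_i[i]
    return lap

x = torch.ones(4, requires_grad=True)
y = (x * x * torch.arange(4, dtype=torch.float)).sum()
print(laplacian(y, x))  # 2 * (0 + 1 + 2 + 3) = 12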

@KhalilElkhalil

Thanks Adam!

@slerman12

Is there a reason why you use grad_y instead of just indexing flat_y[i] in the autograd?
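For reference, the indexing variant the question describes might look like the sketch below (the name jacobian_by_indexing is mine, not from the gist); it yields the same rows by differentiating each scalar flat_y[i] directly instead of seeding a one-hot grad_outputs vector, at the cost of one extra indexing node per output element.

import torch

def jacobian_by_indexing(y, x, create_graph=False):
    # Same rows as the gist's jacobian(), obtained by differentiating each
    # scalar element of y directly rather than via a one-hot grad_outputs.
    jac = []
    flat_y = y.reshape(-1)
    for i in range(len(flat_y)):
        grad_x, = torch.autograd.grad(flat_y[i], x, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
    return torch.stack(jac).reshape(y.shape + x.shape)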

@jalane76

Hello, I am relatively new to PyTorch and came across your Hessian function. It is much more elegant than some Hessian code from an academic paper that I am trying to reproduce. I've put together a toy example, but keep getting the error

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I've been scouring the docs and googling, but for the life of me I can't figure out what I'm doing wrong. Any help you could offer would be greatly appreciated!

Here is my code:

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

torch.set_printoptions(precision=20, linewidth=180)

def jacobian(y, x, create_graph=False):
    jac = []                             
    flat_y = y.reshape(-1)     
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):         
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)           
                                                                                                      
def hessian(y, x):  
    return jacobian(jacobian(y, x, create_graph=True), x)                                             
                                                                                                      
def f(x):                                                                                             
    return x * x                                            

np.random.seed(435537698)

num_dims = 2
num_samples = 3

X = [np.random.uniform(size=num_dims) for i in range(num_samples)]

mean = torch.Tensor(np.mean(X, axis=0))
mean.requires_grad = True

cov = torch.Tensor(np.cov(X, rowvar=False))

with autograd.detect_anomaly():
    hessian_matrices = hessian(f(mean), mean)
    print('hessian: \n{}\n\n'.format(hessian_matrices))

The output with anomaly detection turned on is:

RuntimeError                              Traceback (most recent call last)
in <module>()
     67
     68 with autograd.detect_anomaly():
---> 69     hessian_matrices = hessian(f(mean), mean)
     70     print('hessian: \n{}\n\n'.format(hessian_matrices))

in hessian(y, x)
     45     print('--> hessian()')
     46     j = jacobian(y, x, create_graph=True)
---> 47     return jacobian(j, x)
     48
     49 def f(x):

in jacobian(y, x, create_graph)
     28     print('\tgrad_y: \n\t{}\n'.format(grad_y))
     29
---> 30     grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
     31     print('\tgrad_x: \n\t{}\n')
     32

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
    155     return Variable._execution_engine.run_backward(
    156         outputs, grad_outputs, retain_graph, create_graph,
--> 157         inputs, allow_unused)
    158
    159

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Finally, I'm running my code in a Google Colab notebook with PyTorch 1.4 if that makes a difference.

Thanks!

@jalane76

I did manage to get the code to run now. I made a "simplification" that broke it.

Your function f is:

def f(x):                                                                                             
    return x * x * torch.arange(4, dtype=torch.float)  

While mine was:

def f(x):                                                                                             
    return x * x  

I've since fixed it to:

def f(x):                                                                                             
    return x * x  * torch.ones_like(x)

and it works like a charm. @apaszke any idea why that is the case?

@el-hult

el-hult commented Apr 20, 2020

> I did manage to get the code to run now. I made a "simplification" that broke it.
>
> Your function f is:
>
>     def f(x):
>         return x * x * torch.arange(4, dtype=torch.float)
>
> While mine was:
>
>     def f(x):
>         return x * x
>
> I've since fixed it to:
>
>     def f(x):
>         return x * x * torch.ones_like(x)
>
> and it works like a charm. @apaszke any idea why that is the case?

You can switch torch.ones_like(x) to 1 and it still works...
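The tensor in the error message ([torch.FloatTensor [2]]) matches grad_y here. A plausible explanation, offered as a guess rather than a confirmed diagnosis: with create_graph=True, the backward of x * x produces grad_x as a product that saves grad_y itself for the double backward, and the subsequent in-place writes grad_y[i] = 0. / 1. bump its version, so the outer jacobian call fails the version check. Multiplying f by a constant inserts an extra multiplication node, so the tensor saved on the path back to x is grad_y * constant, a fresh intermediate, rather than grad_y. Below is a sketch of a variant that sidesteps the issue by allocating a new one-hot vector each iteration (jacobian_no_inplace and hessian_no_inplace are my names):

import torch

def jacobian_no_inplace(y, x, create_graph=False):
    # Variant of the gist's jacobian() that allocates a fresh one-hot
    # grad_outputs vector per iteration instead of mutating a shared grad_y,
    # so no tensor saved by the double-backward graph is modified afterwards.
    jac = []
    flat_y = y.reshape(-1)
    for i in range(len(flat_y)):
        grad_y = torch.zeros_like(flat_y)
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
    return torch.stack(jac).reshape(y.shape + x.shape)

def hessian_no_inplace(y, x):
    return jacobian_no_inplace(jacobian_no_inplace(y, x, create_graph=True), x)

mean = torch.rand(2, requires_grad=True)
print(hessian_no_inplace(mean * mean, mean))  # f(x) = x * x should now go through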

@Ronnypetson

Hello Adam! How could I give credit to you if I use this code? Could it be a docstring in the documentation, a paper citation, or something similar?

@guanshaoheng

guanshaoheng commented Apr 17, 2021

Now the function torch.autograd.functional.jacobian can do the same thing, I think.


def jacobian(y, x, create_graph=False):
    # xx, yy = x.detach().numpy(), y.detach().numpy()
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)


def hessian(y, x):
    return jacobian(jacobian(y, x, create_graph=True), x)


def f(xx):
    # y = x * x * torch.arange(4, dtype=torch.float)
    matrix = torch.tensor([[0.2618, 0.2033, 0.7280, 0.8618],
        [0.1299, 0.6498, 0.6675, 0.0527],
        [0.3006, 0.9691, 0.0824, 0.8513],
        [0.7914, 0.2796, 0.3717, 0.9483]], requires_grad=True)
    y = torch.einsum('ji, i -> j', (matrix, xx))
    return y


if __name__ == "__main__":
    # matrix = torch.rand(4, 4, requires_grad=True)
    # print(matrix)
    x = torch.arange(4,  dtype=torch.float, requires_grad=True)
    print(jacobian(f(x), x))
    grad = torch.autograd.functional.jacobian(f, x).numpy()
    # grad = grad.flatten()
    print(grad)
    # print(hessian(f(x, matrix), x))

output:

tensor([[0.2618, 0.2033, 0.7280, 0.8618],
        [0.1299, 0.6498, 0.6675, 0.0527],
        [0.3006, 0.9691, 0.0824, 0.8513],
        [0.7914, 0.2796, 0.3717, 0.9483]], grad_fn=<ViewBackward>)
[[0.2618 0.2033 0.728  0.8618]
 [0.1299 0.6498 0.6675 0.0527]
 [0.3006 0.9691 0.0824 0.8513]
 [0.7914 0.2796 0.3717 0.9483]]
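For completeness, torch.autograd.functional.hessian (added around PyTorch 1.5) is the built-in counterpart for second derivatives; it expects a function that returns a scalar, so the gist's vector-valued example has to be reduced first. A small sketch:

import torch

def f_scalar(x):
    # Scalar-valued version of the gist's example, as required by functional.hessian.
    return (x * x * torch.arange(4, dtype=torch.float)).sum()

x = torch.arange(4, dtype=torch.float)
print(torch.autograd.functional.hessian(f_scalar, x))
# Expected: diag(2 * [0, 1, 2, 3]), the second derivative of sum_i i * x_i^2.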

@AjinkyaBankar

Hi,
I want to find a Hessian matrix for the loss function of the pre-trained neural network with respect to the parameters of the network. How can I use this method? Can someone please share an example? Thanks.

@maryamaliakbari

> Hi,
> I want to find a Hessian matrix for the loss function of the pre-trained neural network with respect to the parameters of the network. How can I use this method? Can someone please share an example? Thanks.

Hi,
I am looking for the same thing. Were you able to figure out how to do it?

@mil-ad

mil-ad commented Oct 25, 2021

I think this has now been added to recent versions of torch's autograd module. Maybe look at the examples here

@maryamaliakbari

> I think this has now been added to recent versions of torch's autograd module. Maybe look at the examples here

Right, I checked it, but when I use that method I get multiple errors. I'm looking for an example or similar code to see how the implementation is done.
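In case it helps, here is one way to phrase the loss-w.r.t.-parameters Hessian with torch.autograd.functional.hessian. Everything below (the tiny linear model, the random data, the loss_from_flat helper) is made up for illustration; the idea is just to express the loss as a function of a single flat parameter vector.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy setup -- model, data, and sizes are placeholders, not from the thread.
model = nn.Linear(3, 1)
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)
loss_fn = nn.MSELoss()

params = list(model.parameters())
shapes = [p.shape for p in params]
sizes = [p.numel() for p in params]

def loss_from_flat(flat):
    # Rebuild the weight and bias from the flat vector and evaluate the loss
    # functionally, so the loss is an explicit function of `flat`.
    chunks = flat.split(sizes)
    w, b = [c.reshape(s) for c, s in zip(chunks, shapes)]
    return loss_fn(F.linear(inputs, w, b), targets)

flat0 = torch.cat([p.detach().reshape(-1) for p in params])
H = torch.autograd.functional.hessian(loss_from_flat, flat0)
print(H.shape)  # (num_params, num_params) -- here torch.Size([4, 4])

Note that the full matrix has num_params² entries, so for anything beyond a toy model people usually settle for Hessian-vector products (e.g. torch.autograd.functional.hvp) instead.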
