Student: Baidyanath Kundu
Mentors: Vassil Vassilev, Ioana Ifrim
Organisation: CERN-HSF
I want to say a huge thanks to my mentors Vassil Vassilev and Ioana Ifrim for guiding me throughout the project. They were better mentors than I had hoped for. I am also thankful to my fellow students Garima Singh and Parth Arora for pointing out issues in my code and thus helping me make it better. Last but not least, I would like to thank the members of the Cpplang Slack for helping me improve my knowledge of template metaprogramming.
ROOT is a framework for data processing, born at CERN, at the heart of the research on high-energy physics. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations. ROOT has a Clang-based C++ interpreter, Cling, and integrates with Clad to provide a flexible automatic differentiation facility.
TFormula is a ROOT class which bridges compiled and interpreted code.
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is an alternative to symbolic differentiation and numerical differentiation (the method of finite differences). Clad is an AD tool based on Clang which provides the necessary facilities for code transformation.
The project aims to add second-order derivative support in TFormula using clad::hessian. The PR that added support for gradients in ROOT is taken as a reference and can be accessed here.
Optionally, if time permits, this project will also convert the gradient function into CUDA device kernels.
The original objectives of the project were to enable `clad::hessian` support in ROOT:
- Add array differentiation support for arrays named `p` in forward mode
- Add array differentiation support for arrays named `p` in hessian mode
- Add hessian mode support to ROOT
- (Optional) Add CUDA gradient support to ROOT
However, clad itself lacked array differentiation support, and adding it was a natural extension of this project, so I took it on. As a result, the amended objectives were:
- Add array differentiation support in forward mode
- Add array differentiation support in reverse mode
- Add array differentiation support in hessian mode
- Update clad interface in ROOT
- Add support for hessian mode in ROOT
- (Optional) Add CUDA gradient support to ROOT
The major PRs I contributed during the GSoC period are listed below. All of them have been merged.
| PR | Description |
| --- | --- |
| #244 | Add array differentiation support in forward mode |
| #237 | Add array differentiation support in reverse and hessian mode |
| #284 | Add changes necessary to integrate clad into root |
| #286 | Add support for llvm 13 |
| #291 | Add tests for ROOT interface |
To see all the PRs I contributed to clad, check out this link.
The PRs contributed to ROOT are:

| PR | Description |
| --- | --- |
| #8848 | Bump clad version to v0.9 |
| #8371 | Initial support of hessian calculation in TFormula using clad |
Users can now utilise forward mode to derive functions with respect to a specific array element:
double f1(double *arr) {
return arr[0] * arr[0] + arr[1];
}
// Differentiate f1 w.r.t. arr[0]
auto f1_dx = clad::differentiate(f1, "arr[0]");
// Produces:
// double f1_darg0_0(double *arr) {
// return 1 * arr[0] + arr[0] * 1 + 0;
// }
double arr[2] = {3, 4};
// Execute the generated derivative
double darr0 = f1_dx.execute(arr);
// Results in: darr0 = 6
The basic concept is that the array elements can be thought of as separate variables grouped together. Forward mode differentiates the source function with respect to a single variable, so we just needed to activate differentiation of the requested element (by multiplying it by 1) and prevent differentiation of the rest (by multiplying them by 0).
As an example consider the source function to be:
double f2(double* arr) {
return arr[0] * arr[1];
}
The derivative produced with respect to `arr[1]` is:
double f2_darg0_1(double *arr) {
// Differentiation with respect to arr[1] is arr[0] and it is enabled by multiplying with 1
// Differentiation with respect to arr[0] is arr[1] and it is disabled by multiplying with 0
return 0 * arr[1] + arr[0] * 1;
}
Since this was very close to the original objective of adding array differentiation support for arrays named `p`, it went mostly according to the proposed plan. A slight challenge I faced was differentiating array subscript expressions whose subscript is itself an expression. To handle this, the generated code checks whether the subscript expression is equal to the index that the user requested and sets the derivative accordingly.
So for a function like this:
double f3(double* arr) {
int x = 1;
return arr[x]; // The subscript used is an expression ("x")
}
The derivative produced w.r.t. `arr[1]` would be:
clad::differentiate(f3, "arr[1]"); // To differentiate f3 w.r.t. arr[1]
double f3_darg0_1(double *arr) {
int _d_x = 0;
int x = 1;
return (x == 1); // The subscript expression ("x") is checked against
                 // the requested index of arr (which is "1" in this case)
}
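For completeness, here is a short usage sketch of this derivative, following the same execute pattern as the earlier `f1` example:

```cpp
double arr[2] = {3, 4};
auto f3_dx = clad::differentiate(f3, "arr[1]");
// The generated code returns (x == 1), and x is 1 here,
// so the derivative w.r.t. arr[1] evaluates to 1
double darr1 = f3_dx.execute(arr);
// Results in: darr1 = 1
```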
This enables reverse mode to differentiate w.r.t. arrays:
double g1(double *arr) {
return arr[0] * arr[0] + arr[1];
}
// Differentiate g1 w.r.t. each of the elements in arr
auto g1_darr = clad::gradient(g1);
// Produces:
// void g1_grad(double *arr, clad::array_ref<double> _d_arr) {
// double _t0;
// double _t1;
// _t1 = arr[0];
// _t0 = arr[0];
// double g1_return = _t1 * _t0 + arr[1];
// goto _label0;
// _label0:
// {
// double _r0 = 1 * _t0;
// _d_arr[0] += _r0;
// double _r1 = _t1 * 1;
// _d_arr[0] += _r1;
// _d_arr[1] += 1;
// }
// }
double arr[2] = {3, 4};
// The array needs to be of the same size as arr
double darr[2] = {0, 0};
clad::array_ref<double> darr_ref(darr, 2); // here 2 is the size of darr
// Execute the generated gradient
g1_darr.execute(arr, darr_ref);
// Results in: darr = {6, 1}
Because it was an unexpected objective, the PR to add array differentiation to reverse mode was largely unplanned. On top of that, there had been a lot of debate about it, with no agreement on which technique would be best to adopt. As a result, I first had to build a prototype so that we could decide which path to follow, and I quickly realised that it posed some significant design challenges.
To understand the challenges, note the gradient's signature before the addition of array differentiation:
// Source Function
double g2(double x, double y) {
. . .
}
// Call clad gradient on g2
clad::gradient(g2);
// Gradient function generated by clad
void g2_grad(double x, double y, double* _result) { // the _result variable stores the derivative w.r.t. both x and y
. . .
}
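A short usage sketch against this older signature, where both partial derivatives land in one flat `_result` buffer:

```cpp
double x = 3, y = 4;
// One buffer holds both the derivative w.r.t. x and w.r.t. y
double result[2] = {0, 0};
auto g2_grad = clad::gradient(g2);
g2_grad.execute(x, y, result);
```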
The initial plan was to split `_result` into one output variable for each input variable, such that the signature would look like:
void g2_grad(double x, double* _d_x, double y, double *_d_y) { // the _d_x variable stores the derivative w.r.t. x and similarly _d_y is for y
. . .
}
Despite being the most user-friendly, this signature has a flaw. Users can specify the input variables with respect to which they wish to differentiate the function, and `execute` needs to know the signature of the derived function in order to call it. Clad is divided into two parts: the user interface, which the user calls to ask clad to derive a function, and the plugin, which reads the arguments given to `clad::gradient` (and the other clad functions). Because `execute` is part of the user interface, it has to know the resulting function signature before clad knows which variables the user wishes to differentiate against. As a result, clad requires the resulting gradient function to have the same signature regardless of the variables supplied. We solved this problem by moving all the output variables to the end of the parameter list and introducing an overload for the case where the user only wants a subset of the input variables.
So if the user calls:
// Source Function
double g3(double x, double y, double z) {
. . .
}
clad::gradient(g3, "y");
clad would produce:
void g3_grad_1(double x, double y, double z, double *_d_y) { // The actual gradient that is produced
. . .
}
void g3_grad_1(double x, double y, double z, double *_d_y, double *_d_0, double *_d_1) { // The overload created to interface with "execute"
g3_grad_1(x, y, z, _d_y);
}
When the user calls `execute`, clad pads the user's arguments with `nullptr` and invokes the gradient overload.
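As a usage sketch based on the signatures shown above, differentiating `g3` only w.r.t. `y` would look roughly like this:

```cpp
double x = 1, y = 2, z = 3;
double dy = 0;
// Differentiate g3 only w.r.t. y
auto g3_dy = clad::gradient(g3, "y");
// execute forwards x, y, z and &dy; the remaining output slots are
// padded with nullptr internally before the overload is invoked
g3_dy.execute(x, y, z, &dy);
```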
Another issue that needed to be addressed was the size of the array needed when differentiating call expressions. To address this, we created a new class called `clad::array_ref` to hold the derivative; it wraps an array pointer together with its size. This approach also allows the user to pass variable-sized arrays, making it quite versatile. If the user just passes a pointer, it is treated as a single variable with a size of 1.
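Conceptually, such a wrapper is just a pointer bundled with a size. Below is a simplified sketch of the idea (names are illustrative; this is not clad's actual implementation):

```cpp
#include <cstddef>

// Simplified sketch of a pointer-plus-size wrapper in the spirit of
// clad::array_ref; NOT clad's actual implementation.
template <typename T>
class array_ref_sketch {
    T* m_data;
    std::size_t m_size;
public:
    // A bare pointer is treated as a single variable (size 1)
    array_ref_sketch(T* data) : m_data(data), m_size(1) {}
    array_ref_sketch(T* data, std::size_t size) : m_data(data), m_size(size) {}
    T& operator[](std::size_t i) { return m_data[i]; }
    std::size_t size() const { return m_size; }
};
```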
So finally the signature of the produced gradient is:
// Source function
double g4(double *x, double y) {
. . .
}
// Call to clad gradient on g4
clad::gradient(g4);
// The produced gradient
void g4_grad(double *x, double y, clad::array_ref<double> _d_x, clad::array_ref<double> _d_y) {
. . .
}
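A usage sketch following the same execute pattern as the earlier examples (no results are shown since the body of `g4` is elided):

```cpp
double x[2] = {3, 4};
double y = 5;
double dx[2] = {0, 0};
double dy = 0;
// Wrap the array derivative in an array_ref (pointer + size)
clad::array_ref<double> dx_ref(dx, 2);
auto g4_darr = clad::gradient(g4);
// A bare pointer (&dy) is accepted for the scalar input's derivative,
// since it is treated as an array_ref of size 1
g4_darr.execute(x, y, dx_ref, &dy);
```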
This added support for generating hessians with functions that take array inputs:
double h1(double *x) {
return x[0] * x[0] + x[1] * x[1];
}
// Generate the hessian w.r.t. all elements in x
auto h_hess = clad::hessian(h1, "x[0:1]");
double x[2] = {2, 3};
// The hessian is a matrix, so the size required is the square
// of the number of variables we are differentiating against.
// Since we are passing a single array of size 2, the hessian
// matrix generated is of size 4
double hess[4] = {0};
clad::array_ref<double> hess_ref(hess, 4);
// Executes the generated hessian
h_hess.execute(x, hess_ref);
// Results in: hess = {2, 0, 0, 2}
The original plan was to determine the size of the `p` array by counting the indexes used in the source function, but the modified objective meant that the array sizes could no longer be inferred by the hessian.
The difficulty we encountered was determining the size of the input arrays. The hessian function returns a flattened matrix, and using a `clad::array_ref` for the hessian matrix does not convey the individual array sizes. To address this, `clad::hessian` was modified to take the array sizes via its args parameter. As an example, consider the following:
double h2(double *x, double *y) {
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2];
}
// The size of x in the h2 function is 3, and "x[0:2]"
// means differentiate indexes 0 through 2 of x.
// Same for y
clad::hessian(h2, "x[0:2], y[0:2]");
Using this index-range representation, we were able to obtain the sizes of the arrays we need to differentiate.
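A usage sketch for `h2`, following the pattern of the `h1` example above:

```cpp
double x[3] = {1, 2, 3};
double y[3] = {4, 5, 6};
// 3 elements of x + 3 elements of y = 6 variables, so the
// flattened hessian matrix needs 6 * 6 = 36 entries
double hess[36] = {0};
clad::array_ref<double> hess_ref(hess, 36);
auto h2_hess = clad::hessian(h2, "x[0:2], y[0:2]");
h2_hess.execute(x, y, hess_ref);
```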
This was a straightforward objective. Because the signature of the produced gradient had changed, the ROOT interface with clad had to be modified to match.
One minor issue we discovered was that executing code through the ROOT interpreter is slow, so we had to avoid it wherever feasible. Initially, the interpreter was used to construct and delete the `clad::array_ref` object that has to be passed to the produced gradient. Because this happens every time the user invokes the produced gradient, it significantly slowed things down. One way around this was to expose the `clad::array_ref` header to the TFormula class, but that would have incurred extra complexity. Instead, we used a struct with the same data members as `clad::array_ref`; because the objects are passed as void pointers and explicitly cast to the type of the real argument in the trampoline function (essentially a `reinterpret_cast`), this technique allowed us to reduce the additional cost.
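The following is a simplified sketch of this idea; the names are hypothetical and this is not ROOT's actual code:

```cpp
#include <cstddef>

// Hypothetical mirror of clad::array_ref<double>'s data layout
// (a pointer plus a size); names are illustrative only.
struct CladRefMirror {
    double* fData;
    std::size_t fSize;
};

// Stand-in for the trampoline: the argument arrives as a void* and is
// explicitly cast back to the type the generated gradient expects.
void trampoline(void* ref_as_void) {
    auto* ref = reinterpret_cast<CladRefMirror*>(ref_as_void);
    // ... forward ref->fData and ref->fSize to the generated gradient ...
    (void)ref;
}

int main() {
    double grad_storage[2] = {0, 0};
    // Built on the compiled (TFormula) side without including clad's headers
    CladRefMirror ref{grad_storage, 2};
    trampoline(&ref); // passed as a void*, cast back inside the trampoline
}
```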
This mostly duplicates the code that supports `clad::gradient` to support `clad::hessian` in ROOT.
I couldn't complete this objective, as a lot of time went into adding array differentiation support for reverse mode. I will put up a PR soon.
I was able to complete my mandatory goals and learnt a lot about automatic differentiation, compilers and template metaprogramming. I had a great time and would like to continue contributing to clad, ROOT and the CERN-HSF community in general.
For an in-depth demonstration of how to utilise array differentiation in clad in its various modes, check this link.