This report summarizes the work done in my GSoC 2021 project, Supporting Arrays and Allocatables in LFortran wth LFortran. A step by step development of the project is available at czgdp1807.github.io.
I am a final year Bachelor of Technology student at Indian Institute of Technology, Jodhpur in the department of Computer Science and Engineering.
This project aimed to add support for arrays and allocatables in LFortran. There were some miscellaneous goals as well, such as improving support for pointers and kinds and some bug fixes discovered along the way. The original plan of the project is listed as follows,
- Community Bonding - During the community bonding period, work on declaring arrays was supposed to be complete, and array initialisers in the phase of implementation. Moreover, operations on arrays were planned to be under discussions.
- Phase 1 - In this phase, features like operations on arrays were supposed to be added, along with indexing and slicing arrays, discussing ways to pass arrays as function/subroutine arguments.
- Phase 2 - In the final phase, I had planned to complete work on passing arrays as function/subroutine arguments, implementing intrinsic functions for arrays and adding support for allocatable arrays.
This section describes the actual work done during the coding period in terms of merged PRs.
- Community Bonding
- Phase 1
- Phase 2
Future work for arrays include optimizations for array operations such as array tiling, trying out various descriptors. Following are the links to relevant issues,
Thoughts and Learnings
A major point which I learnt during my project is that writing compilers requires designing algorithms along with paying attention to ease of use for the end user. To make that possible one needs to write code for several corner cases which we usually tend to ignore. For example, in LLVM, it is not possible to add an integer with a floating point number. A cast should be performed before performing the addition. In the following three points, I have shared some similar experiences,
Array Operations - We implemented this feature by converting statements like
c = a + bto
doloops with a body like,
c(i) = a(i) + b(i). Here,
bare arrays. Now, this requires two things. First figuring out, where the result should go and then what's the starting and ending point of each dimension. For the second, Fortran provides
uboundfor arrays. We implemented them first and then this feature. For deciding the result variable, one may need to keep track of it during the conversion of array operations to loops. For example, in
c = a + b + d. First,
a + bwill be converted to do loop and its result will be stored in a temporary variable, say
c = x + dwill be converted to do loop and the result will be directly sent to
c. This has to happen for any operation and any statement where such operations are present which makes it quite challenging. I implemented a recursive algorithm while keeping
result_varas a global variable which tracks whether the result should go to a temporary variable or some variable already provided by the user. For example, once we go from current node, say,
Add(Add(a, b), d)to
Add(a, b), we need to update the
x(a newly created one). Once come back from
Add(Add(a, b), d), the reverse happens.
Returning Arrays from Functions - The reason why I am mentioning about this feature is because of its elegant implementation. Literally returning arrays from functions may require creating a new array inside the function and then copying it to the destination once that array is returned. Say, for example,
d = f(a, b, c).
fwill first return an array say,
ret, then this will be copied to
d, resulting in doubling the peak memory consumption. However, if we convert
fto a subroutine, then the previous call will become,
call f(a, b, c, d). Now,
fwill directly write to
d. So, no extra memory will be required. Thanks to Ondrej for suggesting this approach.
Segmentation Faults and Double Free - Ah! This was the most annoying part of my project. Espescially when I was implementing automatic deallocation of arrays on heap memory, I had to deal with a bunch of double free errors. Things got messier when an array got automatically freed before a subroutine call but inside the subroutine it wasn't again assigned some memory. This resulted in several segmentation faults one after the other and fixing them was no easy.
Overall, I think this project has made me a better programmer, problem solver and yeah a debugger (