@w1th0utnam3
Last active May 18, 2018 16:52
FEniCS GSOC18 application

Maximizing performance on modern architectures with data-level parallelism

Abstract

In the solution process of problems discretized using FEM, the assembly of element level tensors to the global matrix often contributes a significant amount of computational time relative to the overall process. The goal of this GSoC 2018 project is to improve the performance of the FEniCS framework in the assembly phase. To achieve this, the FEniCS Form Compiler (FFC) should be enhanced such that the generated code can fully utilize data-level parallelism (SIMD) functionality of modern CPUs.

Technical Details

In the FEniCS framework, DOLFIN provides an interface for the user that connects and hides several internal components. Users formulate their problems, consisting of variational form, function spaces, boundary conditions, etc., in the high-level Unified Form Language (UFL). When the user requests a solution, DOLFIN performs all necessary steps leading to matrix assembly and calls a linear solver from an external library. During the assembly stage, DOLFIN loops over all mesh entities and computes the local tensors using calls to the corresponding tabulate_dofs/tabulate_tensor functions (e.g. evaluating the cell integrals). The C++ code of these functions is generated by FFC beforehand, during the analysis of the UFL code, and is subsequently compiled using dijitso. FFC is implemented in Python and generates the C++ code in several stages. It starts by translating the UFL form to an intermediate representation (IR) that is optimized in subsequent stages before being translated to C++. For the tabulate_tensor function, the IR is given in the form of an abstract syntax tree (AST) implemented using Python classes that model available C++ expressions/statements.
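
To make the idea of an AST built from Python classes more concrete, the following is a minimal sketch of how such classes could render a simple accumulation loop to C++. All class names (`Symbol`, `ArrayAccess`, `BinOp`, `AssignAdd`, `ForRange`) are hypothetical and chosen for illustration only; they are not FFC's actual classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symbol:
    """A plain C++ identifier (hypothetical node type)."""
    name: str

    def render(self) -> str:
        return self.name


@dataclass
class ArrayAccess:
    """An indexed access such as A[i]."""
    array: Symbol
    index: Symbol

    def render(self) -> str:
        return f"{self.array.render()}[{self.index.render()}]"


@dataclass
class BinOp:
    """A parenthesized binary expression such as (a * b)."""
    op: str
    lhs: object
    rhs: object

    def render(self) -> str:
        return f"({self.lhs.render()} {self.op} {self.rhs.render()})"


@dataclass
class AssignAdd:
    """A statement of the form target += value;"""
    target: object
    value: object

    def render(self, indent: str = "") -> List[str]:
        return [f"{indent}{self.target.render()} += {self.value.render()};"]


@dataclass
class ForRange:
    """A counted for loop over [begin, end) with a list of body statements."""
    index: Symbol
    begin: int
    end: int
    body: list

    def render(self, indent: str = "") -> List[str]:
        i = self.index.render()
        lines = [f"{indent}for (int {i} = {self.begin}; {i} < {self.end}; ++{i})",
                 f"{indent}{{"]
        for stmt in self.body:
            lines.extend(stmt.render(indent + "    "))
        lines.append(f"{indent}}}")
        return lines


# Build the tree for: A[i] += w * phi[i] over four entries, then print the C++.
i = Symbol("i")
update = AssignAdd(ArrayAccess(Symbol("A"), i),
                   BinOp("*", Symbol("w"), ArrayAccess(Symbol("phi"), i)))
code = "\n".join(ForRange(i, 0, 4, [update]).render())
print(code)
```

Because the loop is an object rather than a string, an optimization pass could in principle change its bounds, reorder statements, or merge loops before any C++ text is produced.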

As previously stated, the goal of this project is to identify and realize optimization potential in FFC in order to make use of SIMD instructions in the generated C++ code. This is important on current architectures in order to achieve near-optimal throughput of arithmetic operations. In particular, this may be done either by adding explicit SIMD intrinsics that are specific to the third-party C++ compiler involved or, more generally, by generating code that is friendly to automatic vectorization. As FEniCS is currently not restricted to a specific compiler, the latter approach is probably more sensible. However, it should also be considered whether there are more effective approaches than only improving the local cell/facet/etc. integral code. Common techniques to help auto-vectorization by the C++ compiler are

  • architecture dependent padding and data alignment
  • loop interchange or permutation
  • loop unrolling
  • loop fusion

Nevertheless, the benefit of adding intrinsics should also be investigated.
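
As an illustration of the first technique in the list above, the sketch below shows how a Python code generator might round array sizes up to a multiple of the SIMD width and emit aligned C++ declarations. The function names and the default of 32-byte registers (AVX with 8-byte doubles) are assumptions made for this example, not part of FFC.

```python
def padded_length(n: int, simd_bytes: int = 32, scalar_bytes: int = 8) -> int:
    """Round n up to the next multiple of the number of scalars per SIMD register."""
    lanes = simd_bytes // scalar_bytes
    return ((n + lanes - 1) // lanes) * lanes


def declare_aligned_array(name: str, n: int, simd_bytes: int = 32) -> str:
    """Emit a C++11 declaration of a zero-initialized, aligned and padded double array."""
    size = padded_length(n, simd_bytes)
    return f"alignas({simd_bytes}) double {name}[{size}] = {{}};"


def accumulate_loop(target: str, source: str, n: int, simd_bytes: int = 32) -> str:
    """Emit a loop whose trip count is the padded size, so no scalar remainder loop is needed."""
    size = padded_length(n, simd_bytes)
    return (f"for (int i = 0; i < {size}; ++i)\n"
            f"    {target}[i] += w * {source}[i];")


# Six basis function values are padded to eight entries for 4-wide AVX doubles.
print(declare_aligned_array("phi", 6))
print(accumulate_loop("A", "phi", 6))
```

The extra iterations are harmless only if the padded entries are zero, which is why the emitted declaration zero-initializes the array. Whether the additional arithmetic pays off depends on the element type and the target architecture, so it would have to be measured.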

Overall, the main tasks of the problem can be summarized as:

  • Identifying specific application problems (i.e. variational forms and element types) that are bottlenecked by the content of the tabulate_tensor functions to use them as an optimization target
  • Examining current solutions in similar projects, e.g. TSFC and COFFEE from the Firedrake project, see "Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly"
  • Implementing and testing different approaches while monitoring possible performance improvements on the previously selected problems as well as regressions for simpler problems
  • Possibly investigating other reasons for bottlenecks in the matrix assembly code

During the project it is sensible to keep in contact with contributors of the Firedrake project, as similar optimizations were already performed in their codebase.

Schedule of Deliverables

Community Bonding Period: April 23 - May 13 (3 weeks)

During the community bonding period I would like to discuss my plans with more members of the team, make them more detailed and get to know the current state of the code better.

I'm on holiday from April 17 to May 9, doing a round trip through Japan, where I visit friends from university. Unfortunately the dates overlap with the bonding period as I booked the flights last year to not collide with my exams that I had to take earlier this month. My plan is to catch up on the missing time after coming back and to already get to know the project and team better before the official results of the applications are announced (April 23).

  • Week -5 (April 9 - 15):
    • Develop more detailed plans for the project in discussion with mentor (Jack Hale)
    • Exchange with other team members and experts from Firedrake, etc.
  • Week -4 to -2 (April 16 - May 6): On holiday
  • Week -1 (May 7 - 13):
    • Setup of blog for weekly reports
    • Setup of development environment
    • Get a better understanding of which parts of code generation are related to what mathematical concepts (maybe write a blog post that creates a "map" of the relevant code pieces)

Phase 1: May 14 - June 10

The focus of the first phase is to make sure that there is a metric to measure performance improvements/regressions and to examine how to check whether the compiler generates vectorized code. Furthermore, first ideas to improve code generation should be implemented.

  • Week 1 (May 14 - 20):
    • Identify problems where the assembly stage is bottlenecked by FFC generated code
    • Check whether there may already be some optimization potential using the current form of code generation (e.g. with permutations of statements) or whether larger changes are definitely necessary (e.g. changing the way coefficients are stored)
  • Week 2 & 3 (May 21 - 27, May 28 - June 3):
    • Setup a benchmark or adapt the currently available regression test benchmarks for this project
    • Depending on results of previous weeks try out and evaluate results of small scale optimizations
  • Week 4 (June 4 - June 10):
    • Prepare phase 1 commit
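
The two checks named for this phase, a timing metric and a test for vectorized output, could be prototyped along the following lines. This is a rough sketch under stated assumptions: the helper names are hypothetical, the regular expression covers only a handful of packed double-precision x86 mnemonics, and the assembly text is assumed to come from the compiler's `-S` output. Compiler diagnostics such as GCC's `-fopt-info-vec` or Clang's `-Rpass=loop-vectorize` are an alternative way to obtain the same information.

```python
import re
import time
from typing import Callable

# Packed double-precision arithmetic (SSE/AVX), e.g. addpd, vmulpd, vfmadd231pd.
# Scalar variants end in "sd" and are deliberately not matched.
PACKED_DOUBLE = re.compile(r"^\s*v?(?:add|sub|mul|div|fmadd\d{3})pd\b", re.MULTILINE)


def count_packed_instructions(assembly: str) -> int:
    """Count lines of assembly that start with a packed double-precision arithmetic op."""
    return sum(1 for _ in PACKED_DOUBLE.finditer(assembly))


def best_time(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the smallest wall-clock time in seconds over several runs of func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# Hand-written sample in AT&T syntax: two packed operations and one scalar one.
sample = """
    vmovupd (%rsi,%rax,8), %ymm1
    vfmadd231pd (%rdx,%rax,8), %ymm0, %ymm1
    vmulsd  %xmm2, %xmm3, %xmm3
    vaddpd  %ymm1, %ymm4, %ymm4
"""
packed = count_packed_instructions(sample)
elapsed = best_time(lambda: sum(range(100000)))
print(packed, elapsed)
```

Taking the minimum over several repetitions reduces the influence of noise from other processes. In the real benchmark, the timed callable would wrap the assembly of one of the selected forms rather than a placeholder sum.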

Phase 2: June 15 - July 13

During the second phase, the performance of the generated code should be improved iteratively by implementing and combining the common optimization techniques or by trying out new ideas. This phase mainly consists of alternating between the implementation and the evaluation using the previously defined benchmarks.

  • Week 5 to 7 (June 11 - 17, June 18 - 24, June 25 - July 1):
    • Time for larger refactoring/restructuring ideas
  • Week 8 (July 2 - 8):
    • Prepare phase 2 commit

Phase 3 and Final Week: July 13 - August 14

The goal of the last phase is to finalize the implementation of the new features. The focus should be on fixing bugs, writing unit or regression tests, and cleaning up code and documentation. The tests and documentation should also be committed to the project.

  • Week 9 & 10 (July 9 - 15, July 16 - 22):
    • Evaluate the current performance: "last chance" to look in other directions for optimization
    • Discuss current state with Firedrake developers as well
    • If progress is satisfying: maybe also try out optimizations unrelated to SIMD
  • Week 11 (July 23 - 29):
    • Implement integration and regression tests for the FFC enhancements
  • Week 12 (July 30 - August 5):
    • No new features in last weeks, instead: code clean up and potentially add missing comments and documentation
    • Final test runs of implementation
  • Week 13 (August 6 - August 14):
    • Spare time for bug fixing
    • Prepare final merge request of features and tests

Development Experience

In order to get to know the project better, I identified FFC issue #173 as closely related to the area of the project. I therefore tried to track the issue down and, in contact with other project developers through Slack, implemented a possible fix which is currently available in my branch. We are still discussing whether this patch is the right direction, but I have already created a corresponding pull request (#95). In connection with this issue, I also reported the underlying problem in GCC as GCC issue #85072.

Before applying for this GSoC project, I gained development experience through the following projects:

  • University: Group for Computer Animation, 2017-Now

    Student assistant job - Currently, I'm helping with research on time integration methods for deformable bodies by implementing and evaluating numerical methods in C++ using Eigen. Furthermore, I worked on improving the PositionBasedDynamics library which is maintained by the group.

  • Bachelor thesis and internship, Winter 2016 - Spring 2017 (6 months)

    "Extension of an application for thermal analysis of satellites" - During my Bachelor thesis and internship at a large European spacecraft manufacturer, I worked full-time on the implementation of numerical methods for ODEs and iterative solvers and on the evaluation of their performance. The implementation was mainly done in Python using NumPy, but I also worked a little with MATLAB. Furthermore, I built interfaces between Python and C/C++ libraries using ctypes.

  • University: Institute for Fluid Power Drives and Controls, 2014-2016

    Student assistant job - I helped with the development of a C++ application for the simulation of fluid flow in "axial piston pumps". I worked on the numerical code which uses the KINSOL nonlinear solver, the user interface (Qt) and result visualization (embedded VTK views). A paper based on this tool was published by the PhD candidate I worked with: Stephan Wegner.

  • Found and reported the following C++ compiler bugs:

    • GCC #82613: "Cannot access private definitions in base clause of friend class template"
    • MSVC: "C1001 when performing uniform initialization with pack expansion over alias template"
    • MSVC: Bug related to "Error modifying struct member in constexpr function"
  • Some minor contributions to github repositories MetaStuff: #1, #2 and EnTT: #1

  • In my spare time I worked on some personal GitHub repositories but mainly to collect utilities I used for my work (e.g. noname_tools) or simply to try out things learned from courses (e.g. phyani_playground). In these repositories I like to play around with modern C++ techniques as I'm a regular listener of CppCast and like to follow the most recent developments of the language (towards C++20, etc.).

Other Experience

I'm currently studying Computational Engineering Science (M. Sc.) at RWTH Aachen University, Germany. The programme connects engineering, computer science and mathematics. We were introduced to FEM from the mathematical point of view (@MathCCES RWTH). The lectures were always accompanied by exercises where we implemented the numerical methods, e.g. in MATLAB. Furthermore, I attended a lecture course on FEM for fluid mechanics which was more engineering/implementation oriented (@CATS RWTH). In computer science lectures we were introduced to the basic data structures and algorithms and to software engineering principles, and later focused on high performance computing, including implementation techniques for MPI, OpenMP and optimizing for data-level parallelism (i.e. SIMD).

During a student project at MathCCES I worked with another student on a simplified wake model for wind turbines which is usable for large scale optimization. After the implementation phase we validated it using OpenFOAM.

During my years at university I grew confident working on Linux and writing all my documents with LaTeX. In several smaller coding projects we learned to manage our code with multiple developers in git.

Why this project?

Considering my development experience and the orientation of my study program, I spent a lot of my time at university on tool development for scientific computing and engineering. In my opinion, proper tooling for these tasks gets progressively more important as the problems involved become more complicated and computational power increases. Researchers should be able to focus on their projects, and their tools are supposed to "just work" and be easily adaptable if necessary. When I was introduced to FEniCS during a math exercise, I was quite excited by the simplicity of the high-level language that can be used to describe problems. Furthermore, I really liked the open source aspect of the package and the ability to extend and customize it. As open source software is so important in everyday use and scientific computing, I would like to contribute to a project. However, it always felt like the hurdle to get to know a project was too high. Therefore, I decided to apply for a project with the FEniCS team, as it is closely related to my study program and should hopefully make it easier to become productive.

This concrete project on optimizing the code generation components appealed to me the most, as I'm very interested in how the "magic happens" in the background, especially in cases like FEniCS, where the interface is relatively simple but still powerful. Incidentally, this also applies to C++, which is one reason why I follow the development of the language and the corresponding rationale with great interest. Unfortunately, I have not yet had the opportunity to study compiler techniques at university. Therefore, I'm very keen to learn more about code generation through this project. I hope that this project can help me decide where I want to continue after my master thesis, e.g. whether I want to work as a PhD candidate in this field. From a more practical point of view, I know from my own experience how resource intensive the assembly stage in FEM is, and I would like to improve the usability of the package in this regard.

Workload

In my opinion, GSoC projects should be considered a full-time job, and accordingly I want to spend around 40 hours per week on the task. Currently, I'm still employed at the Computer Animation research group (mentioned earlier in Development Experience) and work there for 9 hours per week. My current plan is to handle these hours either in a single day or in two half days per week, depending on the respective tasks.

Appendix
