2015-11-21 - P.SE - Q303210 - Low cost exception handling
Disclaimer:
001. I do not have any formal training or work experience in compilers or programming language theory.
002. My answer may contain technical terms - ones I have only seen and half-understood while surfing the internet(s). Since I do not have the formal experience, my use of those technical terms might be incorrect, inaccurate, or imprecise.
003. My work consists of optimizing existing C++ hotspot code, checking the generated machine instructions and running benchmarks to verify that the C++ code has been written “to meet the performance goal”.
004. My apologies for including lots of irrelevant random musings in this answer.
This is a very broad question, i.e. one that would require a book chapter or more to answer, and P.SE is not known for welcoming this type of long question.
The overall idea sounds legit, but discussing the omissions would have made this question “too broad”. Many omissions or pitfalls cannot be predicted in advance unless one is already knee-deep in implementing the feature.
If you are passionate about performance, you will have to experiment with many different ways of implementing your design, and you will have to benchmark with lots of code - not just the library code you’ve written as the principal language designer, but also code that would be written by your language’s early adopters. This makes it a lifelong journey, one that will probably last longer than, say, your postgraduate degree program or your current employment.
Today’s programming languages have to consider two main targets: the Intel architecture and the ARM architecture. (Sorry, RISC. You’re a great idea, but you’re not a public company.) They are… very different, performance-wise. Their machine code looks very different. If you are passionate about performance, I guess you would have bought a few of their development boards.
Let’s not forget that the main benefit of inlining is ***not*** just the elimination of the “jump / indirect jump” instructions, or the purported “pipeline flush / cache miss”.
The main benefit of inlining, from an optimizing compiler’s perspective, is that once the caller and callee become one, the optimizer can find many amazing new opportunities - new invariants, new propagations, new common subexpression eliminations, new loop unrolling, new code motion, etc. In hindsight, the gain from merely eliminating the jump instruction is a red herring. Besides, if you successfully eliminate the jump but your compiler is unable to exploit those new transformation opportunities, the gain from inlining will be modest at best.
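As a minimal C++ sketch (the function names here are my own, not from the question): once `scale` is inlined into `sum_scaled`, the constant factor propagates into the loop body, the call overhead disappears, and the optimizer is then free to unroll or vectorize the loop as a whole.

```cpp
#include <cstddef>

static inline double scale(double x, double factor) {
    return x * factor;
}

double sum_scaled(const double* data, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // After inlining, this is just `total += data[i] * 2.0;`:
        // the constant 2.0 propagates into the loop body, and the loop
        // becomes a candidate for unrolling and vectorization.
        total += scale(data[i], 2.0);
    }
    return total;
}
```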
Here I discuss a few quizzes which I use to frame my own understanding and experience from my tinkering with C++ code and the resulting machine instructions. Disclaimer: please do not take my answers at face value. Do your own experiments and form your own conclusions.
Quiz 001. Which language feature has the bigger impact on performance? 001. array bounds checking. 002. exception handling. Answer: array bounds checking, since it is performed on every array access, whereas exception handling runs only when a logic error occurs, and its cost is paid once per occurrence.
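As a rough illustration (the class and names below are mine, not part of the question), consider a checked indexing operator: the comparison runs on every single access, while the throw executes only when the index is already wrong.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

class CheckedArray {
public:
    explicit CheckedArray(std::size_t n) : data_(n) {}

    int& operator[](std::size_t i) {
        if (i >= data_.size()) {               // paid on every access (hot path)
            throw std::out_of_range("index");  // paid only on a logic error (cold path)
        }
        return data_[i];
    }

private:
    std::vector<int> data_;
};
```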
Quiz 002. Which operation has the bigger impact on performance? 001. an array bounds check. 002. computing the memory address of an element in a two-dimensional array (such that it is not a simple bit shift). Answer: computing the memory address. Firstly, it involves an integer multiplication. Secondly, the CPU’s speculative logic cannot guess the resulting memory address until the multiplication has completed - unlike a load from a register plus a constant offset - so you are at the mercy of the cache prefetcher. An array bounds check is a comparison and a conditional jump that is rarely taken, assuming the logic error does not happen.
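Here is a sketch of the two costs side by side (the function is illustrative, assuming a row-major layout): the bounds check is a compare plus a rarely-taken branch, while the address computation needs a multiply before the load address is even known.

```cpp
#include <cstddef>
#include <stdexcept>

// Row-major element access for a rows x cols matrix stored contiguously.
double at(const double* data, std::size_t rows, std::size_t cols,
          std::size_t r, std::size_t c) {
    if (r >= rows || c >= cols)               // bounds check: compare + predictable branch
        throw std::out_of_range("2D index");
    // The load cannot begin until r * cols + c has been computed; unless cols
    // is a power of two, that is a real integer multiplication on the critical path.
    return data[r * cols + c];
}
```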
Quiz 003. Are destructor calls included in the cost analysis of exception handling? Here is my counter-question: does your language skip destructor calls whenever there is no exception? If destructors are invoked whether or not exceptions are thrown, then how can one attribute their cost to exception handling?
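A small sketch of the point (the types and names are illustrative): the destructor of `guard` runs whether `work` returns normally or throws, so its cost belongs to the normal flow rather than to exception handling.

```cpp
#include <cstdio>
#include <stdexcept>

struct Guard {
    ~Guard() { std::puts("released"); }   // runs on both paths
};

void work(bool fail) {
    if (fail) throw std::runtime_error("logic error");
}

void do_work(bool fail) {
    Guard guard;   // resource acquired here
    work(fail);    // may or may not throw
    // Normal flow: ~Guard() runs at the closing brace.
    // Exceptional flow: the unwinder runs the very same ~Guard() before
    // the exception continues to propagate.
}
```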
In order to prevent code bloat, some compilers opt to give up “machine code hygiene” and use the same destructor code fragment for both normal-flow and exceptional-flow purposes. To do this, they may allow a mismatch between the number of calls and returns, but only when there is an exception; the normal flow stays plain and linear. Sometimes a compiler may use a “function inside function” layout, where the exception handler makes a “call” or a “jump” into the middle of a function, precisely to the beginning of the destructor code fragment.