@AndyAyersMS
Created February 12, 2021 20:06
I had hoped to forward some documents based on past work to build interprocedural analysis (IPA) in systems like Bartok/Phoenix, but didn’t find anything that was both accurate and concise.
So here’s a sketch of how the system could (and I would argue, should) be built. UTC/NUTC does something fairly similar.
First, some goals
• Support concurrency where possible, since IPA instances can be quite large
• Support partitioning (dividing input set into two or more parts, analyzing each separately)
• Allow pluggable components for particular problem domains
• Keep procedural knowledge suitably compartmentalized, avoid duplication (eg don’t have multiple type systems or multiple IL analyzers)
And a non-goal
• Support incremental recomputation of IPA results
The main idea is to split IPA into two sub-parts: within (intra) procedural, and across (inter) procedural.
Intra procedural generates summary facts for methods based on the IL (aka jump functions) and ultimately consumes them, either directly (by reading summary data) or indirectly (via queries to the global solution).
Inter procedural stitches together the problem instance (call graph), orchestrates the concurrency, has a solver for reaching suitable fixed points, and may keep global models for some problem domains.
Summary facts are represented via individual finite lattices (can be infinite with suitable care). So there is some symbolic representation of facts with the ability to rank sets of facts and merge or intersect them. There are often bottom and top elements representing things like no information or conflicting information.
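For concreteness (an illustrative sketch, not from any of the systems above), a flat lattice for propagating a single constant argument value might look like this in Python, where TOP, BOTTOM, and meet are just names chosen for the example:

```python
# A minimal flat lattice for interprocedural constant propagation.
# TOP plays the role of "no information yet", BOTTOM the role of
# "conflicting information"; anything in between is one known constant.

TOP = object()     # no call sites seen yet
BOTTOM = object()  # call sites disagree (or an unknown caller exists)

def meet(a, b):
    """Merge two lattice values; the result is never more precise than either input."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    return a if a == b else BOTTOM

# Example: two call sites pass 42, a third passes 7, so the summary drops to BOTTOM.
summary = TOP
for arg in (42, 42, 7):
    summary = meet(summary, arg)
assert summary is BOTTOM
```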
Usually the fixed point summaries are “sound”, meaning that anything that can possibly be true at runtime is allowed by the summary, though the summary may also allow some things that can never actually be true. There is usually an accuracy-time-space tradeoff; more accurate results may take longer or require more detailed models.
Occasionally it is interesting to allow for unsound results, in particular if the system is trying to predict what is likely to be true rather than what must be true (~ probabilistic data flow). In such cases we can rely on the facts as hints (say to drive inlining in likely profitable directions) or else take a dependence on them via guarded or speculative techniques, ensuring we can recover if those facts turn out not to be true when the code is executed.
For example, if we have to assume that a method may be called via reflection, we may not have all incoming call graph edges for the node corresponding to the method. Therefore we can’t compute the exact set of values that the method’s parameters may take, but we may be able to compute the set that is likely. Optimizations can still take advantage of the additional information about the parameters but have to make sure the code that is ultimately executed is correct for all cases.
Another example is the possibility of method bodies that can change after the analysis (e.g., via profiler re-jit). In this case, information computed about a method before its body changed may not be correct. If that information was used for optimizing any other methods (e.g., the method’s callers), the code for those methods has to be regenerated. We could use a technique similar to the current inline tracking to detect when one method’s codegen depends upon another so that the right invalidations can happen.
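To make the guarded/speculative idea concrete, here is a hypothetical sketch of the code shape that could be emitted when IPA predicts, but cannot prove, that callers almost always pass a particular value (the names and the scenario are invented for illustration):

```python
# Hypothetical guarded specialization: IPA predicts n is almost always 16,
# so a specialized fast path sits behind a runtime check, with the general
# code as the fallback. Correctness never depends on the prediction.

def copy_block_16(dst, src):
    # fast path compiled assuming n == 16 (think: fully unrolled / vectorized)
    dst[:16] = src[:16]

def copy_block_general(dst, src, n):
    for i in range(n):
        dst[i] = src[i]

def copy_block(dst, src, n):
    if n == 16:                          # guard on the predicted fact
        copy_block_16(dst, src)
    else:
        copy_block_general(dst, src, n)  # recover when the prediction is wrong
```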
The Call Graph
The IPA solver first starts by constructing the call graph. This usually requires IL analysis to try to determine which methods are called. The call graph must represent external or unknown callers (since not all call sites may be knowable) and external and/or unknown callees (since the full set of methods that can be called may not be knowable). There are varying degrees of sophistication in call graph construction. Determining the set of methods that can be called by a method is itself an IPA problem. The problem may also have dynamic behavior (say, determining the set of required generic instantiations) where as the solution process evolves the candidate set of methods can change. The call graph need not be 100% accurate (indeed, it usually cannot be) but it should be a sound approximation. At some point the IPA solver decides it has a suitably sound approximation of the call graph and we move on to the next stage.
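A minimal sketch of a call graph representation with explicit unknown-caller and unknown-callee nodes (illustrative names only, not any particular system’s actual data structure):

```python
from collections import defaultdict

# Call graph with distinguished nodes for unknown callers and callees, so a
# sound approximation can record "this method may be reached from code we
# cannot see" or "this call site may target methods we cannot see".
UNKNOWN_CALLER = "<unknown-caller>"
UNKNOWN_CALLEE = "<unknown-callee>"

class CallGraph:
    def __init__(self):
        self.callees = defaultdict(set)  # method -> possible callees
        self.callers = defaultdict(set)  # method -> possible callers

    def add_edge(self, caller, callee):
        self.callees[caller].add(callee)
        self.callers[callee].add(caller)

    def mark_externally_callable(self, method):
        # e.g. exported, virtual, or potentially reached via reflection
        self.add_edge(UNKNOWN_CALLER, method)

    def mark_unknown_call_site(self, method):
        # e.g. an indirect call whose target set we could not bound
        self.add_edge(method, UNKNOWN_CALLEE)
```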
[The “Base case” here is to just assume any method can be called from unknown places and calls unknown other methods; this causes IPA to degenerate to regular intraprocedural analysis. This can be useful for handling things like debug compiles, where we would prefer not to have information propagate across methods, and all method codegen can in principle happen concurrently (as the degenerate call graph dag is just a bunch of isolated nodes)].
For partitioned problems the externally calling and callable methods may be represented as placeholders, and their IPA summaries made available to help refine the results in the current partition.
Creation of a DAG
From the call graph, the solver then creates a partial order (dag) of methods, typically by running cycle detection (dfs). This dag defines the concurrency constraints for the local analysis. Commonly we are interested in either top-down (callers->callees) or bottom-up (callees->callers) traversals of this dag.
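A sketch of that condensation step, using Tarjan-style strongly connected component detection (written recursively here for brevity; a production version would be iterative) to collapse call-graph cycles into single dag nodes:

```python
def condense_to_dag(callees):
    """Collapse strongly connected components (mutual recursion) of the call
    graph into single nodes. `callees` maps every method (including leaves,
    with an empty set) to its set of callees. Returns the list of SCCs in
    reverse topological order, i.e. callees before callers."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs, counter = [], [0]

    def strongconnect(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in callees.get(v, ()):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:        # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in list(callees):
        if v not in index:
            strongconnect(v)
    return sccs
```

Since the SCCs come back callees-before-callers, that order already serves as a bottom-up schedule; reversing it gives a top-down one.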
Passes
The solver then runs some number of alternating traversal (concurrent, over the dag) and solver passes.
In a traversal pass, the local analysis is run; it can access facts produced in prior passes as well as facts available from “predecessor” nodes in the current ordering. It produces new summary facts and/or (ultimately) code based on those facts. This pass runs concurrently, with the “schedule” determined by the dag.
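One possible way to schedule such a pass (a sketch under my own assumptions, not a description of any existing scheduler): a node is handed to a thread pool only once all of its predecessors in the chosen direction have completed, so a method never reads a summary that is still being produced.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_traversal_pass(dag_preds, analyze, max_workers=8):
    """Run the local analysis concurrently over the dag. `dag_preds` maps every
    node (including roots, with an empty set) to its predecessors in the chosen
    direction (callees for bottom-up, callers for top-down); `analyze(node)` is
    the per-method work and may read its predecessors' published summaries."""
    remaining = {n: set(p) for n, p in dag_preds.items()}
    dependents = defaultdict(set)
    for n, preds in dag_preds.items():
        for p in preds:
            dependents[p].add(n)

    ready = [n for n, preds in remaining.items() if not preds]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, n): n for n in ready}
        while futures:
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in done:
                finished = futures.pop(fut)
                fut.result()  # surface any analysis failure
                for d in dependents[finished]:
                    remaining[d].discard(finished)
                    if not remaining[d]:  # all inputs ready, schedule it
                        futures[pool.submit(analyze, d)] = d
```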
In a solver pass, the IPA system uses the accumulated summary facts to reach fixed points. This uses the full call graph and is usually solved iteratively: a node is chosen, a new summary state is generated from the current state and the most recent ‘input’ states (callers for a top-down problem, callees for a bottom-up one), and the new result is published. The process iterates until no node changes state, i.e. every node has reached a fixed point. There are various routes to accelerate convergence here, eg if the call graph is truly a dag and has no cycles then any uni-directional analysis will converge in one pass. This solving typically runs non-concurrently, though concurrent iteration is possible.
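A minimal worklist-style sketch of such a solver pass, assuming per-node summary states with an equality check and a problem-specific transfer function (all names here are illustrative):

```python
from collections import defaultdict, deque

def solve_to_fixpoint(nodes, inputs_of, transfer, initial):
    """Iterate summaries over the full call graph until nothing changes.
    `inputs_of(n)` returns the nodes (drawn from `nodes`) whose summaries feed
    n: callers for a top-down problem, callees for a bottom-up one.
    `transfer(n, input_states)` computes n's new summary; `initial(n)` seeds it.
    Terminates provided transfer is monotone over a finite-height lattice."""
    state = {n: initial(n) for n in nodes}
    dependents = defaultdict(set)
    for n in nodes:
        for m in inputs_of(n):
            dependents[m].add(n)

    worklist = deque(nodes)
    queued = set(nodes)
    while worklist:
        n = worklist.popleft()
        queued.discard(n)
        new = transfer(n, [state[m] for m in inputs_of(n)])
        if new != state[n]:
            state[n] = new
            for d in dependents[n]:   # anything downstream must be re-examined
                if d not in queued:
                    worklist.append(d)
                    queued.add(d)
    return state
```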
Generally the IPA solver part is maybe 10% of the overall work and the local analysis 90%. With only the local analysis parallelized, Amdahl’s law caps the speedup at roughly 1/0.1 = 10x; 4-6x is more realistic and is what we saw in practice.
The summary facts may be bundles of independent lattice results or the lattice solutions can interact. For example, if we are doing both interprocedural constant propagation and interprocedural alias analysis, the parts of code reachable in a method will depend on constant propagation, and this in turn may alter the sets of possible aliases. It is well known that “mutual fixed points” can be more accurate than trying to reach a fixed point for each problem separately.
Particular problems can often be classified as top-down or bottom-up. Some problems are bidirectional.
Generally summaries represent facts true at all invocations of a method. One can try to do path-specific analysis where summary data applies only to some callers or some specific pattern of callers (sometimes called contextual analysis), but when doing this the memory requirements and logistics generally get complicated. Sometimes you can special case this (eg a helper that just turns around and invokes a method passed to it as an argument). Otherwise high fan-in/fan-out nodes can dramatically increase the amount of approximation in a solution, as facts flow along so-called “infeasible paths.”
One can also do contextual analysis via inlining. When generating code concurrently and actually inlining, care must be taken not to access IPA facts out of dependence order, so some special access check logic is needed for summary accesses.
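One simple scheme for that access check (an assumption on my part, not the original design): a summary can only be read once its producer has published it for the current pass, and anything not yet published gets a conservative answer.

```python
def read_summary(published, target, conservative):
    # `published` maps method -> finalized summary for the current pass. A
    # summary only appears once its producer is done and never changes after
    # that, so the read is safe; if the producer hasn't finished (it may be
    # running concurrently or scheduled later), return a conservative answer
    # rather than a fact that could still change.
    return published.get(target, conservative)
```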
Analyses
Typical top-down problems are:
• Constant propagation (callee would like to know if all callers pass a constant for some argument)
• Alias analysis (callee would like to know if some parameters can alias one another)
• Type analysis (callee would like to know possible types for parameters)
Typical bottom-up problems are:
• Killed register set analysis (caller would like to know if callee actually preserves some caller-save registers)
• Escape analysis (caller would like to know if callee allows some argument to escape)
• Useless parameter analysis (caller would like to know if callee (transitively) ignores some argument)
• Inlining impact analysis (caller would like to know if certain values of arguments might help optimize callee, rough size of “call tree” under callee, etc)
• Read/written parameter analysis for refs and byrefs (caller would like to know if callee only reads from argument, or always writes to argument, etc)
• Null check analysis (caller would like to know if callee null checks some argument)
• NoThrow/NoReturn/MayNotReturn (caller would like to know if callee cannot throw an exception, must throw an exception, etc; a NoThrow sketch follows this list)
• Comdat folding (duplicate method detection)
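As one concrete bottom-up instance, here is a hypothetical NoThrow transfer function of the kind that could be plugged into the solver sketched earlier (helper names like `body_may_throw_locally` are assumptions for the example, not real APIs):

```python
MAY_THROW, NO_THROW = "may-throw", "no-throw"

def nothrow_transfer(method, callee_states, body_may_throw_locally, has_unknown_callees):
    """Bottom-up NoThrow summary: a method cannot throw only if its own body
    has no throwing constructs and every possible callee is known not to throw.
    Unknown callees (indirect calls, reflection targets) force may-throw."""
    if body_may_throw_locally(method) or has_unknown_callees(method):
        return MAY_THROW
    if any(s == MAY_THROW for s in callee_states):
        return MAY_THROW
    return NO_THROW
```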
And some true IPA problems
• What types can be constructed?
• What types can be assigned to a field (either static or instance)? What values can be stored in a field (ditto)?
• What constructed types implement a given interface?
• Call graph construction
• Method layout in an image (callers near callees)
Historically in native compiles just doing IPA hasn’t been that big of a perf win (think single digits). Many times code expansion is needed to take advantage of IPA facts, and this can’t be done blindly, as it has costs.
But in conjunction with profile feedback to help focus code expansion into profitable areas, IPA really shines.
Things may be different for managed code; we may see more substantial benefits “more cheaply” because of the relatively higher costs of some of our abstractions. I don’t recall offhand what we saw in Midori from this.