Skip to content

Instantly share code, notes, and snippets.

@mdempsky
Created July 12, 2023 19:27
Show Gist options
  • Save mdempsky/63c93ba1b4a1eb0b3fc301d036386a2d to your computer and use it in GitHub Desktop.
Save mdempsky/63c93ba1b4a1eb0b3fc301d036386a2d to your computer and use it in GitHub Desktop.
Inlining thoughts
* Go notes
To fit into Go's current package-oriented compilation mode, we can
define a set of top-down root entry-points into each package.
Any global variable initializations and init functions are always part
of the root.
For package main, the main function is also a root.
For non-main packages, the package's exported objects are its roots.
Similar to the old "crawler" algorithm used by iexport.
Beyond escape analysis, it's probably worth knowing when pointers flow
to an unsafe.Pointer conversion, because that indicates the pointed-to
object can be modified arbitrarily. (Relatedly: go.dev/issue/58625.)
I remember also looking into making string->[]byte or []byte->string
conversions more efficient. go.dev/issue/2205, go.dev/issue/43752.
* Inlining and flow analysis
Callahan et al, 1986.
"Interprocedural constant propagation"
https://dl.acm.org/doi/10.1145/12276.13327
Introduced the idea to use "jump" functions to summarize data flow
within functions, so that they can be wired up into interprocedural
analyses. In Go's separate compilation model, we'd probably write this
out via export data.
Jagannathan and Wright, 1996.
"Flow-directed Inlining"
https://dl.acm.org/doi/10.1145/231379.231417
Waddell and Dybvig, 1997.
"Fast and Effective Procedure Inlining"
https://link.springer.com/chapter/10.1007/BFb0032732
Ashley, 1997.
"The effectiveness of flow analysis for inlining"
https://dl.acm.org/doi/10.1145/258949.258959
A bunch of papers on the idea of using global control flow analysis to
improve inlining. I haven't read and understood them well enough just
yet to really understand their comparisons.
Gilray et al, 2016.
"Pushdown control-flow analysis for free"
https://dl.acm.org/doi/10.1145/2837614.2837631
Recent control flow analysis algorithm, which claims significant
improvements in precision and performance. Seems interesting to read
into further.
Though the "efficient" O(N^3) figure mentioned in the abstract is a
bit concerning.
** TODO
Theodoridis et al, 2022.
"Understanding and exploiting optimal function inlining"
https://dl.acm.org/doi/10.1145/3503222.3507744
Just found this today. Seems infeasible for a production compiler
mode, but it does have an "autotuner" that seems like it would be
interesting to apply across the Go ecosystem to learn better inlining
heuristics.
It mentions a bunch of inlining heuristics: [7, 20, 23, 24, 28]
I also noticed some citations of PGO-driven inlining, e.g.:
Ayers et al, 1997.
"Aggressive inlining"
https://dl.acm.org/doi/10.1145/258916.258928
Prokopec et al, 2019.
"An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers"
https://ieeexplore.ieee.org/document/8661171
* Fixpoint algorithms
Kim et al, 2019.
"Deterministic parallel fixpoint computation"
https://dl.acm.org/doi/10.1145/3371082
Introduces the idea of a "weak partial order" (WPO), which organizes a
dependency graph into nested directed acyclic graphs (DAGs) of
strongly-connected components (SCCs). Provides an almost-linear time
algorithm to construct a WPO from a dependency graph, and a
deterministic parallel algorithm to compute a fixpoint.
Inserts "widening" points to break all dependency cycles.
Requires the dependency graph to be known in advance.
I've implemented this in Go and verified that it produces the same
result as the current liveness analysis pass's naive
iterate-until-fixpoint algorithm.
Bourdoncle, 2005.
"Efficient chaotic iteration strategies with widenings"
https://link.springer.com/chapter/10.1007/BFb0039704
Widely cited paper about fixpoint calculation with "weak topological
orders" (WTO); these are generalized into WPOs in the above paper,
which allow parallel evaluation.
I haven't actually found a copy to read yet. According to the
introduction section in the preview, they have a way to extend the
algorithm to interprocedural analysis. I'm interested to read that.
Van Es et al, 2020.
"A Parallel Worklist Algorithm for Modular Analyses"
https://ieeexplore.ieee.org/document/9252039
Straight forward worklist algorithm that allows very high
parallelism. Each individual task is farmed out to a separate worker,
which has access to a snapshot of global state. It carries out its
changes and reports back to the coordinator, who merges the results
into the global state, and restarts any workers that are affected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment