mdempsky/gist:63c93ba1b4a1eb0b3fc301d036386a2d

## gistfile1.txt
* Go notes

To fit into Go's current package-oriented compilation mode, we can
define a set of top-down root entry-points into each package.

Any global variable initializations and init functions are always part
of the root.

For package main, the main function is also a root.

For non-main packages, the package's exported objects are its roots.
Similar to the old "crawler" algorithm used by iexport.


Beyond escape analysis, it's probably worth knowing when pointers flow
to an unsafe.Pointer conversion, because that indicates the pointed-to
object can be modified arbitrarily. (Relatedly: go.dev/issue/58625.)

I remember also looking into making string->[]byte or []byte->string
conversions more efficient. go.dev/issue/2205, go.dev/issue/43752.


* Inlining and flow analysis

Callahan et al, 1986.
"Interprocedural constant propagation"
https://dl.acm.org/doi/10.1145/12276.13327

Introduced the idea to use "jump" functions to summarize data flow
within functions, so that they can be wired up into interprocedural
analyses. In Go's separate compilation model, we'd probably write this
out via export data.


Jagannathan and Wright, 1996.
"Flow-directed Inlining"
https://dl.acm.org/doi/10.1145/231379.231417

Waddell and Dybvig, 1997.
"Fast and Effective Procedure Inlining"
https://link.springer.com/chapter/10.1007/BFb0032732

Ashley, 1997.
"The effectiveness of flow analysis for inlining"
https://dl.acm.org/doi/10.1145/258949.258959

A bunch of papers on the idea of using global control flow analysis to
improve inlining. I haven't read and understood them well enough just
yet to really understand their comparisons.


Gilray et al, 2016.
"Pushdown control-flow analysis for free"
https://dl.acm.org/doi/10.1145/2837614.2837631

Recent control flow analysis algorithm, which claims significant
improvements in precision and performance. Seems interesting to read
into further.

Though the "efficient" O(N^3) figure mentioned in the abstract is a
bit concerning.


** TODO

Theodoridis et al, 2022.
"Understanding and exploiting optimal function inlining"
https://dl.acm.org/doi/10.1145/3503222.3507744

Just found this today. Seems infeasible for a production compiler
mode, but it does have an "autotuner" that seems like it would be
interesting to apply across the Go ecosystem to learn better inlining
heuristics.

It mentions a bunch of inlining heuristics: [7, 20, 23, 24, 28]

I also noticed some citations of PGO-driven inlining, e.g.:

Ayers et al, 1997.
"Aggressive inlining"
https://dl.acm.org/doi/10.1145/258916.258928

Prokopec et al, 2019.
"An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers"
https://ieeexplore.ieee.org/document/8661171


* Fixpoint algorithms

Kim et al, 2019.
"Deterministic parallel fixpoint computation"
https://dl.acm.org/doi/10.1145/3371082

Introduces the idea of a "weak partial order" (WPO), which organizes a
dependency graph into nested directed acyclic graphs (DAGs) of
strongly-connected components (SCCs). Provides an almost-linear time
algorithm to construct a WPO from a dependency graph, and a
deterministic parallel algorithm to compute a fixpoint.

Inserts "widening" points to break all dependency cycles.

Requires the dependency graph to be known in advance.

I've implemented this in Go and verified that it produces the same
result as the current liveness analysis pass's naive
iterate-until-fixpoint algorithm.


Bourdoncle, 2005.
"Efficient chaotic iteration strategies with widenings"
https://link.springer.com/chapter/10.1007/BFb0039704

Widely cited paper about fixpoint calculation with "weak topological
orders" (WTO); these are generalized into WPOs in the above paper,
which allow parallel evaluation.

I haven't actually found a copy to read yet. According to the
introduction section in the preview, they have a way to extend the
algorithm to interprocedural analysis. I'm interested to read that.


Van Es et al, 2020.
"A Parallel Worklist Algorithm for Modular Analyses"
https://ieeexplore.ieee.org/document/9252039

Straight forward worklist algorithm that allows very high
parallelism. Each individual task is farmed out to a separate worker,
which has access to a snapshot of global state. It carries out its
changes and reports back to the coordinator, who merges the results
into the global state, and restarts any workers that are affected.
	* Go notes

	To fit into Go's current package-oriented compilation mode, we can
	define a set of top-down root entry-points into each package.

	Any global variable initializations and init functions are always part
	of the root.

	For package main, the main function is also a root.

	For non-main packages, the package's exported objects are its roots.
	Similar to the old "crawler" algorithm used by iexport.


	Beyond escape analysis, it's probably worth knowing when pointers flow
	to an unsafe.Pointer conversion, because that indicates the pointed-to
	object can be modified arbitrarily. (Relatedly: go.dev/issue/58625.)

	I remember also looking into making string->[]byte or []byte->string
	conversions more efficient. go.dev/issue/2205, go.dev/issue/43752.


	* Inlining and flow analysis

	Callahan et al, 1986.
	"Interprocedural constant propagation"
	https://dl.acm.org/doi/10.1145/12276.13327

	Introduced the idea to use "jump" functions to summarize data flow
	within functions, so that they can be wired up into interprocedural
	analyses. In Go's separate compilation model, we'd probably write this
	out via export data.


	Jagannathan and Wright, 1996.
	"Flow-directed Inlining"
	https://dl.acm.org/doi/10.1145/231379.231417

	Waddell and Dybvig, 1997.
	"Fast and Effective Procedure Inlining"
	https://link.springer.com/chapter/10.1007/BFb0032732

	Ashley, 1997.
	"The effectiveness of flow analysis for inlining"
	https://dl.acm.org/doi/10.1145/258949.258959

	A bunch of papers on the idea of using global control flow analysis to
	improve inlining. I haven't read and understood them well enough just
	yet to really understand their comparisons.



	Gilray et al, 2016.
	"Pushdown control-flow analysis for free"
	https://dl.acm.org/doi/10.1145/2837614.2837631

	Recent control flow analysis algorithm, which claims significant
	improvements in precision and performance. Seems interesting to read
	into further.

	Though the "efficient" O(N^3) figure mentioned in the abstract is a
	bit concerning.


	** TODO

	Theodoridis et al, 2022.
	"Understanding and exploiting optimal function inlining"
	https://dl.acm.org/doi/10.1145/3503222.3507744

	Just found this today. Seems infeasible for a production compiler
	mode, but it does have an "autotuner" that seems like it would be
	interesting to apply across the Go ecosystem to learn better inlining
	heuristics.

	It mentions a bunch of inlining heuristics: [7, 20, 23, 24, 28]

	I also noticed some citations of PGO-driven inlining, e.g.:

	Ayers et al, 1997.
	"Aggressive inlining"
	https://dl.acm.org/doi/10.1145/258916.258928

	Prokopec et al, 2019.
	"An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers"
	https://ieeexplore.ieee.org/document/8661171


	* Fixpoint algorithms

	Kim et al, 2019.
	"Deterministic parallel fixpoint computation"
	https://dl.acm.org/doi/10.1145/3371082

	Introduces the idea of a "weak partial order" (WPO), which organizes a
	dependency graph into nested directed acyclic graphs (DAGs) of
	strongly-connected components (SCCs). Provides an almost-linear time
	algorithm to construct a WPO from a dependency graph, and a
	deterministic parallel algorithm to compute a fixpoint.

	Inserts "widening" points to break all dependency cycles.

	Requires the dependency graph to be known in advance.

	I've implemented this in Go and verified that it produces the same
	result as the current liveness analysis pass's naive
	iterate-until-fixpoint algorithm.


	Bourdoncle, 2005.
	"Efficient chaotic iteration strategies with widenings"
	https://link.springer.com/chapter/10.1007/BFb0039704

	Widely cited paper about fixpoint calculation with "weak topological
	orders" (WTO); these are generalized into WPOs in the above paper,
	which allow parallel evaluation.

	I haven't actually found a copy to read yet. According to the
	introduction section in the preview, they have a way to extend the
	algorithm to interprocedural analysis. I'm interested to read that.


	Van Es et al, 2020.
	"A Parallel Worklist Algorithm for Modular Analyses"
	https://ieeexplore.ieee.org/document/9252039

	Straight forward worklist algorithm that allows very high
	parallelism. Each individual task is farmed out to a separate worker,
	which has access to a snapshot of global state. It carries out its
	changes and reports back to the coordinator, who merges the results
	into the global state, and restarts any workers that are affected.