First, normalization and optimization are two different things. Normalization, which has various sub-definitions depending on the exact properties involved, means approaching some “normal form” that reduces the variety of the structure at hand. It can often pessimize a structure, but it does so in a predictable way. A normalized structure is easier to apply other operations to, because it is restricted in its form.
Optimization, on the other hand, denormalizes in exchange for better performance. But an optimized structure is harder to reason about and make changes to.
So, the general approach is to normalize a structure, apply whatever transformations are appropriate, then optimize the structure. If you are converting between structures, usually only the final structure would have optimization applied, but normalization of each structure can be useful.
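As a sketch, that normalize → transform → optimize ordering is just function composition (the names here are illustrative, not an existing API):

```scala
// A minimal sketch of the pipeline described above: normalize the source
// structure A, convert to the target structure B, normalize B, and only
// then optimize B. All names are hypothetical.
object PipelineSketch {
  def pipeline[A, B](
      normalizeA: A => A,   // normalization of the source structure
      convert:    A => B,   // conversion between structures
      normalizeB: B => B,   // normalization of the target structure
      optimizeB:  B => B    // optimization applied only at the end
  ): A => B =
    normalizeA andThen convert andThen normalizeB andThen optimizeB
}
```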
Often, many transformations need to be applied. There are various approaches to implementing them (see Efficient Nanopass Compilers using Cats and Matryoshka for some examples). In the nanopass approach, they can be made extremely fine-grained, so what is a reasonable level of granularity?
I think we can let the types guide us.
Assuming we’re dealing with a coproduct of pattern functors, we’ll end up with types like the following:
| scope  | context needed: no | context needed: yes  |
|--------|--------------------|----------------------|
| within | `F ~> F`           | `F[T[G]] ~> F[T[G]]` |
| across | `F ~> G`           | `F[T[G]] ~> G[T[G]]` |
“within” indicates transformations that operate within a single coproduct component, whereas “across” indicates transformations that may return a result in a different component. “context needed” indicates whether we need to look at the inputs in order to apply the transformation.
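These four shapes can be written down concretely. Below is a dependency-free sketch, using a simple `Fix` in place of Matryoshka's `T[_[_]]`, a hand-rolled `~>` in place of the one from Cats, and plain functions for the context-needing cases; all other names are illustrative:

```scala
// The four transformation shapes from the table above, as Scala types.
object ShapeSketch {
  // Stand-in for a fixpoint type like Matryoshka's T[_[_]].
  final case class Fix[F[_]](unFix: F[Fix[F]])

  // Natural transformation, as in Cats' F ~> G: no access to the
  // recursive structure, so no context is needed.
  trait ~>[F[_], G[_]] { def apply[A](fa: F[A]): G[A] }

  // "within", context needed: inspects F[Fix[G]], stays in component F.
  type WithinCtx[F[_], G[_]] = F[Fix[G]] => F[Fix[G]]

  // "across", context needed: inspects F[Fix[G]], may land in another
  // component of the coproduct G.
  type AcrossCtx[F[_], G[_]] = F[Fix[G]] => G[Fix[G]]
}
```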
Transformations involving `G` should be partitioned by the exact context necessary. E.g., `(implicit F: F :<: G, OC: OtherComponent :<: G)` would be distinct from `(implicit AC: AnotherComponent :<: G)`.[†]
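To make that partitioning concrete, here is a standalone sketch of an `Inject`-style `:<:` with two hypothetical components; `ExprF`, `OtherComponent`, `Co`, and `rewriteAdd` are invented names, not real Quasar/Matryoshka identifiers:

```scala
// Sketch: a transformation's "context" is the set of :<: constraints it
// places on the coproduct G it targets.
object InjectSketch {
  // F :<: G witnesses that component F can be injected into coproduct G.
  trait :<:[F[_], G[_]] { def inj[A](fa: F[A]): G[A] }

  // Two hypothetical pattern-functor components.
  sealed trait ExprF[A]
  final case class AddF[A](l: A, r: A) extends ExprF[A]

  sealed trait OtherComponent[A]
  final case class NoteF[A](msg: String, a: A) extends OtherComponent[A]

  // This transformation's context requires BOTH ExprF and OtherComponent
  // to be components of G, which distinguishes it from a transformation
  // needing, say, only AnotherComponent :<: G.
  def rewriteAdd[G[_], A](l: A, r: A)(
      implicit E: ExprF :<: G, OC: OtherComponent :<: G): G[A] =
    OC.inj(NoteF("rewritten add", l)) // placeholder rewrite

  // A concrete coproduct satisfying both constraints, for demonstration.
  sealed trait Co[A]
  final case class CoE[A](run: ExprF[A])          extends Co[A]
  final case class CoO[A](run: OtherComponent[A]) extends Co[A]

  implicit val exprInCo: ExprF :<: Co =
    new (ExprF :<: Co) { def inj[A](fa: ExprF[A]): Co[A] = CoE(fa) }
  implicit val otherInCo: OtherComponent :<: Co =
    new (OtherComponent :<: Co) { def inj[A](fa: OtherComponent[A]): Co[A] = CoO(fa) }
}
```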
In general, these transformations should also return the result in `Option`, with `None` indicating an unchanged value. This allows repeated application until the result is in normal form.
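That repeated application can be sketched as follows; the toy AST and `elimDoubleNeg` rule are invented for illustration:

```scala
// Sketch: apply an Option-returning transformation until it reports None
// ("already in normal form").
object NormalizeSketch {
  def repeatedly[A](f: A => Option[A]): A => A = {
    @annotation.tailrec
    def go(a: A): A = f(a) match {
      case Some(next) => go(next) // changed: try again
      case None       => a        // unchanged: normal form reached
    }
    go
  }

  // Toy normalization: eliminate double negation at the root.
  sealed trait Expr
  case object Lit                 extends Expr
  final case class Neg(e: Expr)   extends Expr

  val elimDoubleNeg: Expr => Option[Expr] = {
    case Neg(Neg(e)) => Some(e)
    case _           => None
  }
}
```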
[†]: This part is still in flux. When using a type class for a transformation, each instance can set its own constraints on the context, but there may be different options depending on exactly what coproduct it’s being applied to. To some extent, this may be solvable by ordered instances (i.e., when the contexts are strictly orderable, but not when they are overlapping).
I see a couple of points to make regarding the order in which normalizations and optimizations are applied.
I see normalization not as a single transformation, but as many fine-grained transformations, each with some optimizations in mind. That is, we normalize in some particular way because it makes it easier to run optimizations X, Y, and Z. Because of that relationship, we necessarily need to run that particular normalization before its related optimizations.
Now, optimizations (and normalizations) may also need to be applied in certain orders. I would love to somehow encode these ordering requirements in the types, but I'm not sure how to do that. Maybe we can have a single "within" normalization followed by the "within" optimizations for each component. And then maybe the ordering of the "within" blocks wouldn't matter, and it's only the ordering of the "across" blocks that matters? That would be nice!
Also, as we add more optimizations, I wouldn't be surprised if we hit more infinite loops like the one we saw recently, where we optimize recursively, oscillating forever between two different forms. So, a design with good debugging tools in mind will be important, which, at least for me, means the ability to easily print out the qscript at various stages in the transformation.
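One cheap way to get that kind of visibility is to wrap each pass in a tracing combinator, so the intermediate structure (e.g. the qscript) can be rendered between stages. A sketch, with invented names (`Pass`, `traced`, `tracedPipeline`):

```scala
// Sketch: thread a logging hook through a pipeline of passes so each
// intermediate result can be printed while debugging.
object DebugSketch {
  type Pass[A] = A => A

  // Run the pass, then hand a rendering of the result to `log`
  // (println during development, a no-op in production).
  def traced[A](name: String, pass: Pass[A], log: String => Unit): Pass[A] =
    a => {
      val out = pass(a)
      log(s"after $name: $out")
      out
    }

  // Passes compose as plain functions, so tracing every stage is just a fold.
  def tracedPipeline[A](log: String => Unit)(passes: (String, Pass[A])*): Pass[A] =
    passes.foldLeft(identity[A] _) {
      case (acc, (name, pass)) => acc andThen traced(name, pass, log)
    }
}
```

Logging every stage also makes oscillation visible: if the trace shows the same two forms alternating, the offending pair of rewrites is right there in the output.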