-
Only aggregates require stratification
-
We want to aim for the fewest strata possible
-
Aggregates that look the same should always have the same output
-
Aggregates which depend on their own outputs (directly or otherwise) are unorderable
-
Aggregates which filter (rather than provide) the rows contributing to their inputs are orderable
-
Aggregates have four kinds of variables
output = aggregate[input, given: projection, per: grouping]
- output - Must not be bound at the time the aggregate runs
- We can relax this through post-filtering
- input - Must be fully satisfied before the aggregate's value is correct
- We can relax this slightly in the case of grouped aggregates
- projection - Indicate additional uniqueness over the input rows.
- These have identical properties as inputs
- grouping - Indicate the output cardinality of the aggregate
- These don't have any special satisfaction requirements
- output - Must not be bound at the time the aggregate runs
- If an aggregate's output is bound, it may be decomposed into an aggregate with an unbound output and a post-filter (similar to expressions)
- Conceptually, an aggregate maps a set of rows indicated by the projection to a single prefix row indicated by its grouping
- Since the aggregate runs once per unique grouping prefix, the grouped variables do not need to be fully satisfied prior to execution
- Aggregate output will only be correct when the entire set of unique rows for its projection have been stabilized
- This means that the cardinality of the terms in the projection must not change after this point
- Ungrouped aggregates run over the empty prefix
- These rows are the suffix set of unique rows for one of its groupings
- If the rest of the evaluator is incremental, you can just keep issuing updates, but that's pretty expensive
- Block until every variable it is grouped on has been satisfied for a single prefix - Other variables may also be in this prefix, in which case the aggregate will be run multiple times for the same input. Feel free to get clever here
- Accumulate all rows that are suffixes of this prefix (easily done by evaluating the aggregate on its trailing edge)
- Project those down by the aggregate's projection
- Evaluate
- Satisfy the aggregate output variable
- Stratification on aggregates can be viewed as a telescoping generic join
- The prefix portion of the join has a cardinality of C X #GROUPS
- The suffix portion of the join has a cardinality of K X #PROJECTED per group
- To ensure that aggregates always end up operating over the same input, we must run them as early as possible
- Satisfy the cheapest variable of the row, adding it to the prefix
- Find all rows that match the new prefix
- For each row, repeat from 1
- If satisfying this variable completely fills the grouping of an aggregate: A. Compute it B. Satisfy its output variable C. Repeat from 4 D. If the aggregate output variable was initially bound, post-filter E. Repeat from 1
- I don't think you're going to need this, but I wrote it down anyway
-
All nodes depend on any variable they use and do not provide
-
Any node which may provide a variable only does so if no other node has
- If a node that may provide a variable does not do so, it may filter that variable
-
Any node which may depend on a variable does so if it has been provided by another node
-
Node types:
- A Record may provides all of its variables
- An Action provides its entity variable if it is unbound
- A Not may depend on any variable utilized by nodes in its body
- A Union/Choose may provide any variable marked as an output and may depend on any variable utilized by nodes in its body
- Variables marked output which are not provided by the Union/Choose must post-filter
- An Expression/Aggregate may provides any variable marked output or optional
- Variables marked output which are not provided by the Expression/Aggregate must post-filter
-
A node contributes to the cardinality of any variable it may expand or filter
- Pre-sort nodes based on heuristics of your choice (preferably favoring deterministic orderings)
- If no nodes remain, you're done
- For each node with all dependencies satisfied: A. If the node has subqueries, order them B. If the node is capable of providing any un-provided variables, satisfy them C. If the node must provide an already provided variable, split it into itself + a post-filter via a temporary variable D. If a node contributes to a variable, note its contribution E. If all contributors for a contributed variable have been ordered, satisfy it
- If no new nodes have been ordered, the graph is unorderable
- Repeat from 2