@matthewhammer
Created October 27, 2016 15:51
Will, Dakota, Allan
20161026 -- Wed
Q: Why clean "aggressively"?
-----------------------------
Informal definition: "Aggressive cleaning", which Adapton (PLDI 2014)
implements, consists of several aspects of the design of the DCG,
which work together to help avoid needless re-evaluation:
-- The DCG stores last-observed values on force edges. For instance,
if a thunk (z) produces 4 when (y) forces it, then the DCG records
an edge from (y) to (z), labeled with the observed value 4.
-- When a thunk with a cached result and dirty edges is forced, the
order of cleaning these dirty edges proceeds in *demand order*
(from changed inputs to the thunks that depend on them). This
order coincides with the creation order of the edges in the graph
(each is created immediately after a thunk forces another thunk,
and produces a value).
-- During cleaning, the DCG short-circuits re-evaluation when these
last-observed values match the current value.
An Adapton implementation that lacks these features will achieve less
reuse. (My understanding is that miniadapton presently does not
implement them; as a more minor point, miniadapton also dirties and
cleans *nodes*, rather than edges as in other models and
implementations of Adapton.)
For instance, consider this DCG, where input changes from 4 to -4:
 [inp]  4 --> -4     (Changed)
   |f   4 --> f -4
   |
  (sq)  16 --> 16    (Unchanged)
   |f   16 --> f 16
   |
  (x)   Idea: Don't reexecute x, since 16 is still 16.
   .    Don't reexecute anything that depends on x, since it hasn't changed.
   .
The DCG consists of three nodes (inp, sq and x) and records force
edges x--sq and sq--inp, which we read as "x forces sq" and "sq forces
inp", respectively. We label each edge with the fact that it records
a force (denoted by 'f') and the observed value (4, -4 or 16).
The key benefit of the three DCG techniques listed above is that they
successfully eschew re-evaluating thunk (x), or any of its dependents.
In particular, when inp changes, we dirty these edges (inp--sq, and
sq--x, and so on); later, when (x) is in demand, we clean these edges.
To clean these edges, we begin with inp--sq. Since inp has changed, we
re-evaluate sq, resulting in 16, the same value as before. In this
case, the cleaning algorithm compares the old and new values (16 and
16) and determines that no further re-evaluation is necessary. It
cleans the edge x--sq, and (transitively) any other edges that solely
depend on x.
On the other hand, if we change -4 to -5, then the output of sq does
change (from 16 to 25), and we would next re-evaluate (x).
Even then, the techniques above can still come in handy further
downstream: if the function computed by (x) over the value of (sq) is
(lam x. x < 100), then this function returns true for inp values of 4,
-4 and -5 (since 16 and 25 are both less than 100), so no further
re-evaluation is needed for the dependents of (x).
The key idea is that the order of this re-evaluation proceeds from the
nearest dependents on inp to the super-computations that demand them.
This way, we exploit situations where the old and new results
coincide, and change propagation can "short circuit".
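The walkthrough above can be sketched in code. The following is a minimal, hypothetical Python sketch (all class and function names are invented for this sketch, not taken from any Adapton implementation) of a DCG whose force edges carry last-observed values; changing inp from 4 to -4 re-evaluates (sq) but short-circuits before re-evaluating (x):

```python
# Minimal sketch of a DCG whose force edges store last-observed
# values; all names here are invented for illustration.

class Edge:
    def __init__(self, src, tgt, observed):
        self.src, self.tgt = src, tgt
        self.observed = observed   # value src saw when it forced tgt
        self.dirty = False

def dirty_edge(e):
    # Transitively dirty edges from a changed node toward its observers.
    if not e.dirty:
        e.dirty = True
        for e2 in e.src.out_edges:
            dirty_edge(e2)

class Cell:
    def __init__(self, value):
        self.value = value
        self.out_edges = []        # force edges that observed this cell

    def set(self, value):
        self.value = value
        for e in self.out_edges:
            dirty_edge(e)

    def force(self):
        return self.value

class Thunk:
    def __init__(self, fn):
        self.fn = fn               # fn receives the thunk, to record forces
        self.cached = None         # None doubles as "never evaluated" here
        self.in_edges = []         # edges created by this thunk's forces
        self.out_edges = []

    def force(self):
        if self.cached is not None and self.clean():
            return self.cached     # reuse: every edge is clean again
        self.in_edges = []
        self.cached = self.fn(self)
        return self.cached

    def clean(self):
        # Clean dirty edges in creation (demand) order; short-circuit
        # when the re-observed value equals the last-observed one.
        for e in self.in_edges:
            if e.dirty:
                if e.tgt.force() != e.observed:
                    return False   # value really changed: re-evaluate
                e.dirty = False
        return True

    def record_force(self, tgt):
        v = tgt.force()
        e = Edge(self, tgt, v)
        self.in_edges.append(e)
        tgt.out_edges.append(e)
        return v

evals = {"sq": 0, "x": 0}
inp = Cell(4)

def sq_fn(t):
    evals["sq"] += 1
    return t.record_force(inp) ** 2

def x_fn(t):
    evals["x"] += 1
    return t.record_force(sq) < 100

sq, x = Thunk(sq_fn), Thunk(x_fn)

print(x.force())   # True (16 < 100); evals == {"sq": 1, "x": 1}
inp.set(-4)        # dirties sq--inp and x--sq transitively
print(x.force())   # True again; sq re-ran, but x was short-circuited
print(evals)       # {"sq": 2, "x": 1}
```

For brevity, stale edges left behind after a re-evaluation are simply ignored rather than removed, and `None` doubles as the "never evaluated" marker.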
Though not shown above, we also clean edges in left-to-right order,
consistent with their original order of creation. The left-to-right
ordering is important to maximize reuse, but is also critical in the
presence of names, discussed below.
Claim: (Edge-ordering matters for IC with Names)
------------------------------------------------
In the presence of names, an implementation of DCG cleaning must
respect the original "demand order" of the DCG when choosing an order
for cleaning/re-evaluation. If an implementor does not use the
correct order, their system will generally produce incorrect answers.
Example:
Suppose we have the DCG shown below, where [inp] and [temp] are named
references, and (x), (y) and (z) are thunks that read and write them.
The edges are read upwards: Edges marked with an 'f' are read as
"forces", as in "y forces x" and "x forces inp". Edges marked with an
'a' are read as "allocates", as in "x allocates temp".
   4       4
 [inp]   [temp] <--- Invariant: temp is equal in value to inp
    \    a / \
   f \    /   \ f
      (x)     (z)
        \     /
       f \   / f
          (y)    Cached result: 4 (equal to inp and temp)
Consider changing inp from 4 to -4. When this happens, we dirty the
edges that point at inp, transitively. To do so, we traverse edges
(backwards) from inp to any super-computation, and mark each edge
dirty that we traverse in this process.
(Aside: We terminate the traversal early when an edge is already
dirty; for instance, if we later changed -4 to -5, we would not have
to re-dirty these edges, assuming we had not yet cleaned them.)
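The dirtying traversal and its early termination can be sketched as follows (a hypothetical sketch; node and field names are invented):

```python
# Sketch of transitive dirtying with early termination on
# already-dirty edges.

class Node:
    def __init__(self, name):
        self.name = name
        self.incoming = []   # edges from super-computations that observe us

class Edge:
    def __init__(self, src, tgt):
        self.src = src       # the observer (super-computation)
        self.dirty = False
        tgt.incoming.append(self)

def dirty(node, log):
    # Walk backwards from a changed node, marking each traversed edge
    # dirty; stop early at edges that are already dirty.
    for e in node.incoming:
        if not e.dirty:
            e.dirty = True
            log.append(e.src.name)
            dirty(e.src, log)

inp, sq, x = Node("inp"), Node("sq"), Node("x")
Edge(sq, inp)   # sq forces inp
Edge(x, sq)     # x forces sq

log1 = []
dirty(inp, log1)
print(log1)     # ['sq', 'x']: both edges dirtied
log2 = []
dirty(inp, log2)
print(log2)     # []: edges already dirty, traversal stops immediately
```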
Below, ## marks a dirtied edge. Specifically, the edges y--x and
x--inp are dirty:
  -4       4
 [inp]   [temp] <--- Invariant Broken: temp is not equal in value to inp
    \    a / \
 ##f \    /   \ f
      (x)     (z)
        \     /
    ##f  \   / f
          (y)    Cached Result: 4  <-- Stale result
Notice that the invariant relating inp and temp is broken: they are no
longer equal. The dirty edges witness this inconsistency, albeit
indirectly: thunk (y) depends on both [inp] and [temp], so this
computation and its cached result could be accidentally reused, except
that (y) also has a dirty edge, which ensures that we attempt to clean
it before reusing its cached result, the stale value of 4.
After changing inp to -4, suppose that the external user/environment
re-demands the new value of the root thunk y. The following change
propagation "trace" shows in what order we traverse the graph and
clean its edges, "rooted" at node y:
Clean root--y ==> {
  Clean y--x ==> {
    Clean x--inp ==> { Changed: -4 }
    Reeval x ==> {
      Observe new value of inp, -4
      Re-allocate (overwrite) temp with -4
      Dirty z--temp
      Dirty y--z
      Return ref(temp)
    }
    Unchanged: ref(temp)
  }
  Clean y--z ==> {
    Clean z--temp ==> { Changed: -4 }
    Reeval z ==> {
      Observe new value of temp, -4
      Return: -4
    }
    Changed: -4
  }
  Changed: -4
}
The resulting DCG, post-change propagation, looks as follows:
  -4      -4
 [inp]   [temp] <--- Invariant: temp is equal in value to inp
    \    a / \
   f \    /   \ f
      (x)     (z)
        \     /
       f \   / f
          (y)    Cached Result: -4
The DCG shows that inp, temp and the cached results all match (all are
-4). Note that in the trace above, we clean edge y--x before cleaning
edge y--z. In other words, we followed the demand ordering imposed by
thunk (y) on its two dependent sub-computations, (x) and
(z). Following this edge ordering is critical.
Suppose we do it in the other order, cleaning (z) before (x); what
goes wrong? The final return value will be stale, 4, the old value of
inp. Further, the graph will not be completely clean; rather, the
edges y--z and z--temp will be dirty, due to the effects of
over-writing temp with -4 *after* trivially cleaning them prematurely:
Clean root--y ==> {
  Clean y--z ==> {
    Clean z--temp ==> { Unchanged: 4 }
    Reeval z ==> {
      Observe stale value of temp, 4
      Return 4
    }
    Unchanged: 4  <<------------------------- XXX Wrong!
  }
  Clean y--x ==> {
    Clean x--inp ==> { Changed: -4 }
    Reeval x ==> {
      Observe new value of inp, -4
      Re-allocate (overwrite) temp with -4
      Dirty z--temp
      Dirty y--z
      Return ref(temp)
    }
    Unchanged: ref(temp)
  }
  Unchanged: 4  <<-------------------------- XXX Wrong!
}
Unchanged: 4  <<-------------------------- XXX Wrong!
The resulting DCG will look as follows:
  -4      -4
 [inp]   [temp]
    \    a / \
   f \    /   \ f##
      (x)     (z)    Cached Result: 4  <--- Not consistent; should be -4
        \     /
       f \   / f##
          (y)    Cached Result: 4  <--- Not consistent; should be -4
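The two cleaning orders can be contrasted in a small, hypothetical Python sketch (all names are invented; `reverse=True` models the incorrect order). Note that in this sketch the reversed pass simply skips y--z, which is not yet dirty when visited, so the dirtying caused by re-running x is missed entirely:

```python
# Sketch of the example above: thunk x overwrites [temp], z reads it,
# y forces x then z. Cleaning y's edges in demand (creation) order
# yields -4; cleaning them in reverse order returns the stale 4.

class Edge:
    def __init__(self, src, tgt, observed):
        self.src, self.tgt, self.observed = src, tgt, observed
        self.dirty = False

def dirty_edge(e):
    # Transitively dirty edges from a changed node toward its observers.
    if not e.dirty:
        e.dirty = True
        for e2 in e.src.out_edges:
            dirty_edge(e2)

class Cell:
    def __init__(self, value):
        self.value, self.out_edges = value, []
    def set(self, value):
        self.value = value
        for e in list(self.out_edges):
            dirty_edge(e)
    def force(self):
        return self.value

class Thunk:
    def __init__(self, fn, reverse=False):
        self.fn, self.reverse = fn, reverse
        self.cached, self.in_edges, self.out_edges = None, [], []
    def force(self):
        if self.cached is not None and self.clean():
            return self.cached
        self.in_edges = []
        self.cached = self.fn(self)
        return self.cached
    def clean(self):
        # Demand order is creation order; reverse models the wrong order.
        edges = reversed(self.in_edges) if self.reverse else self.in_edges
        for e in list(edges):
            if e.dirty:
                if e.tgt.force() != e.observed:
                    return False
                e.dirty = False
        return True
    def record_force(self, tgt):
        v = tgt.force()
        e = Edge(self, tgt, v)
        self.in_edges.append(e)
        tgt.out_edges.append(e)
        return v

def run(reverse):
    inp, temp = Cell(4), Cell(None)
    def x_fn(t):
        temp.set(t.record_force(inp))  # allocate/overwrite temp
        return temp                    # return ref(temp)
    def z_fn(t):
        return t.record_force(temp)
    x, z = Thunk(x_fn), Thunk(z_fn)
    def y_fn(t):
        t.record_force(x)
        return t.record_force(z)
    y = Thunk(y_fn, reverse=reverse)
    y.force()                          # initial run: y caches 4
    inp.set(-4)                        # change the input
    return y.force()

print(run(reverse=False))   # -4: demand order propagates the change
print(run(reverse=True))    # 4: reversed order returns the stale result
```

As in the figure above, after the reversed run the edges z--temp and y--z remain dirty, and y's cached result is the stale 4.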
@AllanGardner

The DCG stores last-observed values on force edges. During cleaning, the DCG short-circuits re-evaluation when these last-observed values match the current value.

I see now why you need to keep track of dirtiness per-edge; for example, if x depends on a third node y that changed, it can't be set back to clean until y is examined. But I'm still not clear on why you need last-observed values on the edges; wouldn't per-node be sufficient, under the assumption that a value, if it changes, never changes back to a previously-seen value? (Per-node storage has the advantage of roughly constant overhead relative to the size of the computation)


fisherdj commented Nov 17, 2016

Per-node is incorrect because the value of any node can change multiple times before the nodes depending on it are ever updated.

      (C)   (0 -> 1 -> 1)
      / \
     /   \
   (A)   (B)

Q: Suppose that A and B depend on C, which has value 0. If C's value is changed to 1, and A is observed but B is not, then C is dirtied again (keeping the value 1), what happens when A and B are observed again with per-node values as opposed to per-edge values?
A: A avoids recomputation (which is correct), but so does B, even though it was computed with C having a value of 0 instead of 1.

If both A and B were immediately forced when C is changed (i.e. if Adapton were more eager in change propagation), this would not happen in the first place.
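The Q/A above can be sketched directly (a hypothetical sketch; the variables stand in for per-node and per-edge last-observed values):

```python
# C changes 0 -> 1 (A observed) -> 1 again (dirtied, same value),
# comparing one shared per-node slot against per-edge slots.

c = 0
node_last = c                   # per-node: one slot shared by A and B
edge_last = {"A": c, "B": c}    # per-edge: one slot per observer

c = 1                           # first change: C becomes 1
# A is observed: both schemes see 1 != 0 and recompute A ...
node_last = c                   # ... and the shared slot is overwritten
edge_last["A"] = c

c = 1                           # C is dirtied again, keeping value 1

# B is observed for the first time since C was 0:
per_node_skips_B = (c == node_last)       # True: B wrongly reuses its result
per_edge_skips_B = (c == edge_last["B"])  # False: B correctly recomputes
print(per_node_skips_B, per_edge_skips_B)
```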

A few notes about space usage and optimization with per-edge storage:

  • Weak references can help avoid redundant copies
  • Thus, per-edge has roughly constant overhead for tree-like computations and overhead on the order of the size of the DAG itself
  • Adapton is storing the DAG regardless, so the last point boils down to constant-factor overhead.

One can probably get similar benefits to per-edge by augmenting per-node values with per-edge data (i.e. edge is dirty, edge is clean with same value, edge is clean with different value). This makes cleaning less efficient.

Hope that explains a little more of what's going on.
