Created April 12, 2023 13:43
Transcripts
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Seven Implementations of Incremental        ;;
;; https://www.youtube.com/watch?v=G6a5G5i4gQU ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
0:03: so I want to give you guys a talk about a library that we use internally here called Incremental, and only some of you
0:12: have actually used Incremental, so I'll talk a little bit about what Incremental
0:17: is and how it works and what problems it's trying to solve, but most of the talk is really going to focus on the long and
0:23: winding road from the initial implementation of Incremental that we started with to the one that we've ended up with now, and it's a sort of
0:31: interesting story, because lots of different concerns got mixed up there. The ideas behind
0:39: Incremental come out of academic work on functional programming and algorithms
0:44: and stuff like that, and there's a bunch of stuff that came from those papers that we screwed up in our initial versions and that the
0:50: academics got right, and there's a bunch of things that we got right that the academics didn't pay any attention to, and kind of
0:56: the sort of several-year-long history of this project is an interesting back and forth between what you
1:04: get from the standard academic approach to looking at these problems and what you get when you're trying to
1:09: use it for real work, which is the experience that we have here. OK, so let
1:14: me say a little bit about what Incremental is and what it's useful for. The basic goal of Incremental is to
1:21: allow you to build big complicated computations that depend on
1:27: lots of data and that are efficient, specifically in the sense that when only small amounts of your input
1:34: data change, you want to be able to efficiently refresh that computation and get a clean version of the output
1:41: out of the end of it. That's the kind of fundamental goal of Incremental, and it's in many ways similar to
1:48: the computational model that you expect from a spreadsheet. We don't normally think of spreadsheets as
1:54: programming languages, but they kind of are. They're different from the programming languages we're all used to;
1:59: interestingly, a spreadsheet is kind of a functional programming language. After all, what do you put into the cells of a
2:06: spreadsheet other than simple expressions that are just a pure computation based on some other set
2:11: of inputs? And the spreadsheet world is kind of inverted
2:17: compared with what we think of in ordinary programming. In ordinary programming, code is king, right? That's what you look at,
2:22: and then there's some data it operates on in the background that's not so visible. In spreadsheets it's the reverse: the data
2:28: is the primary thing you see, and then the logic of the program is kind of hidden in those cells that are a little
2:34: harder to look at. But separate from that part of it, the other interesting thing about spreadsheets is that they
2:40: give you a way of expressing a computation that's structured as essentially a dependency graph. Every
2:46: equation that's in a cell has some other set of inputs from which its value is derived, and Excel is very sensitive to
2:53: this structure and tries to be efficient, so that if you have some really big complicated spreadsheets, and we have
3:00: really big and complicated spreadsheets at Jane Street, Excel does a lot of work to update those
3:06: efficiently. So if some of it changes, some of the cells are modified, then Excel only refires the part of the
3:12: dependency graph that needs to be refired. And so this turns out to be a
3:17: very effective programming paradigm, both because it's a nice way of building these efficient algorithms without
3:24: a lot of work to do so, and because it also gives you a natural way to build
3:30: visualizations of what's going on. One of the great things about spreadsheets is that you can
3:36: create a little monitoring sheet which has references to various particular intermediate computations and displays
3:41: them in one place, and this is easy and natural and efficient to do because the intermediate parts of your
3:47: computation are reified: they have a kind of explicit place in the world that you can refer to, and that makes all
3:53: sorts of other things that you want to do later easier. And Incremental is going to give us a version of that as well, but
3:58: within the bounds of OCaml. So I
4:04: guess what I want to start doing is laying out in a little bit more detail what, really precisely, is the
4:10: computational model that Incremental gives you, and I'm going to do that by starting by bringing up the interface,
4:16: the OCaml interface, that looks something like the interface of the first version of Incremental that we built.
4:25: Let's make that fit the screen. So obviously it's easier to understand this
4:32: if you are comfortable looking at OCaml signatures, but I'm going to go through piece by piece and try to
4:37: unpack what the meaning of the different parts is. So you can ignore the module type S at the beginning and end; that's
4:43: just a kind of wrapper to give a name to this signature. But there's this main
4:49: module, so that's the incremental itself, and it's a parametrized type: you
4:55: can have an incremental that contains within it an integer or a floating point value or a string or an array of strings
5:02: or whatever kind of structured type you want; it can sit inside that 'a of the incremental. And then we have a bunch of
5:09: basic operators that you can use for constructing new incrementals based on
5:14: the incrementals you already have in your hand. So the simplest one is return. Return does in some sense the minimal
5:22: thing you could imagine doing: return takes an arbitrary value and gives you back an incremental computation that is always equal to that
5:29: value. Another natural name for this function is const, right? It kind of returns the constant computation. Map is another
5:38: interesting one: map takes an incremental of some type a and a function from a to b and gives you a new incremental whose
5:45: contents is of type b. So I'm going to start drawing pictures on the wall: you can imagine you start with something that
5:51: has a's in it, and you call map and it gives you a new one which has b's in it,
5:56: and there's some function f that takes you from a to b. And if you want to think
6:01: about what the living, breathing graph of this computation looks like, we're going to start seeing those appear on
6:06: the wall. So here you imagine every time this a changes, it's going to cause this function f to refire, to recompute the b.
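As a concrete illustration of that picture, here's a toy sketch in Python. This is my own minimal model of the idea, not the real library's API (which is in OCaml): an input cell holds a value, and a map node refires f whenever the input changes.

```python
class Cell:
    """Input node: holds a value and pushes changes to its dependents."""
    def __init__(self, value):
        self.value = value
        self.dependents = []

    def set(self, value):
        self.value = value
        for d in self.dependents:
            d.recompute()

class Map:
    """Derived node: value = f(input.value), refired whenever the input changes."""
    def __init__(self, src, f):
        self.src, self.f = src, f
        self.dependents = []
        self.value = f(src.value)
        src.dependents.append(self)  # upward pointer: the input knows its readers

    def recompute(self):
        self.value = self.f(self.src.value)
        for d in self.dependents:
            d.recompute()

a = Cell(2)
b = Map(a, lambda x: x * 10)  # b always holds f of a's current value
a.set(3)
print(b.value)  # 30
```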
6:14: OK, so far so good. All right, so this already gives us a little bit of
6:20: expressive power: we can now take this map and start growing the
6:27: computations by doing lots of maps at different
6:34: points in the hierarchy. It's worth noting, however, that there's something quite
6:39: limiting about this: we can add new nodes and branch out, but we have no way yet of pulling back in, right? So that's not the
6:46: most exciting kind of computational graph, if for nothing else because if you start this with one thing and then this thing fires, everything's going
6:53: to change; it's kind of boring that way. But we can make it a little bit more interesting by adding functions
6:59: that bring things back together. So map2 is the first one. The signature for map2 is a little harder
7:04: to parse, but map2 takes two incrementals, an 'a incremental and a 'b incremental, and then a function that
7:10: takes an a and a b and returns a c, and it gives you back a 'c incremental,
right?
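In the same toy Python style as before (a sketch of the idea, not the real OCaml API), map2 is just a node with two inputs; and once you have map2, map3 falls out by pairing two of them:

```python
class Cell:
    """Input node: holds a value and pushes changes to its dependents."""
    def __init__(self, value):
        self.value, self.dependents = value, []

    def set(self, value):
        self.value = value
        for d in self.dependents:
            d.recompute()

class Map2:
    """Derived node with two inputs: value = f(a.value, b.value)."""
    def __init__(self, a, b, f):
        self.a, self.b, self.f = a, b, f
        self.dependents = []
        self.value = f(a.value, b.value)
        a.dependents.append(self)
        b.dependents.append(self)

    def recompute(self):
        self.value = self.f(self.a.value, self.b.value)
        for d in self.dependents:
            d.recompute()

def map3(a, b, c, f):
    # map3 built from two map2s: pair up (a, b), then combine the pair with c
    ab = Map2(a, b, lambda x, y: (x, y))
    return Map2(ab, c, lambda xy, z: f(xy[0], xy[1], z))

x, y, z = Cell(1), Cell(2), Cell(3)
total = map3(x, y, z, lambda a, b, c: a + b + c)
y.set(10)
print(total.value)  # 14
```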
7:15: pretty straightforward, and we can use that to start bringing things back together. All right, so we can make lots
7:23: of interesting sub-computations and then merge them together into something if we want. And in fact, once
7:28: you have map2, you effectively have map3 and map4 and map5, by just combining multiple map2s
7:34: together. So map gave us a real new bit of expressivity, and map2 gave us another one, and from there
7:41: there's a bunch of other things you can build on top of it. And the world that you're in now, when you just have these operators, is
7:47: essentially the world where you can build more or less arbitrary static
7:53: dependency graphs, right, just directed acyclic graphs. So you can add more nodes and keep on growing them and branch out
7:59: and come back in, and you can build basically an arbitrary DAG, but the DAG is in some sense static. I
8:06: mean, it's not totally static in the sense that here I can call these functions and add new nodes, but it is static with respect to the input data,
8:13: right? So if I have this input here and I change it, no matter what, it will never
8:20: change the structure of the dependency graph. And in fact I can have multiple different inputs, right? If you have lots
8:28: of inputs that feed into this graph, that doesn't really change anything; it's still static in the sense that we
8:34: described before. But we can make it dynamic, and that's what the next operator does: bind. And bind is actually
8:42: subtly different from map, and it's easy to not even notice the difference, but it turns out the difference really matters
8:48: a lot. So the signature of bind is: it takes an 'a incremental and a function
8:54: that takes an a and returns a 'b incremental, and then it returns to you a new 'b incremental. Right? So
9:01: it's like the only difference between bind and map is the extra t that shows up right here, but otherwise they are
9:07: exactly the same. But that small one-letter difference turns out to have a lot
9:13: of expressive power, and the thing it's going to give us is dynamism, the ability to change the computations as we go. So
9:19: the way you can visualize a bind node:
9:25: well, a bind node has a left-hand side and
9:30: then a right-hand side. So the idea is, the left-hand side of the bind node is the first input, right, the thing that
9:38: it consumes; it's the input to the bind. And then the right-hand side is going to be what's produced by this function here. So the idea is, when you create a bind
9:46: node, it has an input, that's this left-hand side, and then depending on
9:51: what this is, it can pick any of a number of different nodes to wire
9:58: in as the right-hand side, and it can change its mind as it goes, because it can choose different incrementals
10:04: depending on what the input is. And in fact, as we'll see, you can even create new incrementals in that context.
10:11: So this is all a little abstract; let me make this difference a little bit more concrete by looking at how
10:19: you might implement if. So this is a tiny
10:26: piece of code using the incremental interface I just showed you, which implements if in two different ways, once
10:32: with map and once with bind, and we'll see that even though they produce the same answer, they have different graphs and
10:39: therefore different performance behaviors. So this is kind of nonsense:
10:44: this is just me implementing map3 on top of map2, which I told you I could do before; that's actually the code to do it.
10:50: It's not super interesting, but it just makes the rest of this work. And then my first implementation of if,
10:55: the map implementation of if, is going to use map, in particular this map3. So I'm
11:01: going to call map3 on a variable representing the condition, like the boolean true or false, the then clause,
11:08: and the else clause. And then what am I going to do with the condition, the then, and the else? I'm going to say, if the
11:13: condition is true then return t, else return e. So one thing that's important
11:19: but kind of subtle is, well, what's the type of c here? It's a bool incremental.
11:30: And what's the type of c here, or here? It's just a bool. That's right. So even
11:36: though we're somewhat abusing the notation here, using the same variable names in different places, it's the unwrapped version that we
11:42: can then use for a straight-up computation. And so the dependency graph here is actually really simple: all we do
11:48: is basically have a node with three inputs, the condition, the then, and the
11:58: else, and any time any of them changes, we recompute: we say if it's true, return t,
12:04: else return e. Right? And the only part of this that's problematic is the
12:10: performance behavior. Which is to say, let's say this condition changes rarely, and let's say it's currently true.
12:18: Then in some sense, once you know this, you know that you don't actually care about what's going on here: you don't
12:23: really need to refire the if statement when this changes, only when this changes. But the map implementation of if is not aware
12:29: of that, so it recomputes every single time either branch changes, no matter what the condition is. So the bind
12:34: implementation here has the other behavior: if we bind on c, on the condition, then if it's true
12:43: we return t, otherwise we return e, but in this case t and e are
incrementals, right? So what bind is doing
12:50: in this case is choosing between two possibilities. So in the bind case, instead of having all three wired in, we
12:56: have either the then case wired in or
13:01: the else case wired in, but never both. OK, so that maybe gives you a little bit more of a sense of the
13:08: division between map and bind. Do you have any questions about this? It's easy to get confused; it's an
13:15: interesting and quite different library. So, OK, I guess the other thing I
13:20: want to say about bind is, this use of bind in some ways undersells how important bind is, because it makes it
13:26: sound like just a small optimisation to the firing structure of the resulting graph, but it's actually much,
13:32: it's a much bigger deal than that, because in addition to just changing where you're pointing amongst
13:37: existing incrementals, you can also do all sorts of new things: you can construct new incrementals by calling
13:44: map and bind inside of the right-hand side of a bind. So for example you
13:52: could have some computation where there's an infinite number of possible configurations, and you bind on some description of the configuration you
13:57: want now, and the right-hand side constructs the computation for you. Like, for example, there might be some big sum
14:03: product you want to do: you have a bunch of inputs; which ones are they? Well, that might come in on the left-hand side of the bind, and then you construct the
14:09: whole subgraph that does that computation for you. So bind really does a lot to add to the expressive power of the
14:15: system, letting you do pretty general-purpose dynamism in the middle of your incremental computation. Yeah?
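To make the map-versus-bind difference concrete in the same toy Python style (a sketch of the idea under my own naming, not Jane Street's OCaml implementation), here is an if built with a bind node: only the currently chosen branch is wired in, and changing the condition rewires the graph.

```python
class Cell:
    """Input node: holds a value and pushes changes to its dependents."""
    def __init__(self, value):
        self.value, self.dependents = value, []

    def set(self, value):
        self.value = value
        for d in list(self.dependents):
            d.recompute()

class Bind:
    """The left-hand side is `src`; the function `f` picks which
    incremental gets wired in as the right-hand side."""
    def __init__(self, src, f):
        self.src, self.f = src, f
        self.dependents = []
        self.rhs = None
        src.dependents.append(self)
        self.recompute()

    def recompute(self):
        chosen = self.f(self.src.value)
        if chosen is not self.rhs:
            if self.rhs is not None:
                self.rhs.dependents.remove(self)  # unwire the old branch
            self.rhs = chosen
            chosen.dependents.append(self)        # wire in the new branch
        self.value = self.rhs.value
        for d in self.dependents:
            d.recompute()

cond, t, e = Cell(True), Cell("then"), Cell("else")
out = Bind(cond, lambda c: t if c else e)
e.set("ELSE!")   # the un-chosen branch changed: `out` does not refire
cond.set(False)  # the condition changed: the graph is rewired to the else branch
print(out.value)  # ELSE!
```

A map-based if, by contrast, would keep all three inputs wired in and refire on every change of either branch.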
14:27: This is a good point. And this is also, the whole structure of Incremental is what's called a monad;
14:33: it's a monadically structured library, and there's a standard trade-off between map and bind in libraries like this,
14:40: which is: map is generally speaking simpler to use, simpler to understand, and simpler to implement, and bind is more
14:48: powerful but worse in all the other ways. And that's true here: it certainly adds a lot of complexity to the
14:53: implementation of Incremental, as we'll see in a second, and it's also slower, because it involves
15:00: in many cases the allocation of new nodes and more heavyweight operations. So yeah. And I mean, it's worth saying, these
15:07: are all baby cases, and you can't really get a good understanding of why these things are useful by just looking at baby cases. So for example, here's one
15:14: optimization to this: don't do it incrementally at all. An if statement is a tiny amount of work, and
15:19: incrementalizing it isn't all that valuable, and the difference between map and bind doesn't matter very much. So in the very small this is, in some sense,
15:26: all not very interesting, but when you get to bigger, more complicated things is where it matters, and you have to worry,
15:31: when you think about bind, about exactly this problem of how expensive it is. And in fact a lot of the practical
15:38: experience of using Incremental is that you have to look at your program in two ways. One is: what is the
15:44: thing it's trying to do? And that's in some sense easy: you just drop all the binds and maps, forget about
15:50: every incremental operation, and see what it does as an all-at-once program, and that's its semantics, that's its meaning.
15:55: But when you want to think about performance, you have to think about exactly where the incrementality lies, and you actually have to think quite hard
16:01: about the division between bind and map, because it makes a real difference. All right, so
16:07: far so good. So we talked about
16:18: the basic interface a little bit; let me kind of recap and mention a few other points. So as I said, the
16:25: basic operations return, map, and map2, those only let you build static DAGs; bind gives you dynamism. An important
16:33: part of this is that we're really thinking about this as a way of optimizing a functional program, a
16:39: functional program being something that's just computing and not doing side effects, and that makes everything easier to think about. Sometimes you'd want to do side
16:46: effects in these, and one can talk about that, but I'm mostly going to ignore that for the purposes of this talk. One other
16:51: thing that I didn't mention is this notion of cutoff. If you have a big complicated graph, a lot of the time
16:59: you might want to stop the computation earlier than you would imagine if you
17:04: just looked at what changed and then fired the transitive closure of everything that is reachable from the thing you
17:10: modified, because sometimes, even though something changed, the output three
17:16: stages down might not change. Right? Imagine for example you have some node that computes the max of two numbers; if
17:21: the smaller number changed, well, you really want it to stop right there. You really
17:26: want some notion of cutting off the computation when you detect that the input hasn't changed, or in some cases
17:32: hasn't changed by enough: if you're computing real-valued things, floating point numbers, you might want to have
17:37: some tolerance; we say, ah, you know, less than 10 to the negative 15th, I don't care about differences that small. You
17:43: might want to cut off a computation in some way, so you really need support for cutoffs. OK, and one other thing; I
17:48: forgot, there's a couple other parts of the interface I didn't mention that are important. So everything I've talked
17:55: about, I kind of stopped at bind, right? Everything I talked about was about constructing these computations;
18:00: I assumed that you have some incrementals to begin with, but I didn't tell you how to use the computation, and I didn't tell you how to get incrementals to begin with. So
18:07: let's talk about that. So stabilize is a somewhat mysterious-looking function: it takes unit and returns
18:12: unit. What does it do? What it actually does is run the whole computation. So
18:19: the workflow with an incremental computation is: you build the basic computation, you modify some of the
18:27: inputs, and then you call stabilize to flush that through the graph, and then we get the outputs back out. So how do we
18:33: get the outputs out? Well, we have this function value, which will take an incremental and spit the value back at
18:38: you, and there's another function which is very useful, called on_update, which registers a callback; it tells you when
18:45: something has changed. And this goes back to the use case I was talking about before, where you might have some big computation where you want to put your
18:51: finger in multiple spots so you can display things from it or in other ways react to it, and on_update
18:57: lets you register your interest in some value and wait for it to come back to you. And now we have everything,
19:03: except we don't know how to put data into the system yet, and that's a variable. So that's the last piece of the
19:09: interface: a variable is again a parametrized type; you can create one with an initial value, you can set it,
19:15: meaning you can change the value, and then you can call read, and what read does is give you back the incremental corresponding to that value.
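Here's a hedged sketch of that workflow in toy Python, with hypothetical `Var`, `map_incr`, `value`, `on_update`, and `stabilize` names standing in for the interface just described. Note this toy recomputes every derived node on each stabilize; the real library only refires what actually needs to change.

```python
_nodes = []      # every derived node, in creation (hence topological) order
_callbacks = []  # (node, callback) pairs registered via on_update

class Var:
    """An input variable; in this toy, it doubles as its own incremental."""
    def __init__(self, value):
        self.value = value
    def set(self, value):
        self.value = value
    def read(self):
        return self

def map_incr(src, f):
    node = Var(f(src.value))
    _nodes.append((node, src, f))
    return node

def value(node):
    return node.value

def on_update(node, callback):
    _callbacks.append((node, callback))

def stabilize():
    # flush changes through the graph, firing callbacks on changed values
    for node, src, f in _nodes:
        new = f(src.value)
        if new != node.value:
            node.value = new
            for n, cb in _callbacks:
                if n is node:
                    cb(new)

v = Var(1)
doubled = map_incr(v.read(), lambda x: x * 2)
seen = []
on_update(doubled, seen.append)
v.set(21)
stabilize()
print(value(doubled), seen)  # 42 [42]
```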
19:22: So you start building an incremental computation by creating a bunch of variables, reading off the incrementals
19:28: from those variables, and then using the rest of the combinators that we have for building up the computation
19:34: itself. All right, any questions about this interface? If you don't understand this
19:40: interface, you're hosed for the rest of the talk, so we should... ah, yes. So the rough story is this: if you
19:47: if you do everything in a beautiful functional style, which we often don't, then you're basically guaranteed not to
19:54: have cycles. If you do some clever things to optimize, like trying to
19:59: memoize computations that you do a lot so that you don't have to rebuild them, then it's easy to come up with cycles,
20:05: and you have to figure out what to do about that. That's actually a serious issue, and we'll talk about how it's dealt
20:10: with; cycles are an interesting difference between the different implementations that we went through. OK, all right, let's get back to my high
20:20: quality slides. So I'm going to go through a bunch of different versions,
20:26: some in more and some in less depth. I'm probably going to talk about this one in the most depth, because it's the easiest to understand. This is the first version
20:32: that we did, that Stephen Weeks and Milan Stanojević wrote years ago at
20:38: this point, when we decided we wanted a version of this. So the first thing I want to talk about is: what is the
20:44: stabilization function? Because that's kind of the key thing that we have; it's the way in which you propagate
20:49: all the information. So this computation graph I've been describing and drawing on the board, you more or less make that
20:56: real in OCaml: you know, objects, which you allocate, and they have pointers to each other in a way that
21:01: roughly reflects the structure of this dependency graph. One important thing to
21:07: understand about it is that you have to have pointers going from the variables up. Right? When you represent a graph like
21:14: this, it's not obvious which direction you need the pointers, but if you think about the performance characteristics you want out of this, for the work that we're
21:20: doing we want to be able to go up. Why is that? Because the new information comes in at the leaves, and you want to
21:27: do the minimal amount of work that you have to, based on which of the leaves have changed. So the basic structure of
21:32: how any algorithm to do this has to work is: some of your leaves are dirty because they've been set; you go through
21:39: and do the propagation starting from there and stop when you don't need to go any further, and that requires that
21:46: from the leaves you know where they connect to, so you need these upward pointers. So the algorithm is
21:53: trickier than you might imagine if you haven't thought about it at all,
21:59: because of what happens when you have, as you often do, recombinant graphs, meaning the graph fans out and then comes back in. All right, so let's draw a
22:06: simple example of a recombinant graph. So
22:20: you have to think a little bit hard about how you're going to fire this structure. Because let's say we just do
22:27: a simple depth-first search, right? We go: this has changed, so this has to be recomputed, so this has to be
22:32: recomputed, so that has to be recomputed. OK, we're done with that. OK, since this changed, this also has to be recomputed, so... oh wait, now we have to
22:40: recompute this one twice. Right? So you think, well, maybe it's not so bad, you have to do something twice, how big of a
22:46: deal is that? But once you get twice, you basically have exponential badness, because you can take examples like this
22:53: and stack them on top of each other, and at each level of the stack your one update becomes 2, becomes 4,
22:59: becomes 8, becomes 16, so you get a kind of exponential blow-up of unnecessary work if you just do a completely naive
23:06: walk of the tree. And you might think, oh, maybe I should do breadth-first search so I get to this thing at the same time,
23:11: but I've tricked you already, because look, I made two links here and only one here, so now you won't reach it at the
23:18: same time. So you actually need something more principled than just changing the order of the search. So what's the
23:24: solution here? The solution is to do it in two passes. Pass one is that you figure out
23:29: what needs to be recomputed: you basically walk through the graph, and imagine there might be other parts of this computation that don't
23:36: depend on this node and don't need to be refired, right? You can sort of think about this world here like this:
23:44: all of this needs to be redone, but none of these nodes do. So one thing you can do is just do a first pass where you
23:50: mark the whole transitive closure of this node as dirty, and then
23:56: when you're done, you then do the actual propagation of the firing. But the key
24:01: trick here is that when you mark nodes as dirty, you also let them know how many of their inputs are dirty, and then they
24:07: don't recompute until the number of dirty
24:13: inputs they have drops to zero. Right? So here, if we do the same thing we did
24:18: before, we would walk up here, and then we'd tell this guy that one of his inputs was clean, but it wouldn't
24:23: fire yet, because it knows it has one more it still has to wait for, and then we go back and start going this way, and then
24:29: finally, when we get here, this guy knows he needs to fire, and then he recomputes and causes the next level up
24:35: to recompute. OK, so that's the sort of basic algorithm. It's very simple,
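The two-pass scheme just described can be sketched like this (a toy Python model under my own naming, not the talk's OCaml code): pass one marks the transitive closure and counts dirty inputs; pass two recomputes a node only when its dirty-input count drops to zero, so each node in a diamond fires exactly once.

```python
class Node:
    def __init__(self, f=None, *inputs):
        self.f, self.inputs = f, inputs
        self.dependents, self.dirty_inputs = [], 0
        self.recomputes = 0  # instrumentation: how often this node refires
        for i in inputs:
            i.dependents.append(self)
        self.value = f(*[i.value for i in inputs]) if f else 0

def mark(node):
    # pass one: walk the transitive closure, counting dirty inputs
    for d in node.dependents:
        d.dirty_inputs += 1
        if d.dirty_inputs == 1:  # first time we reach it: keep walking
            mark(d)

def propagate(node):
    # pass two: a node refires only when its last dirty input becomes clean
    for d in node.dependents:
        d.dirty_inputs -= 1
        if d.dirty_inputs == 0:
            d.value = d.f(*[i.value for i in d.inputs])
            d.recomputes += 1
            propagate(d)

def set_value(leaf, value):
    leaf.value = value
    mark(leaf)
    propagate(leaf)

# a diamond: leaf -> a, b -> top; a naive DFS would recompute `top` twice
leaf = Node()
a = Node(lambda x: x + 1, leaf)
b = Node(lambda x: x * 2, leaf)
top = Node(lambda x, y: x + y, a, b)
set_value(leaf, 10)
print(top.value, top.recomputes)  # 31 1
```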
it's
24:42: quite fast, it's easy to understand; everything's great. Or, well, not
24:47: everything is great, it turns out. So the first problem we have has to do with garbage collection. And it turns out lots
24:53: of pretty functional programming abstractions are built on top of horrible, imperative, complicated,
25:00: messy machinery underneath that is hard to understand, and one of the big reasons they're hard to understand is how they interact with the garbage collector. So the
25:07: particular problem here has to do with the direction of the pointers. We have pointers up from the leaves, which
25:13: means that as long as the leaves are alive, everything they point to will remain alive, and you might think, well, OK, that
25:21: seems like a reasonable property. But you don't want to keep things alive because they're connected to inputs; you want to
25:26: keep them alive because someone is paying attention to them on the output side. And in particular, if you have in
25:32: the middle of this graph something dynamic, where there's a bind node that is allocating new nodes and
25:37: dropping references to nodes that it created previously, well, those nodes it created previously are still hooked in, up from the leaves,
25:45: so if you don't do anything clever at all, they're never actually going to get collected. So you have to do
25:50: something to take things that are no longer observable in this
25:55: system and drop all the pointers to them from the inputs, which have to remain alive because, you know,
26:01: somebody is feeding data in from the outside into those things, so those are going to be kept alive. So we need some
26:08: way of figuring out what's going on, and the way we do this is by maintaining reference counts from
26:15: externally held nodes. Right? So we're going to keep track, whenever someone has a node that they hold on to on the outside,
26:20: we'll keep a reference count of the number of times you are referenced from some externally held node, and if that
26:25: reference count drops to zero, then we're going to cause you to cut yourself off from the graph, so
26:31: you can be collected. OK, so that's relatively simple; the only issue is, how
26:37: do you keep track of whether you have external references all right | |
because after all none of the stuff is ever | |
26:42: going to get collected so the garbage collector isn't going to tell you | |
anything so how do you figure it out so it's pretty easy the basic idea is | |
26:48: instead of handing back these actual nodes that are in the internal graph | |
back to outside users you create an | |
26:54: extra little node for each one of these called a sentinel and the | |
sentinel has is just an extra cell that points to the | |
27:01: real thing and isn't pointed to by anything else in the graph um and what | |
27:07: you do is you attach to every sentinel what's called a finalizar and a | |
finalizar is just a hook in the garbage collector | |
27:13: that right before it collects an object it warns users oh this thing's | |
about to be ripped out and you use the | |
27:20: finalization of the sentinels as the way to trigger the removal of the | |
reference counts basically the things that you're | |
27:26: counting references from is the sentinels and so it's their arrival and | |
removal that causes you to update the | |
27:32: reference counts okay so far so good does that make sense | |
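The sentinel scheme just described can be sketched in Python, with `weakref.finalize` standing in for the OCaml GC's finalizers; the class and field names here are made up for illustration, not the real library's API:

```python
import weakref
import gc

class Node:
    """Internal graph node; refcount counts live external sentinels."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.disconnected = False

    def disconnect(self):
        # Cut this node off from the graph so it can be collected.
        self.disconnected = True

class Handle:
    """Sentinel handed to outside users; points at the real node and
    isn't pointed to by anything else in the graph."""
    def __init__(self, node):
        self._node = node
        node.refcount += 1
        # When the GC collects this sentinel, drop the reference count.
        weakref.finalize(self, Handle._release, node)

    @staticmethod
    def _release(node):
        node.refcount -= 1
        if node.refcount == 0:
            node.disconnect()

node = Node("sum")
h = Handle(node)
assert node.refcount == 1 and not node.disconnected
del h          # the sentinel becomes unreachable...
gc.collect()   # ...and its finalizer fires, triggering the disconnect
assert node.disconnected
```

The point of the indirection is that the internal node itself never needs to be collectible while the graph is alive; only the cheap sentinel does.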
27:38: all right so everything's great it's a wonderful implementation there are
no problems right I told you about the
27:45: sentinels already so there's one small problem with this design I told you
27:51: which is I talked before about how important being able to cut off the
computation is and it's impossible here so that's awkward so why is it
27:58: impossible because I have this two pass algorithm like the first pass | |
marks what needs to be recomputed and the second | |
28:04: pass recomputes and never did I give some way of figuring out whether you
needed to be recomputed the first pass
28:10: where you mark what needs to be recomputed is totally naive right you | |
can't look at the values because they haven't been computed yet so you just | |
28:16: mark everything that's in the transitive closure as needing
recomputation so if you have things where it's important to
28:21: cut off there's just no story here for how that's going to work so we | |
came up with a story which is a little bit of an | |
28:27: awkward story we said well probably like when you create a node that's | |
explicitly | |
28:32: meant to be a cut off node we can probably make the assumption that | |
almost always it cuts off right that's | |
28:38: what we did and then if we do that well what we can do is
this we're going to run this algorithm to its completion under the
assumption | |
28:46: that all cut-offs don't propagate but we're going to keep a list of the | |
cut-offs that we fired and at the end | |
28:52: we'll check and if any cutoff node changed by enough that the cutoff
function says it really should have fired
28:58: then we just kind of restart the algorithm again from there with a round | |
of cutoff nodes so this sounds like | |
29:04: it'll work and it does work but it's also bad because if you have | |
complicated nested cut-offs in your system well you | |
29:12: could refire exponentially often right it has the same problem we had | |
before of the double fire so to
29:19: our credit like we knew from the beginning that this was bad and so we | |
just used cut-offs in limited cases kind | |
29:25: of at the edge of computations and more complicated internal stuff we | |
said well we're just going to live without cut-offs so that was a big problem | |
that | |
29:33: the system had it has another quite significant problem that took us | |
longer to figure out and this has to do with | |
29:39: our naive idea about how the garbage collection of unnecessary bits of | |
the | |
29:44: graph is going to work so remember we talked about like bind nodes can
allocate new things and the old things are now no | |
29:50: longer referred to but are still hanging around and connected to the | |
inputs until the garbage collector collects their | |
29:56: sentinels so we can update the reference counts and figure out that the | |
things can be removed right so this sounds | |
30:03: reasonable but relying on the garbage collector to collect something well | |
you shouldn't really rely on it to | |
30:11: collect it especially soon right the garbage collector collects things | |
when it thinks it's running out of memory it | |
30:16: could be really a long time before it wants to collect it and in the | |
meantime the thing that wasn't obvious to us at | |
30:21: first is these nodes that are hanging around are still like they're still | |
alive they're still kicking it's not | |
30:27: just there's extra memory they're hooked into the inputs they're still | |
computing they're running every time inputs fire | |
30:33: they're still firing so that seems a little bad until you think about it | |
more and realize it's horrifically bad because
30:40: if you have nested sets of binds this is happening over and over and over | |
again | |
30:45: and like at each level you're like splitting yourself and splitting | |
yourself and splitting yourself so you're actually generating | |
30:50: exponential amounts of garbage in the system and they like you kind of | |
you | |
30:55: sort of start with that and then it'll eventually be collected but it | |
turns out between now and eventually it's going to allocate exponential amounts | |
of stuff | |
31:01: that you don't need so this is in some sense an obvious disaster in | |
retrospect we didn't notice it at first because our | |
31:07: very first applications for this did not use lots of nested binds but | |
when we started building user interfaces with | |
31:14: this it turned out nested binds were really useful and we sort of noticed | |
these weird very long pauses in the middle and didn't know why and it turned | |
31:20: out to be due to this all right any questions lots of lots of exciting | |
31:26: confusing stuff going on so you should feel free to not understand things | |
yes | |
31:36: we'll get there this implementation is quite different from the one in | |
the paper and indeed when I gave the talk at | |
31:41: CMU like the guys responsible for the original paper looked a little
horrified I was like no that's the whole
31:47: point of the talk it gets better I promise it really does get better but
the results were this was pretty fast | |
31:54: we know that cut off behavior is terrifying and we know things aren't | |
collected eagerly enough but when used
31:59: sort of within a scope where the kind of available semantics
was good enough it was a really effective | |
32:06: piece of software and we actually were able to build systems that were | |
very hard to build without it and it was all | |
32:11: pretty efficient we were all very excited about it even though there were | |
all these problems that we knew about so | |
32:18: we knew however there was a problem so our first attempt to fix it we | |
knew about the cut off problem but didn't | |
32:24: like totally understand this other problem of too much allocation and
our first attempt to fix this problem was to | |
32:29: go back in some sense and read the papers more carefully and understand | |
what they did about this and it turned | |
32:36: out they had a solution to this and the solution involves time so the | |
basic idea | |
32:43: is that you want to have a topological sort of the graph when you're | |
going | |
32:48: through and doing the firing because then there is a simple algorithm for
making sure that everything is fired only once that can be done in a way that's | |
32:54: sensitive to cut-offs because as you're going you can do the full | |
computation so you really know the values in your hand and what you did so just | |
to be clear a
33:01: topological sort is this it's a labeling of the graph with
identifiers that are sortable
33:06: so you have a total order of everything in the graph such that this
ordering respects the dependency order | |
33:12: which means you never go forward in the ordering and end up crossing a | |
back edge you're always kind of walking forward | |
33:18: when you're walking forward in numbers so the way you do propagation once | |
you have a top sort is really easy you have | |
33:24: a heap of things that need to be recomputed but haven't been yet you | |
start with your dirty nodes you throw | |
33:30: them on the heap and then you start taking things out and propagating and | |
throwing all the things that get fired
33:35: onto the heap and because you're always pulling it out in the right order | |
you'll never visit a node until you've | |
33:40: recomputed everything that that node depends on right so it sort of
guarantees to you that you never refire anything
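The heap-driven propagation just described might look like this minimal Python sketch; the node records, ranks, and cutoff rule are illustrative assumptions, not the real OCaml implementation:

```python
import heapq

class Node:
    def __init__(self, rank, fn=None, parents=()):
        self.rank = rank            # topological rank: parents have smaller ranks
        self.fn = fn                # None for input variables
        self.parents = list(parents)
        self.children = []
        for p in self.parents:
            p.children.append(self)
        self.value = None

def stabilize(dirty):
    """Recompute in rank order: each node fires at most once per pass,
    and an unchanged value cuts its children off."""
    heap = [(n.rank, id(n), n) for n in dirty]
    heapq.heapify(heap)
    queued = {id(n) for n in dirty}
    forced = set(queued)            # dirty inputs always propagate
    while heap:
        _, _, n = heapq.heappop(heap)
        old = n.value
        if n.fn is not None:
            n.value = n.fn(*(p.value for p in n.parents))
        if id(n) not in forced and n.value == old:
            continue                # cutoff: children never see this firing
        for c in n.children:
            if id(c) not in queued:
                queued.add(id(c))
                heapq.heappush(heap, (c.rank, id(c), c))

# a tiny graph: x -> y = x + 1 -> z = y * 10
x = Node(0)
y = Node(1, fn=lambda v: v + 1, parents=[x])
z = Node(2, fn=lambda v: v * 10, parents=[y])
x.value = 4
stabilize([x])
assert (y.value, z.value) == (5, 50)
```

Because the heap always pops the smallest rank first, a node is only visited after everything it depends on has been recomputed, which is exactly the invariant that makes the cutoff check safe.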
33:46: but you can at any point like look at a recomputed node and say yeah this | |
didn't change by enough I want to stop here and | |
33:52: that doesn't disturb any of the invariants that you have so top sorts are | |
good they make this kind of graph | |
33:58: algorithm simpler the only problem is how do you get a top sort so
here's
34:05: there's a bunch of ways of approaching this but here's a really
kind of very naive simple way of thinking about a topological sort which | |
34:11: is one way we can do it is we can solve it using a clock right so every | |
time we allocate a new node well look at the | |
34:18: clock on the wall and say what time is it we'll put that timestamp on the | |
node and now there's there's an ordering and | |
34:24: as long as we're only using the map and map2 and map3 part of
incremental | |
34:29: it turns out everything works because every new node only depends on old | |
nodes so the time order respects the | |
34:36: dependency order so we're happy so where does it fall apart it falls | |
apart when your graphs are dynamic right if you
34:42: just kind of construct the graph in one early sweep the time stamp is
enough and in fact you don't need a clock right | |
34:48: you use a counter that you increment every time so that all
works but when you're
34:54: allocating nodes in the middle of your graph it's not so hot anymore so
it's the problems when you're allocating | |
34:59: nodes inside of a bind that's when you're in trouble so the idea is to | |
use a different kind | |
35:06: of time stamp right a kind of logical time that will solve our problems | |
and the idea is that you can
35:13: think of the old way of using a clock as taking the set of all of the time
stamps that have already been used and finding a
35:20: new time stamp that's bigger than all of those so the alternate rule is | |
to say I'm gonna get a new time stamp in a | |
35:27: different way I'm going to pick a time stamp that's
bigger | |
35:32: than everything that is smaller than the bind under which I was allocated | |
right | |
35:38: so if you were a new guy allocated inside some bind right you always want | |
to be behind that bind so you want to be | |
35:45: in a time stamp that's before that bind and after everything else that's | |
before that bind right that's basically the | |
35:52: rule and in some sense if you have that rule if you can if you can | |
allocate time stamps that way everything almost just | |
35:58: works the only issue is how do you allocate time stamps that way so | |
because you know you have to if you have like a | |
36:04: bind node that's here in the order and then there's some other stuff | |
behind it well you can allocate something in the middle here and then maybe | |
you're gonna | |
36:10: do something in the middle here and like these are all gonna get really | |
really close together as you allocate more and more of them so you might think | |
like | |
36:16: what data structure do I use for these maybe I could represent them as | |
real numbers or something but that's
36:22: not so hot right they're like you know unbounded in length and the whole
thing is kind of a disaster so it turns out | |
36:30: there's a bunch of clever data structures for doing this that have good | |
asymptotic behavior the simplest thing | |
36:35: that we thought about which I think works reasonably well is
to just have your counters kind
36:42: of allocated in some big space like it's an integer space and like when | |
you pick ones like pick you know ones that are | |
36:48: kind of well spaced out to begin with and then when you have to go in | |
between while you go in between and when you run out of space you just flatten | |
stuff out | |
36:55: it's almost like a garbage collection step you sort of wait until you | |
have no space and you just like redistribute | |
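The well-spaced-integers scheme being described can be sketched like this; it's a toy, not one of the asymptotically clever order-maintenance structures the talk alludes to, and all the names are made up:

```python
GAP = 1024  # initial spacing between labels

class Order:
    """Keep items totally ordered by integer label: compare in O(1),
    insert between neighbors by picking a midpoint, and redistribute
    all labels in one batch (the 'negative GC') when a gap runs out."""
    def __init__(self):
        self.items = []    # current items, in order
        self.labels = {}   # item -> integer label

    def insert_after(self, item, new):
        # item=None means insert at the front
        i = -1 if item is None else self.items.index(item)
        lo = 0 if item is None else self.labels[item]
        if i + 1 < len(self.items):
            hi = self.labels[self.items[i + 1]]
        else:
            hi = lo + 2 * GAP
        if hi - lo < 2:            # no room left between neighbors
            self._redistribute()
            self.insert_after(item, new)
            return
        self.labels[new] = (lo + hi) // 2
        self.items.insert(i + 1, new)

    def _redistribute(self):
        # batch step: spread labels back out, preserving the order
        for k, it in enumerate(self.items):
            self.labels[it] = (k + 1) * GAP

    def comes_before(self, a, b):
        return self.labels[a] < self.labels[b]

o = Order()
o.insert_after(None, "bind")
o.insert_after("bind", "z")
o.insert_after("bind", "a")    # squeezed in between bind and z
assert o.comes_before("bind", "a") and o.comes_before("a", "z")
```

Comparisons are plain integer compares, which is the win mentioned next: you compare far more often than you insert, so it's fine for insertion to occasionally pay for a full relabeling.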
37:01: yeah you just decompact but it's the opposite of a garbage collector it's
a negative garbage collector or something | |
37:06: I don't know but it has the same flavor of like you're building up a
problem and you wait till it's pretty big and then you like have a batch
37:12: operation that clears it out and there are other approaches of doing this | |
too with like clever binary trees and stuff
37:18: yeah you can use trees for it it all depends on exactly the
performance properties you want the nice thing about | |
37:24: the kind of smooth out one is it's really simple to implement and the
identifiers themselves are really | |
37:30: cheap to compare and you do a lot more comparison than you do allocation | |
so there's a big win there for using the | |
ints but anyway there's lots of different tricks but like that's the basic
idea and so we went ahead and implemented | |
37:43: this and it was pretty easy like an intern who didn't have a
ton of experience was able to kind of go in and create a version of this and
it | |
37:51: worked better and worse so it was better in that like the semantics made | |
more sense but it was worse in that the
37:58: performance was a lot worse like you know one and a half to
four times worse than the original implementation so okay so we weren't | |
38:06: super excited about that and that kind of just stayed as an intern
project and we did not roll it into production and | |
38:12: then we got to Version three so Version three is to solve the other
problem that we have and this is the | |
38:18: version that we actually used for a long time for our big GUI apps
that use this sorry you had a question Eric yeah they've been
38:28: numbered in different ways in this talk I call this one version
three and I think this is the right time order of the versions anyway so this | |
version | |
38:37: is the one that's trying to just solve the exponential garbage problem | |
and this is just the problem I talked about | |
38:42: before which is it takes a long time to collect everything and those | |
things that keep on running because they're still | |
38:47: connected via the variables are allocating lots of garbage so
bind infested code gets totally screwed by | |
38:54: by our original implementation so this is trying to fix it
and the idea the basic idea is | |
39:01: actually relatively simple sorry you have a question so that's true but | |
isn't | |
39:20: really good enough right the point being like it's still exponential | |
right it could
39:26: be going so fast that you have trouble keeping up with it so you know you | |
can | |
39:32: argue about lots of things but anytime you have an actual part of your | |
algorithm which is exponential like it's very easy to trip off into it not | |
39:38: working at all so we had a fairly simple approach which is we're going to | |
keep explicit | |
39:45: track of what is the observed part of the graph and we do this by minting | |
39:53: explicit values called observers and so just like the things that were
kept | |
40:00: alive are kind of the transitive closure of the variables the things that | |
are needed is the transitive closure of the observers in the opposite
40:05: direction and we eagerly keep track of observation changes so when a bind
node
40:12: switches from looking at one incremental to looking at another
the one it stopped looking at immediately
40:18: moves into being unobserved and the new one immediately moves into being | |
observed so when you have a bind fire | |
40:25: and it changes its focus from one area to another you know there's | |
a guarantee that only the sort of current real bit
40:32: of the computation is actually observed and you have an extra invariant | |
that you make sure that only the observed part of | |
40:37: the computation actually runs you quiesce the unobserved part you tell
it not to compute anymore | |
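The observed part of the graph, the transitive closure from the observers going the opposite direction from the variables, can be sketched as a simple reachability pass; the eager bookkeeping the real system does is more sophisticated than this, and the structures here are hypothetical:

```python
# graph as: node -> list of nodes it depends on (its inputs)
deps = {
    "out": ["sum"],
    "sum": ["a", "b"],
    "old_sum": ["a", "b"],   # left over from a bind refiring
    "a": [], "b": [],
}

def observed_set(observers):
    """Everything reachable from the observers going toward the inputs."""
    seen = set()
    stack = list(observers)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(deps[n])
    return seen

live = observed_set(["out"])
assert live == {"out", "sum", "a", "b"}
# "old_sum" is unobserved: quiesce it so it stops recomputing
quiesced = set(deps) - live
assert quiesced == {"old_sum"}
```

The key point from the talk is that quiescing is done eagerly when observation changes, rather than waiting for the garbage collector to notice.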
40:44: and that is basically enough to solve the problem because the problem | |
wasn't that we had these things that were around for a little while it's that | |
they | |
40:50: were still alive and kicking while they were around so we kind of you
know kill them a little bit early before the
40:56: garbage collector actually gets to them to stop them from causing trouble | |
so | |
41:04: let's actually look for a second at what the interface looks like with | |
observers | |
41:12: so now we have we have three different | |
41:18: components of the interface instead of just two there's the incremental | |
which is kind of like it was before except we've
41:23: removed the part that lets you observe values from it there's the | |
variables which are exactly | |
41:29: the same and now there's this new thing called an observer which you can | |
create from an incremental you can extract | |
41:34: values from you can register on update functions for and importantly you | |
can stop observing them right so this is | |
41:41: like the new hook and you see like they're in some sense they're the dual | |
of the variable they're kind of the opposite of the variable they keep track | |
41:47: of opposite parts of the graph and the variable is something that you | |
create from nothing and then convert to an | |
41:54: incremental The Observer is something you can create from an incremental | |
and then peel information out of okay and this this | |
42:03: part of the interface really is preserved in the modern version of
incremental it's essentially a reference | |
42:13: count it does well sort of as we'll see
42:26: when an observer is garbage collected it is automatically unobserved but | |
just | |
42:32: remember how I said before it's not a good idea maybe to let the garbage | |
collector quiesce things that aren't needed anymore so in practice you should | |
42:38: unobserve things when you don't need them anymore which is not to say that
it | |
42:45: always happens okay so this worked great | |
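The three-part interface just described, variables, incrementals, and observers, might be sketched like this toy with naive recomputation on demand; the names and methods are illustrative assumptions, not the real library's API:

```python
class Var:
    """Input: created from nothing, converted to an incremental."""
    def __init__(self, value):
        self.value = value
    def set(self, value):
        self.value = value
    def watch(self):
        return Incr(lambda: self.value)

class Incr:
    """An incremental computation; can no longer be read directly."""
    def __init__(self, compute):
        self._compute = compute
    def map(self, fn):
        return Incr(lambda: fn(self._compute()))
    def observe(self):
        return Observer(self)

class Observer:
    """The dual of Var: created from an incremental, values peeled out."""
    def __init__(self, incr):
        self._incr = incr
        self._active = True
    def value(self):
        assert self._active, "observer was stopped"
        return self._incr._compute()
    def stop(self):
        # unobserve: in the real system this quiesces the node
        self._active = False

x = Var(3)
doubled = x.watch().map(lambda v: v * 2)
obs = doubled.observe()
assert obs.value() == 6
x.set(10)
assert obs.value() == 20
obs.stop()
```

As the talk says, in the real system an observer is also unobserved automatically when collected, but relying on that has the same lateness problem as relying on the GC anywhere else, hence the explicit stop.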
42:53: we used it a lot we used it in a bunch of our internal UIs and it mostly | |
worked | |
42:58: although we recently replaced it we're very happy with that replacement
okay so | |
43:03: now we're on to version 4 version 4 was like really going back and
reading the Acar paper carefully Acar is like
43:10: the main guy behind the original papers in this area so this was not
in like a useful implementation exactly but it | |
43:16: was a good learning experience we kind of went back and integrated
the
43:23: v2 and v3 things into one implementation we learned a few
extra | |
43:28: invariants from the papers that we needed about thinking about what | |
depends on what we also learned some interesting | |
43:35: things about the way in which the original implementation worked and one | |
of the interesting things that struck us | |
43:41: is that the original implementation from Acar did not distinguish how
map nodes | |
43:47: and bind nodes were constructed in other words you can implement map in | |
terms of bind because bind is a more powerful | |
43:53: operator but that turns out to be crushingly expensive because it | |
involved lots of extra allocation because you end | |
43:59: up kind of allocating new nodes all the time when the graph is actually | |
static and so you don't really need to do allocation so a lot there are lots of | |
I | |
44:07: think practical performance problems in those original implementations | |
that turned out not to be necessary and that we didn't have in | |
44:13: ours because we specialized the map and map2 and map3 implementation to
be more efficient so v5 was like a more | |
44:23: practical attempt to sort of combine v2 and v3 it also added one extra | |
bit of | |
44:29: functionality which was interesting which was remember how I said I | |
described this nice thing with time | |
44:35: stamps how we're going to insert new time stamps before the
bind and after everything else and that's
44:41: going to make things work out okay so it turns out if you're like a good | |
functional programmer and you only use | |
44:46: good functional programming primitives everything works great but if you | |
aren't | |
44:52: and you really shouldn't be right it doesn't work out so well in | |
particular a thing that you very naturally want to do | |
44:58: in this kind of context is memoize computations if there's some sub
graph | |
45:04: that you computed and then some other part of your computation wants to | |
do it you really want to share the same one right it's it's kind of a form of | |
common | |
45:11: sub-expression elimination applied to your computation meaning there's | |
some common sub piece and you want to | |
45:16: actually share the computation from that sub piece rather than repeating | |
it so the way you implement that is you'd only | |
45:21: have like some table somewhere which keeps track of previous | |
constructions of | |
45:27: graphs so you can remember them and get them back when you want them | |
again but this can cause you to pull things out of | |
45:33: order right not in the order that was originally intended by this kind of | |
timestamp system and so the end result | |
45:40: is you can even when you're doing what is the sensible incremental code | |
you can construct back edges and you asked about | |
45:46: cycles before what happens when you introduce a cycle with a back edge | |
well for the implementations up till here | |
45:53: what happened is you just well in the original implementation you had an | |
infinite loop so like the way your
45:59: program told you that you had introduced a back edge is you said | |
stabilize and it never ended you know I mean fail stop
46:05: right at least it doesn't like send orders or something
terrible but that's not obviously not ideal in | |
46:12: the simple like Acar style time stamp thing it's just gonna say kaboom
like | |
46:17: you've put in an ill-founded edge I'm going to throw an exception you're | |
dead and it turned out in a real | |
46:22: there were lots of these backwards edges so that was not so hot so we | |
added to this version a sometimes used dynamic | |
46:29: top sort algorithm we'd like normally allocate these counters in the
ordinary way but if it saw a back edge rather | |
46:36: than blowing up it would tell you well I don't know how to do like a
really good incremental top sort algorithm but I can | |
46:42: just like redo the top sort in this case so it's a little hacky you sort | |
of might worry that maybe there are bad corner
46:47: cases here but it kind of worked so this was like semantically much | |
better but | |
46:54: but the performance was bad and why was the performance bad well remember | |
I said | |
47:00: there's a heap and you put the things in the heap and you take them out | |
well heaps are kind of expensive data structures right the constant isn't all
47:06: that low and it's you know log n to remove an element so it costs a fair
amount to flow things through a heap and | |
47:13: even after we you know we no longer had like the thing an intern did in a | |
few months we had like you know a brand new | |
47:20: implementation that a bunch of good people had worked on for a long time | |
to try and make work well but it still didn't work fast enough to make us | |
47:27: comfortable about replacing it all right so v6 was to eliminate the
47:36: heap from the system so the core idea is to move a little bit
away from | |
47:42: the notion of using a total order and the key observation is we actually | |
never needed the total order you know
47:50: a top sort totally orders all the nodes but it's enough to have a
partial order right and what is a partial order
47:56: it just means there are lots of things that are
marked as equal in a partial order or not comparable to each other you know
that's
48:02: how it is some things are really different and some things you can't tell and
it turns out that's okay as long as the | |
48:09: partial order reflects the dependency structure well and in fact it's
pretty easy to think of partial orders that do that so one example is the
height of a node in the graph so how do | |
48:22: we define the height you can simply define it recursively as if you're a | |
leaf it's 0 and otherwise it is the | |
48:28: maximal height of all of your children plus 1 alright look at all your | |
kids see what the max height is you're one taller | |
48:34: than that and this also has the nice property that it respects the | |
dependency order and by | |
48:41: virtue of being a partial order we get to have something with a very | |
small number of valence is a small number of | |
48:46: distinct values right so you could imagine you could have a graph with | |
hundreds of thousands of nodes but only | |
48:53: 70 different heights right so now instead of having to have a heap that | |
might have a hundred thousand nodes in | |
48:59: it you just have an array of lists right you can have a very simple | |
constant time | |
49:06: low overhead data structure for keeping track of the things that need to | |
be fired because the set of valences is like
49:12: small enough you can stick it in your pocket right you just have like a | |
little array that just takes care of it and this turned out to be a good
49:20: idea and this idea was what we called a pseudo height so I said the | |
height of a node is 1 plus the max of the height of | |
49:26: the children a pseudo height is just like that except it's not quite so | |
painful to compute so why is the height | |
49:33: bad to compute the heights bad to compute because imagine you have like a | |
big graph and at the bottom of the graph | |
49:38: there's a node with a bind where it sometimes is height two and sometimes
is height one and it flips back and
49:46: forth you have to renumber everything back and forth so like you're | |
suddenly taking a thing that should be constant | |
49:52: and making it time linear in the overall size of the graph so that's kind | |
of a disaster so the pseudo height is almost like a | |
49:58: height except its memory sensitive which is to say it never goes down so | |
your | |
50:04: height your pseudo height is the maximum your pseudo height has ever been | |
basically so it starts out being the | |
50:10: height and it's never willing to come down and that in practice has | |
worked really well it also has some weird | |
corners you can write some terrible data structures some kind of terrible
incremental computations but basically like
50:22: have two nodes that kind of change their mind as to who is ahead of whom | |
and they keep on walking each other off the height graph but you kind of never | |
see | |
50:31: it so this is not a thing we warn people about especially and you can | |
construct it if you want like you could if you go | |
50:37: back to your desk and think about how to do this but it doesn't seem
to come up in real programs and we'd know | |
50:42: because like we get we get errors when we exceed the pre-committed height | |
so this turns out to work really well in | |
50:49: practice and this is again a property that our current
system has and also the performance
50:58: was now quite good like if you think about the overall performance story | |
so | |
51:04: here's sort of the picture so imagine like up is bad as in taking longer so
we started off with something pretty good | |
51:09: and then we got something with better semantics but you know worse | |
behavior and you know a little better a little | |
51:14: worse kind of hovering around here for several versions and then finally | |
with this version we could kind of you know if this was our baseline we'd | |
finally | |
51:21: kind of brought it down to like yeah maybe just a little bit worse than | |
v1 right so now we had like the good | |
51:27: behavior and the performance was about as good as before so we were | |
relatively happy with this so we had another nice | |
51:36: trick which we wanted which was so this is in v7 and we had kept this | |
this | |
51:43: structure having finalizers on all of the versions but it turns out
we didn't really need finalizers everywhere
51:50: the finalizers were there remember for the sentinels and the sentinels
were there for the ref counts for figuring out
51:55: when you disconnected things so they could be collected because they were | |
no longer seeable anywhere so it became | |
52:03: real eyes that we could basically just use the observers for this with a | |
little | |
52:08: bit of extra hackery in particular we still need to make sure things are | |
collectible when they're totally unreachable but we can do it in a | |
52:15: somewhat subtler and cheaper way which is we just use the observability | |
data to | |
52:22: enforce the invariant that there are no pointers from the observed to the | |
unobserved world so if you see like a | |
52:28: connection from the observed to the unobserved world which you can check
every time your observability changes you say well I'm not allowed to have | |
52:33: that pointer so I cut it off and if it changes in such a way that you | |
need to reestablish the pointer well then you | |
52:39: can reestablish it and it helps that you kind of are on the you have to | |
be on the right side to reestablish it but that's | |
52:45: that is in fact the case because the guy who becomes observed kind of | |
knows who he depends on and he tells them to
52:51: re-establish the upward pointer and this now means that the unobserved | |
world is | |
52:57: kind of free-floating it's no longer connected to the variables so if no | |
one is using it then it can actually be collected so
53:04: we basically got to get rid of them you know it used to be if you had
three hundred thousand nodes in your graph you had three hundred thousand
finalizers
53:10: and no more right now you might have you know a three
53:15: hundred thousand node graph where you have ten output values and then you
only need ten finalizers and finalizers
53:21: cost something from the collector so there's real value there
53:26: again and also in this version we discovered some new semantic | |
constraints that we hadn't realized having to do | |
53:32: again with bind nodes bind nodes are really the complicated bit and the
thing we realized in this version is that when | |
53:37: a bind node fires and you create nodes and then it fires again you
create new nodes those old nodes are kind of bad it | |
53:44: turns out it's not safe to depend on them so you have to make sure to | |
kind of obsolete and kill off those nodes and not allow
53:52: dependencies on those so that was a new thing we discovered in this | |
version but the real big win for this
53:57: version was the finalizers and it was great now it was finally like
we'd gotten down to here so that was pretty | |
54:04: good we were excited and we thought we were done and then the guys who | |
were | |
54:11: working on this Valentin Gatien-Baron and Nick Chapman who were the
main guys who were doing that all these | |
54:16: kind of different versions of incremental after the first few were done | |
by these guys and so they kind of handed it back to Stephen weeks for code | |
54:21: review and if you've ever met Stephen calling it code review it's like | |
it's more like code rewriting but he could he | |
54:29: wants to make it look like he wants to and in between so V 7 it was sort | |
of the the kind of final version of this | |
54:35: process but then it turns out we ended up with a v8 so why did we end up | |
with a v8 well between the time that we were | |
54:41: working on like all these versions and the time we did the final review | |
we had learned new things about how allocation | |
54:48: how much allocation costs for these kinds of programs and we've gotten | |
some new features in the language in | |
54:54: particular we got what are called generalized algebraic data types
and this gives you nice ways of encoding
55:01: what one calls existentials so let me talk a little bit about this if
it sounds like we're being philosophical it
55:07: almost is but the idea is basically this that
55:13: the way incremental is initially implemented is kind of
tricky
in that you have all these different nodes remember there's like a type parameter
for these nodes right something might be
55:25: computing an int and something else might be computing a list of strings
maybe different kinds of things in different parts but they all have to sit
in one data structure and as you know in OCaml an OCaml
list is parameterized over its element type right
55:37: it's an alpha list it might have integers or strings but can't have like | |
a mixture of things and we kind of need | |
55:43: to have a mixture of things we need to be able to generally plug these | |
guys in so the way you do that is you create | |
55:48: what you might call a poor-man's object you have a little record and each | |
record has a bunch of values that are functions | |
55:55: that let you do things to the underlying object but don't expose the type | |
of that | |
56:00: object and instead of directly storing the underlying object it's this | |
record full of closures that you actually keep | |
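The record-of-closures trick being described might look roughly like this (a toy sketch, not the library's actual code; all the type and field names here are invented): the node's type parameter is hidden behind a record of functions, so nodes computing different types can live in one data structure.

```ocaml
(* The "poor-man's object": all you can do to a packed node is call its
   closures; the type of the underlying value is hidden. *)
type packed_node = {
  recompute : unit -> unit;  (* recompute the hidden value *)
  is_stale : unit -> bool;   (* does it need recomputation? *)
  height : unit -> int;      (* position in the topological order *)
}

(* A typed node, before packing. *)
type 'a node = {
  mutable value : 'a option;
  compute : unit -> 'a;
  mutable stale : bool;
  node_height : int;
}

(* Hide the ['a] by capturing the node inside closures. *)
let pack (n : 'a node) : packed_node = {
  recompute = (fun () -> n.value <- Some (n.compute ()); n.stale <- false);
  is_stale = (fun () -> n.stale);
  height = (fun () -> n.node_height);
}

(* Now an int node and a string node can share one list. *)
let graph =
  [ pack { value = None; compute = (fun () -> 1 + 2); stale = true; node_height = 0 };
    pack { value = None; compute = (fun () -> "a" ^ "b"); stale = true; node_height = 1 } ]

let () = List.iter (fun p -> if p.is_stale () then p.recompute ()) graph
```

Note that once packed, nothing about the node's structure is visible anymore, which is exactly the complaint that comes next.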
56:07: in your data structure so this turns out to totally suck and it sucks for | |
two distinct reasons one reason it sucks is | |
56:14: it's slow like closures are relatively cheap but if
you allocate a bunch of them it can cost a lot in terms of memory but it is also
56:21: bad as a programming experience it's like programming with oven mitts | |
right you have to like mint one unique oven | |
56:28: mitt for every operation that you want to do and like you kind of go in | |
every time and do the exact thing you can do but like you can't see what's | |
happening | |
56:33: on underneath the usual programming style we're used to in OCaml where
you have like you know these variants and records
56:40: and nested structures that tell you exactly what's going on so you can
really operate directly on it that just isn't there when you work in this in
56:46: this kind of object-oriented almost style and the thing that lets you do | |
what you want is a slightly richer thing | |
56:54: in the type system so existentials basically let you talk about a thing
56:59: whose type you don't know more naturally in terms of the operations that | |
you have | |
57:04: on top of it saying like yeah I don't know what the type is but there | |
exists a type that it is and I can operate on it | |
57:10: so that helps and generalized algebraic data types give you a little bit more
expressiveness so you can sort of do a little more and just kind of writing
57:16: these things down as ordinary you know variants listing the different
possibilities so it was a win in two ways it
57:24: was a performance win because we got rid of all these closures that we
didn't really need and it was also a comprehensibility win the code was
57:30: much easier to understand and because of that it was a performance win
again because it turned out there were optimizations that you could do because
57:37: you could see what was going on underneath you knew the exact
structure and names of the kind of nodes you had
57:43: so there are cases where you could do small optimizations and rewirings
of the graph because you knew what was going on
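The GADT version of the same idea can be sketched like this (again a toy, not incremental's real node type; these constructors are invented for illustration): an existential wrapper packs "some node of a type we don't know", and because the node is an ordinary variant you can still pattern-match on its structure, which is what makes the inspection-based optimizations possible.

```ocaml
(* A toy node type as a GADT: the input types 'b and 'c are existentially
   quantified inside each constructor. *)
type 'a node =
  | Const : 'a -> 'a node
  | Map : ('b -> 'a) * 'b node -> 'a node
  | Map2 : ('b -> 'c -> 'a) * 'b node * 'c node -> 'a node

(* The existential wrapper: some node whose result type we don't know. *)
type packed = Pack : 'a node -> packed

(* Polymorphic recursion over the GADT needs a locally abstract type. *)
let rec eval : type a. a node -> a = function
  | Const x -> x
  | Map (f, n) -> f (eval n)
  | Map2 (f, n1, n2) -> f (eval n1) (eval n2)

(* Unlike the record-of-closures version, we can inspect structure,
   e.g. compute the depth of a packed node. *)
let rec depth (p : packed) : int =
  match p with
  | Pack (Const _) -> 0
  | Pack (Map (_, n)) -> 1 + depth (Pack n)
  | Pack (Map2 (_, n1, n2)) -> 1 + max (depth (Pack n1)) (depth (Pack n2))

(* Heterogeneous nodes in one list, with no closure records. *)
let graph = [ Pack (Const 3); Pack (Map (string_of_int, Const 42)) ]
```

The allocation win in the real library comes from each node being one variant value rather than a record full of freshly allocated closures.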
57:49: and so it had a bunch of different wins so what was the result of this | |
was like | |
57:55: the performance did this right like it was suddenly massively better it | |
was | |
58:01: shocking it was so much faster that end-user applications like serious | |
58:06: relatively well-tuned end-user applications were three times faster
after this change than before right I
58:14: don't mean the library was three times faster I don't know what the | |
multiplier on the library itself is but the actual | |
58:20: applications got three times faster and the thing that this teaches you | |
is that these applications that you made faster | |
58:26: by adding incremental are in some sense now totally bounded by | |
incremental right | |
there's almost nothing left so if you think about it you have a
computation that's too slow to do all at | |
once so you make it much faster by incrementalizing it but now almost
all the actual computing you're doing is
58:44: the incrementalization framework itself rather than the real
computation so the returns to improving | |
58:50: the incrementalization framework are really high and that's kind of what | |
we saw when we actually rolled this out it | |
58:55: was a really massive difference and in fact the thing that happened | |
recently is with one of the kind of trading system | |
59:01: front ends that used incremental was very slow and it was using v3
from this
59:07: talk and had been for a long time we were negotiating about getting it
fixed and yeah we had a lot of things to do we were busy and finally we're like
okay
59:13: we'll do it and we moved it from v3 to v8 to the current production | |
version and it was astonishing like it was so much | |
59:20: faster like people said this went from like the thing that ruins my life | |
to the fastest application I interact with so | |
59:26: that was an interesting and kind of gratifying thing to see happen all | |
right | |
59:31: so that was a big deal so let's talk about like what there is left so now | |
we have this library we
59:39: use it a lot use it for both things that are kind of compute engines and | |
also for user interfaces and we're starting to think
59:46: about using it even more for user interfaces so a few different things | |
that we've thought about is potential | |
59:51: future improvements and some that are in flight so one observation that | |
we've made recently in thinking about using | |
59:58: incremental as part of doing JavaScript applications of all things | |
because it | |
1:00:03: turns out you can compile OCaml to JavaScript and run it in a web
browser is that you could use incremental to
1:00:11: incrementalize the computation of what you need to show to the user
and this is | |
1:00:17: a thing that we've thought about doing kind of in a way that's kind of | |
consistent with this kind of virtual Dom approach that's become increasingly | |
1:00:24: popular with libraries like react which is just a kind of straight-up | |
JavaScript library that uses this approach and the | |
1:00:30: nice thing about virtual Dom is it's a way of thinking about your | |
computation | |
1:00:36: of what people see as just a simple function from data to the thing | |
that's viewed and it turns out optimizing | |
1:00:43: simple functions is exactly what incremental is good for and so we've | |
thought about using it in this context | |
1:00:48: and one of the things that we've discovered was kind of missing in
incremental the way we started with it is incremental is really good if you
1:00:54: have like if you have your inputs in like boom they're smashed into a | |
million pebbles and now you're kind of each one | |
1:01:00: being an incremental and now you're kind of building up from those | |
pebbles the computation it works really well but if | |
1:01:05: you start with just like a big data structure like a big functional | |
data structure that kind of you know a new | |
1:01:11: version lands and a new version lands and a new version lands it's not | |
clear how to bootstrap that into incremental | |
1:01:16: in a clean way and one of the things we're playing around with is
adding some extra primitives to
1:01:22: incremental that'll allow us to kind of take an ordinary functional | |
data structure that is changing as an | |
1:01:27: incremental value and kind of efficiently extract information about | |
individual sub components by taking | |
1:01:34: advantage of the fact that a lot of these functional data structures | |
can be diffed efficiently so
1:01:40: an important part of this is the map data
structure which has this function symmetric diff
1:01:45: that lets you really efficiently rediscover what changes have been made | |
between two Maps kind of taking | |
1:01:51: advantage of being able to cut off that computation on physical | |
equality of the new map and | |
1:01:56: the old map so that turns out to be a kind of productive and | |
interesting | |
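The symmetric diff idea can be sketched with the standard library's Map (the real code he's referring to is Core's `Map.symmetric_diff`; this toy version applies the physical-equality cutoff only at the top and per binding, whereas the real one cuts off on shared subtrees so its cost is proportional to the change, not the map size):

```ocaml
module M = Map.Make (String)

type 'v change = Added of 'v | Removed of 'v | Changed of 'v * 'v

(* Toy sketch of diffing two versions of a functional map. If the maps are
   physically equal (all structure shared), nothing changed and we stop
   immediately; otherwise compare key by key, skipping physically equal
   values. *)
let symmetric_diff (old_m : 'v M.t) (new_m : 'v M.t) : (string * 'v change) list =
  if old_m == new_m then []  (* physical-equality cutoff *)
  else
    M.merge
      (fun _key o n ->
        match (o, n) with
        | Some ov, Some nv when ov == nv -> None  (* shared value: no change *)
        | Some ov, Some nv -> Some (Changed (ov, nv))
        | Some ov, None -> Some (Removed ov)
        | None, Some nv -> Some (Added nv)
        | None, None -> None)
      old_m new_m
    |> M.bindings

(* A new version built from the old one shares most of its structure. *)
let m1 = M.add "a" 1 (M.add "b" 2 M.empty)
let m2 = M.add "b" 3 m1
```

Feeding each reported change into incremental is what lets a big functional data structure that arrives version by version be broken back down into fine-grained incremental values.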
1:02:01: little corner of incremental and maybe gives us the ability to kind of
take these ideas and
1:02:09: techniques of building incremental computations and apply them somewhat
more generally so that's one
1:02:16: idea and then another idea which is somewhat older and in fact comes | |
from other research that has happened on on | |
1:02:23: previous systems that are kind of similar like there's a system called | |
FrTime which is from the kind
1:02:29: of guys who work on Scheme and Racket and in that system they had an
optimization which is potentially attractive called | |
1:02:36: lowering I think is the term they gave for it which is almost a kind
of inlining optimization it's basically so
1:02:41: in FrTime instead of saying oh you can choose when
to use this incremental abstraction they just
1:02:48: use it everywhere like it was baked into the fabric
every time there's like an open paren you know in
1:02:53: some Scheme program like there was another incremental there and that's
sort of a disaster because you have so | |
1:03:00: many of them that you're really swamped by the overhead of the | |
incremental computation and so what they did was | |
1:03:07: they basically used ways of choosing to kind of merge together nodes | |
trying to optimize by saying yeah it's not really | |
1:03:13: worth it to have these all broken down we'll have bigger nodes
that do bigger pieces and less
1:03:20: incrementalization but overall better efficiency and that's the thing | |
you could imagine us doing as well I think the promising direction is to do a | |
kind | |
1:03:27: of trace based optimization where you keep some statistics about how | |
long things take and then you can use those | |
1:03:34: for making decisions about how to collapse nodes so this is a
much less important optimization for us | |
1:03:40: because we already get to make explicit choices about where we do and | |
don't want to construct these nodes and so | |
1:03:46: programmers have a lot of direct control over the performance but it | |
would still be nice it would allow you to kind of throw in a lot of | |
incrementality | |
1:03:53: without thinking too hard about it and then have you know some | |
optimizer could come back and claw back decent | |
1:03:59: performance for you anyway that basically covers the whole thing | |
1:04:06: I guess for me I think there's a few
interesting things that flow from this one is it's just
1:04:13: interesting how far you might need to go from the thing you read in | |
some paper to something you're really happy to use in | |
1:04:18: production and it also I think shows there's lots of interesting
ideas
1:04:24: that you have to dig through to get to that really good implementation | |
like it was one of these cases where we didn't | |
1:04:30: say oh make it better we really sat down and did a lot of
experimentation and I think the end result of that experimentation is we really
understood the | |
1:04:36: problem well and we're able to get to an implementation that really | |
delivered useful things for the actual practical | |
1:04:42: work that we do on a day to day basis and that I think is that does
anyone have any
1:04:49: final questions alright thank you |