
@jadeallenx
Last active November 25, 2015 19:14
Yeah, I think I See What You Mean

tl;dr You should watch this.

It's interesting to me that a significant number of talks in this year's Strange Loop conference referenced a 1966 paper by Peter Landin titled (somewhat tongue-in-cheek) "The Next 700 Programming Languages". In this paper, Landin describes a programming language which he says in a footnote might be aptly called "Church without the lambda" - but which in the paper is called "If You See What I Mean" (ISWIM).

Alvaro's talk delved deeply into the question of "What is the semantics or meaning of a program?" Alvaro's thesis is that the meaning of a program is to produce desired outcomes from a computing system but that current languages focus on provoking certain computer behaviors. In Alvaro's view, this is backwards. This topic is covered in some detail in the Landin paper and it arises again and again.

In section 9 of the paper, Landin writes:

An important distinction is the one between indicating what behavior, step-by-step, you want the machine to perform, and merely indicating what outcome you want.

In a few more paragraphs Landin continues with another thesis of Alvaro's:

It is a trivial and pointless task to rearrange some piece of symbolism into prefixed operators and heavy bracketing. It is an intellectually demanding activity to characterize some physical or logical system as a set of entities and functional relations among them. However, it may be less demanding and more revealing than characterizing the system by a conventional program, and it may serve the same purpose.

The idea is that the communication pattern used for human-to-computer communication may vary in form, with some forms being more suitable than others. In the middle part of Alvaro's talk he establishes the unmistakable link between the program itself, the behaviors it causes, and the outcomes that come almost as a side effect of those behaviors.

Alvaro says this is messed up. We ought to be thinking about the outcomes and focus less on the program per se or the behaviors which induce the outcomes. He uses SQL as an example of how this might practically work. When you execute a query using SQL, you don't write an explicit imperative procedure to answer the query. Rather you say, "this is the data I am interested in, and this is where it lives. You figure out how to make these propositions true (if it can be true given the facts contained in the system at this point in time.)"
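To make that SQL point concrete, here is a tiny sketch using Python's standard-library sqlite3 module. The table name and data are invented for illustration; the point is that the query states which facts we want, and the engine decides the step-by-step behavior.

```python
import sqlite3

# In-memory database with a hypothetical 'orders' table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "ada", 30.0), (2, "bob", 55.0), (3, "ada", 20.0)])

# Declarative: we describe the desired outcome, not a scan/index procedure.
# The query planner chooses the actual sequence of operations.
rows = conn.execute(
    "SELECT customer, SUM(total) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('ada', 50.0), ('bob', 55.0)]
```

Nothing in the query says *how* to group or aggregate; that's exactly the outcome-vs-behavior inversion Alvaro is pointing at.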

Alvaro wondered, "Hey, is it possible to apply the semantics of a query language to building robust distributed systems?" - though he no doubt put it more thoroughly and thoughtfully. So Alvaro (and colleagues at UC Berkeley) built some experiments to play with this idea, and it turns out that it just might be possible to use a form of logical proposition to drive desired outcomes in distributed systems.
He runs through several examples of this in his talk, some derived from the Datalog language and some with semantic modifications to Datalog, such as the "Dedalus" language Alvaro created.
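For readers who haven't seen Datalog, the flavor of deduction involved can be sketched in a few lines of Python: facts plus a recursive rule, iterated to a fixpoint. The relation names ('link', 'reachable') are invented for illustration, and this toy ignores everything that makes Dedalus interesting (time, distribution).

```python
# Facts: a toy 'link' relation (hypothetical edges).
link = {("a", "b"), ("b", "c"), ("c", "d")}

# Rules, Datalog-style:
#   reachable(X, Y) :- link(X, Y).
#   reachable(X, Z) :- link(X, Y), reachable(Y, Z).
reachable = set(link)
changed = True
while changed:                      # naive bottom-up evaluation to a fixpoint
    changed = False
    for (x, y) in link:
        for (y2, z) in list(reachable):
            if y == y2 and (x, z) not in reachable:
                reachable.add((x, z))
                changed = True

print(sorted(reachable))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

The programmer states the propositions; the evaluator figures out an execution that makes them true, which is the same inversion as the SQL case.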

One of the most interesting parts of the talk was the discussion of how state might be represented in a query language. He pointed out there's a pretty strong notion of "time" in the system, where state is partitioned into "past events" and "future events". He says, as far as messages go, you can't easily tell if a message has been received by a node, but you can track if a message was sent.

Another implication was that "knowledge is ephemeral." It only has a semantic meaning in a specific time and a specific local node. He said that distributed computation is really the rendezvous of ephemera, where deduction can take place between facts which the system knows about at a particular time.

Alvaro started with the statement that "Abstractions are dangerous." He concluded:

  • Respect your users by providing appropriate abstractions (they're powerful tools, yo) because you are your users, too!
  • Abstractions leak. Deal with it.
  • Meaning well is better than "feeling good." In other words, programs aren't about how programming makes a programmer feel per se. They're about getting stuff done.
  • Inventing languages is dope.

Some of the open questions I have - in Landin's paper, he writes:

...reduc[e] the extent to which the program conveys its information by explicit sequencing...

And by the use of a declarative query language, it seems that Alvaro is suggesting this is a desirable goal. He talks about how an optimizer decomposes a query into operations and sequences those operations correctly in an RDBMS. Presumably there would be an equivalent entity somewhere in the system. It isn't clear to me how or if that can work well in complex distribution models. (Maybe that is future work?)

I am especially curious because, given that knowledge only has meaning on a specific node at a specific time, it seems like an optimizer in a declarative distributed systems programming language would have to either: a) search a large space for lots of answers or b) have a pretty good idea of which nodes "know" which things before it tried to decompose the query into steps of operations. Also it seems like the sequence of things like map and reduce across sets of knowledge have pretty specific orderings if one desires the asked-for outcomes - i.e., those operations are not commutative. Is that an "implementation detail"? Because it seems like a pretty hard problem to solve, especially if you're seeking confluent operations which permit weak orderings between messages.
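The ordering worry can be illustrated with a toy sketch (the data and operations are invented, not from the talk): a reduce built from a commutative, associative operation like max gives the same outcome under any message delivery order, while one that is order-sensitive does not.

```python
from functools import reduce

msgs = [3, 1, 2]  # facts arriving as messages, in some delivery order

# Commutative + associative reduce: any arrival order yields the same outcome.
assert reduce(max, msgs) == reduce(max, list(reversed(msgs))) == 3

# Order-sensitive reduce: the outcome depends on delivery order, so the
# system would have to enforce an ordering to get a deterministic answer.
concat_fwd = reduce(lambda a, b: f"{a}->{b}", map(str, msgs))
concat_rev = reduce(lambda a, b: f"{a}->{b}", map(str, reversed(msgs)))
print(concat_fwd, concat_rev)  # 3->1->2 2->1->3
```

In the first case, weak ordering between messages is harmless; in the second, it changes the answer, which is exactly why commutativity matters for confluence.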

Last week, Alvaro tweeted that John Backus' Turing Award lecture, "Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs", was one of his favorite papers, and I'm just now grappling with why that is. I liked it when I first read it because it was pretty clear to me that Backus felt like much (all?) of the work he had put into FORTRAN was in some ways a failure. I now perceive that Alvaro likes it because it challenges readers to try to conceive of a model of computation which is dramatically different from what has come before - a model like what Alvaro sketches in this Strange Loop talk. Quoting the Backus paper:

[T]here is a desperate need for a powerful methodology to help us think about programs, and no conventional language even begins to meet that need. In fact, conventional languages create unnecessary confusion in the way we think about programs.

As you read through Backus, it's hard not to see that paper as a direct descendant of the Landin paper above. Indeed, a (different) Landin paper is included in the citations of the Backus paper. What makes that quote especially poignant to me is that it was written in 1977, almost 40 years ago. Given that, are we finally on the cusp of a big idea which will let us answer Backus's question affirmatively?

@palvaro

palvaro commented Oct 4, 2015

Thanks so much for this great post!

He talks about how an optimizer decomposes a query into operations and sequences the operations correctly in a RDBMS. Presumably there would be an equivalent entity somewhere in the system.

it seems like an optimizer in a declarative distributed system programming language would have to either: a) search a large space for lots of answers or b) have a pretty good idea of what nodes "know" what things before it tried to decompose the query into steps of operations.

Yeah! As a first cut, there can be a local query optimizer at each node that takes care of the classic query optimizations (access method selection, predicate inference and predicate and projection pushdown, join ordering, etc) as computation is driven by messages arriving (i.e., at least naively, we choose access methods and join orders for all joined tables except one, which is decided by the environment in the form of message delivery ordering). Local query optimization is a fairly mature research area. Global, distributed optimizations are a different matter altogether, as you point out. We can imagine playing all sorts of neat tricks with dynamically co-optimizing the placement of data (classic partitioning stuff) and code (function shipping) -- that is, should I ship the data to the code, the code to the data, or ship them both to some third place to rendezvous? Programming distributed systems in a language like Dedalus opens up this space of optimizations (I have the output spec, so the space of alternative implementations we have to search is well-defined) but doesn’t solve the problem of deciding how to rank and search them. These are really exciting (as well as daunting) avenues of future work, as you point out.

Also it seems like the sequence of things like map and reduce across sets of knowledge have pretty specific orderings if one desires the asked-for outcomes - i.e., those operations are not commutative.

It seems to me that all map operations and some reduce operations are monotonic, hence confluent (COME AT ME). If we wrote them in a language like Dedalus or Bloom, we could automatically detect if this was the case. If it was, we could (in principle) get rid of the barriers and pipeline everything (which makes fault-tolerance more complicated, of course, but this is a separate discussion). In cases where reduce isn't monotonic, we could (in principle) use an analysis like that of Blazes to determine when different partitions (e.g., word keys in the perennial wordcount example) could progress independently while maintaining confluence, using fine-grained producer-consumer punctuations rather than global barriers.
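[Editor's illustration, not from the comment above.] The monotonicity/confluence point can be checked mechanically for a toy example: set union only grows, so every permutation of the same messages reaches the same final state. The messages here are invented for illustration.

```python
import itertools

# Monotonic operation (set union) applied to messages: state only grows,
# so every delivery order of the same messages yields the same final state.
msgs = [{"a"}, {"b"}, {"c"}]

outcomes = set()
for perm in itertools.permutations(msgs):
    state = set()
    for m in perm:
        state |= m                 # union never retracts: monotonic
    outcomes.add(frozenset(state))

assert len(outcomes) == 1          # confluent: one outcome, any ordering
print(outcomes)                    # {frozenset({'a', 'b', 'c'})}
```

A non-monotonic reduce (one that can retract or overwrite earlier conclusions) would fail this check, which is the case where barriers or a Blazes-style analysis become necessary.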

Given that, are we finally on the cusp of a big idea which will let us answer Backus's question affirmatively?

Isn't it exciting to imagine?

@jadeallenx
Author

OK, so the $64,000 question becomes: is Dedalus ready for quote real unquote industrial applications, big hairy nasty distributed systems? I read stuff like Neil Conway's paper about Edelweiss and it knocked my socks off - yeah, this is research I know you are very, very familiar with.

But in the back of my mind was kind of, "Well, how 'real' is this language?" Could you, say, build a product around it and not have it be an issue constantly at sales meetings with prospects?

@palvaro

palvaro commented Oct 4, 2015

If the question is rhetorical, please forgive me for answering it. :) When we designed Dedalus we obviously weren't optimizing for sales meetings, but could Dedalus and the ideas behind it lead to commercial success? Maybe? I like to imagine so; I certainly believe in the vision. It would certainly take work.
