@si14
Last active October 23, 2015 14:55
  1. For "missing?" we often need to query the database to check whether an item exists. If the predicates are truly independent, we have to run the same query again to return the item.
  2. "data flow" looks something like this (in Prismatic's Graph lib):
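(require '[plumbing.core :refer [fnk sum]])

(def stats-graph
  {:n  (fnk [xs]   (count xs))               ; number of samples
   :m  (fnk [xs n] (/ (sum identity xs) n))  ; mean
   :m2 (fnk [xs n] (/ (sum #(* % %) xs) n))  ; mean of squares
   :v  (fnk [m m2] (- m2 (* m m)))})         ; variance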

or as a picture:

[image: graph]

This is basically a data structure that more or less explicitly encodes the data dependencies between functions and makes it possible to check things like "all data needed by a function is already computed when it's called".
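To make that check concrete, here is a minimal sketch in plain Clojure. It is not how Graph is implemented: `stats-nodes` and `run-nodes` are hypothetical names, and each node simply lists its inputs under `:deps` instead of having them inferred from `fnk` argument lists.

```clojure
;; Hypothetical stand-in for Graph: each node declares its inputs explicitly.
(def stats-nodes
  {:n  {:deps [:xs]    :f (fn [{:keys [xs]}] (count xs))}
   :m  {:deps [:xs :n] :f (fn [{:keys [xs n]}] (/ (reduce + xs) n))}
   :m2 {:deps [:xs :n] :f (fn [{:keys [xs n]}] (/ (reduce + (map #(* % %) xs)) n))}
   :v  {:deps [:m :m2] :f (fn [{:keys [m m2]}] (- m2 (* m m)))}})

(defn run-nodes
  "Repeatedly evaluates every node whose deps are already computed.
   Throws if some node can never run (missing input or a cycle)."
  [nodes inputs]
  (loop [values  inputs
         pending nodes]
    (if (empty? pending)
      values
      (let [ready (filter (fn [[_ {:keys [deps]}]]
                            (every? #(contains? values %) deps))
                          pending)]
        (when (empty? ready)
          (throw (ex-info "Unsatisfiable dependencies"
                          {:pending (keys pending)})))
        (recur (reduce (fn [vs [k {:keys [f]}]] (assoc vs k (f vs)))
                       values
                       ready)
               (apply dissoc pending (map first ready)))))))

;; Usage: (run-nodes stats-nodes {:xs [1 2 3 6]})
```

Because the dependencies are data, a missing input is reported before any function is called with incomplete arguments.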

On the other hand, a "decision graph" (it can be encoded as an FSM, as in the case of your HTTP decision diagram) is all about callbacks and outcomes. In all practical implementations that I've seen, callbacks were allowed to pass something "downstream" (in state or query) to solve the problem mentioned in (1). However, this introduces implicit data dependencies between callbacks, which are inferior to the explicit "data flow" model.

Therefore, it seems preferable to somehow "merge" both approaches, but I don't yet know how. It would be interesting to hear your thoughts on this.

@andreineculau

ok, thanks. now i'm with you 100%, i think :)

rather than passing stuff downstream, my thoughts were at some early stage to re-think callbacks with "downstream" dependencies. in your example, missing? would be dependent on some get callback, which can be modelled abstractly, for instance with Graph.

conversely, if you want to deal with the optimization first, rather than modelling the dependencies abstractly, one could use memoization and indirection so that missing? would call get, return true/false and not put anything in the opaque bucket
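A minimal sketch of the memoization idea, assuming a per-request cache. The in-memory `db`, the `query-count` counter, and the names `fetch-item` and `make-callbacks` are all hypothetical and only serve to show that the second lookup does not hit the store again.

```clojure
;; Hypothetical in-memory "database" plus a counter of how often it is queried.
(def db {1 {:color "red"}})
(def query-count (atom 0))

(defn fetch-item
  "The expensive lookup; counts every real query."
  [id]
  (swap! query-count inc)
  (get db id))

(defn make-callbacks
  "Returns callbacks that share one memoized lookup.
   Intended to be called once per request so cached results do not go stale."
  []
  (let [get-item (memoize fetch-item)]
    {:missing? (fn [id] (nil? (get-item id)))
     :get      (fn [id] (get-item id))}))

;; Usage:
;; (let [cbs (make-callbacks)]
;;   ((:missing? cbs) 1)   ; queries the store
;;   ((:get cbs) 1))       ; served from the memoized result
```

Here missing? returns only true/false, and get reuses the cached lookup, so nothing has to be handed downstream explicitly.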

need to digest

@si14

si14 commented Jan 5, 2015

Let me double-check that I understand you correctly: you mean that besides the decision tree (a DAG, actually) encoded as an FSM with callbacks, there would be an abstract (and spec'ed) dependency structure between callbacks? Seems very reasonable, but I'm not sure if this is flexible enough.

Here is an example: let's say we have a get-etag callback for forming the ETag of an entity. In some cases, the ETag is a function of the entity itself, so get-etag may depend on get (e.g., hash the entity). However, some databases (Datomic) provide better ways to "tag" a particular version of an entity (in the case of Datomic, the transaction id). To leverage this, one will need to return a map from get with the actual entity and its ETag value and unpack it in get-etag, therefore introducing the "opaque bucket" that you've mentioned.

@andreineculau

> Let me double-check

I don't know if that's so much of an opaque bucket anymore. There must be a few callbacks that are inherently without a spec-ed output (but still a spec-ed input I think), and thus flexible. They return a blob that is indeed opaque to the "http decision tree" and to the "callback dependency tree" (including here the specification of the callbacks' args), but it is not opaque with relation to the callbacks themselves which are inherently "subjective" i.e. they run specific code, and thus at least some input source must be subjective.

The "callback dependency tree" (which after looking a bit at Graph, is nothing more/less than a graph) is not something that should be set in stone, abstractly, but something that may have a basic template for the specific HTTP backend to develop on (i.e. some callbacks not only return "opaque"/not-standardized output, but may have other dependencies than originally predicted as well or instead of modifying the standardized output)

Examples based on yours:
A. the get callback in the basic case would return some entity properties (which in most minds means non-meta data: a car's color). In the Datomic case, it would return some entity properties (the car's color) and meta-data: the etag. Truth be told, that is still entity data.

B. In the Datomic case, the get callback could follow the "standard" basic type of output, with no meta-data, and modify the dependencies instead: introduce a new callback get-datomic, which both get and get-etag depend on.

If I look at it from an ideal and modular PoV, B wins. From a practical/gain PoV, A wins (mostly because "etag" is still entity data, and thus it's a valid return of the get callback)
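The two options can be sketched side by side as dependency maps. This is only an illustration: `fetch-record`, its return shape, the `:deps`/`:f` layout, and `resolve-value` are hypothetical, and no real Datomic API is used.

```clojure
;; Hypothetical record from a versioned store: the entity's attributes plus
;; the id of the transaction that last touched it.
(defn fetch-record []
  {:attrs {:color "red"} :tx-id 1042})

;; Option A: `get` returns the entity together with its metadata,
;; and `get-etag` picks the etag out of that result.
(def callbacks-a
  {:get      {:deps []
              :f    (fn [_] (let [{:keys [attrs tx-id]} (fetch-record)]
                              (assoc attrs :etag (str tx-id))))}
   :get-etag {:deps [:get]
              :f    (fn [vs] (:etag (:get vs)))}})

;; Option B: `get` keeps the standard output; a new `get-datomic` callback
;; becomes the shared dependency of both `get` and `get-etag`.
(def callbacks-b
  {:get-datomic {:deps []
                 :f    (fn [_] (fetch-record))}
   :get         {:deps [:get-datomic]
                 :f    (fn [vs] (:attrs (:get-datomic vs)))}
   :get-etag    {:deps [:get-datomic]
                 :f    (fn [vs] (str (:tx-id (:get-datomic vs))))}})

(defn resolve-value
  "Computes the value for key k after computing its declared deps.
   Naive on purpose: shared deps are recomputed, nothing is cached."
  [callbacks k]
  (let [{:keys [deps f]} (get callbacks k)]
    (f (into {} (map (fn [d] [d (resolve-value callbacks d)]) deps)))))

;; Usage: (resolve-value callbacks-b :get-etag)
```

In A the shape of get's output changes; in B only the dependency map changes, which is the modularity argument made above.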

@andreineculau

ps: i starred your gist, so i should get notifications now. thanks for pinging me on twitter, don't know if git email notification is just slow or doesn't work on gists (starred or not)

@si14

si14 commented Jan 17, 2015

Here is a braindump that came out of all this simmering in my head for a while.

I believe that the whole system can have an interface similar to Graph with a few exceptions.

First, we need a "decision graph" (like the one that you've made for HTTP). While Graph deduces the execution flow from the functions' input/output specs, in the case of a protocol implementation it's fixed (except for possible optimizations). However, we can still define default implementations of callbacks in a way similar to Graph's: as a map from keys (:is-method-head-get?) to functions with the semantics "a callback takes a map of the values computed up to that point and returns a value for its key".

Second, the "compilation model" is somewhat different: we walk all possible trajectories through the decision graph. At each decision-related callback we look up its "values needed" set and search for the callback producing each value in a Graph-like map. If that callback exists and hasn't already been executed in the trajectory, we plug it in before the "decision" callback; if it doesn't exist, or if there is a circular dependency, we throw an error. All this defines a mix of lazy (for values needed by "decision" callbacks) and eager (for decision points) evaluation. We also validate that every value needed "downstream" (like etag in the Datomic example) is produced by some callback "upstream".
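A rough sketch of such a "compiler", under heavy assumptions: the tiny `decision-graph`, the `needs` and `producers` maps, and the function names are all hypothetical, and decision names are assumed not to collide with value names. It only computes the order of steps for each trajectory; it does not run any callbacks.

```clojure
;; Hypothetical decision graph: each decision maps its answer to the next
;; node; keywords that are not keys of this map are terminal outcomes.
(def decision-graph
  {:missing?      {true :outcome-404 false :etag-matches?}
   :etag-matches? {true :outcome-304 false :outcome-200}})

;; "Values needed" by each decision or outcome callback.
(def needs
  {:missing?      [:entity]
   :etag-matches? [:etag]
   :outcome-200   [:entity]
   :outcome-304   [:etag]
   :outcome-404   []})

;; Graph-like map: value name -> values its producing callback needs.
(def producers
  {:entity []
   :etag   [:entity]})

(defn plan-value
  "Appends to `plan` the producers required for value v, dependencies first.
   `in-progress` is the set of values being expanded, used to detect cycles."
  [plan in-progress v]
  (cond
    (some #{v} plan)              plan
    (in-progress v)               (throw (ex-info "Circular dependency" {:value v}))
    (not (contains? producers v)) (throw (ex-info "No callback produces value" {:value v}))
    :else (conj (reduce #(plan-value %1 (conj in-progress v) %2)
                        plan
                        (producers v))
                v)))

(defn trajectories
  "Returns every path from `node` to an outcome, with each producer plugged
   in before the first callback on that path that needs its value."
  [node plan]
  (let [plan (conj (reduce #(plan-value %1 #{} %2) plan (needs node)) node)]
    (if-let [branches (decision-graph node)]
      (mapcat #(trajectories % plan) (vals branches))
      [plan])))

;; Usage: (trajectories :missing? [])
```

Each returned vector is one trajectory: producers appear lazily (only where first needed), while decisions stay in the fixed order dictated by the decision graph.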

As for specs, the "values needed" sets for "decision" and "outcome" callbacks can (and probably should) be spec'd in addition to the "decision graph". The spec should probably be concerned with required subsets of needed values (so a user can depend on more values, but not fewer). This way, the "200 OK" callback may depend on the "entity" and "content-type" values:

{:outcome-200 [:entity :accepted-content-type]
 :accept-matches? [:accepted-content-type :available-content-types]}
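The "more values, but not fewer" rule could be checked roughly like this. The spec map is copied from the example above; `spec-violations` and the shape of the implementation's declared needs are assumptions made for illustration.

```clojure
(require '[clojure.set :as set])

;; Required "values needed" per callback, taken from the spec example above.
(def spec-needs
  {:outcome-200     #{:entity :accepted-content-type}
   :accept-matches? #{:accepted-content-type :available-content-types}})

(defn spec-violations
  "Returns a map from callback name to the spec-required values that the
   implementation's declared needs leave out. An empty map means the
   implementation conforms; extra declared values are allowed."
  [spec impl-needs]
  (into {}
        (for [[cb required] spec
              :let [missing (set/difference required (set (get impl-needs cb)))]
              :when (seq missing)]
          [cb missing])))

;; Usage:
;; (spec-violations spec-needs
;;                  {:outcome-200     [:entity :accepted-content-type :etag]
;;                   :accept-matches? [:accepted-content-type]})
```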

To implement this stuff in a language of choice, one will need to choose a data structure for the value name → callback map, implement a "compiler", and provide a way to read specs. To implement a particular protocol (such as HTTP), one implements all "decision" and "outcome" callbacks, as well as "custom" callbacks like entity.

By the way, all this looks like your "B" example.
