@paniq
Last active July 12, 2024 12:50
Low-Level Datalog

With the right low-level composition primitives, it would even be possible to describe accelerating data structures in a relational way.

Let's begin by treating the heap as a special relation.

# arcmem heap as polymorphic lattice
# .decl heap(address : T*, value : T)
.tuple T(kind : type)
.tuple Q(T, value : i32)

# store
heap(malloc(Q), obj) :-
    init(x),
    obj = Q(typeof(Q), x).

# load
Q_zero(ptr) :-
    Q_maybe_zero(ptr),
    heap(ptr, obj),
    obj.value = 0.

# load-on-store
Q_on_change(ptr_Q) :-
    # subscribes to any pointer writes of type T
    heap(bitcast(ptr,T), header),
    # sum type dispatch
    header.kind == typeof(Q),
    # union cast
    ptr_Q = bitcast(ptr,Q),
    # also subscribes to any pointer writes of type Q
    heap(ptr_Q, value).

In arcmem we always know how many elements a pointer can access, which would allow us to treat pointers as ranges; i.e. heap(ptr, value) matches more than one value when ptr is an array pointer. In Soufflé we would need a range like heap(range(ptr, ptr+1), value) to restrict the search to the first (or no) element, which could be sugared as heap(first(ptr), value), or simply heap(ptr, value) when we know the pointer is always singular.
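As a rough illustration (names and representation are my assumptions, not arcmem's), range pointers against a heap relation could look like this in Python, with a pointer carried as a (base, length) pair:

```python
# Hypothetical sketch: the heap as a relation from address to value,
# where a pointer is a (base, length) range, so matching heap(ptr, value)
# can yield more than one cell.

def heap_match(heap, ptr):
    """Yield (address, value) pairs for every cell a range pointer covers."""
    base, length = ptr
    for addr in range(base, base + length):
        if addr in heap:
            yield (addr, heap[addr])

def first(ptr):
    # sugar: restrict a range pointer to its first (or no) element
    base, _ = ptr
    return (base, 1)

heap = {100: "a", 101: "b", 102: "c"}
arr = (100, 3)  # array pointer covering three cells
assert [v for _, v in heap_match(heap, arr)] == ["a", "b", "c"]
assert [v for _, v in heap_match(heap, first(arr))] == ["a"]
```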

Memory lifetime is a bit tricky. Under semi-naive evaluation rules, datalog implements the observer pattern ubiquitously: every relation in datalog is a subject (event source), and every rule listens to one or more subjects (and in turn fires events when applied).
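A minimal sketch of this correspondence (my own illustration, not arcmem code): relations as subjects that fire only on genuinely new rows (the delta), and rules wired up as listeners, here computing transitive closure:

```python
# Relations as subjects, rules as listeners: semi-naive evaluation
# only fires events for rows that are actually new (the delta).

class Relation:
    def __init__(self, name):
        self.name = name
        self.rows = set()    # memoized state
        self.listeners = []  # rules subscribed to this relation

    def insert(self, row):
        if row not in self.rows:     # only deltas fire events
            self.rows.add(row)
            for listener in self.listeners:
                listener(row)

edge = Relation("edge")
path = Relation("path")

# rule: path(x, y) :- edge(x, y).
edge.listeners.append(lambda e: path.insert(e))
# rule: path(x, z) :- edge(x, y), path(y, z).  (one listener per subject)
edge.listeners.append(lambda e: [
    path.insert((e[0], p[1])) for p in list(path.rows) if p[0] == e[1]])
path.listeners.append(lambda p: [
    path.insert((e[0], p[1])) for e in list(edge.rows) if e[1] == p[0]])

edge.insert((1, 2))
edge.insert((2, 3))
assert path.rows == {(1, 2), (2, 3), (1, 3)}
```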

As long as a heap object that is not referenced by any other object is still subject to a pending event, the event keeps the object alive; once the event has been processed and the object still has a refcount of 1, it is freed.

In our "Low-Level Datalog" we would also initially be unable to reach fixpoints, since no data structures exist to prevent us from reprocessing rules. This means that we need a precursor to relations in the classical sense, so we can implement the conditional (and sometimes recursive) logic that data structure implementations require.

Let's propose continuations as a precursor to relations: they can be called by events, and in turn conditionally call events, i.e. invoke other continuations. Our continuations have no way to memoize their arguments, and so they are analogous to relations that only forward delta state and can only be used for secondary search in a limited capacity. Formally, these relations carry an unreadable monadic counter that prevents deduplication of rows, and when indexed, are implicitly matched to the latest monadic counter, therefore only providing the latest row, if any. We understand these relations as "monadic delta relations".

To demonstrate how events move through delta relations, here are some examples:

# these relations are singularities
cc g(x) :- <external>.
cc t(x) :- <external>.
# f and h bang when g bangs and the respective conditions are met
cc f(x) :- g(x), x >= 0.
cc h(x) :- g(x), x < 10.
# g(y) is redundant because it dominates f(x); bangs when f(x) bangs
cc k1(x) :- f(x), g(y).
# bangs only when f(x) and h(x) bang for the same g(x), implies x = y;
# equivalent to k2(x) :- g(x), x >= 0, x < 10.
cc k2(x) :- f(x), h(y).
# bangs *once* when g(x) bangs and x >= 0 OR x < 10, merging paths.
cc k3(x) :- f(x).
cc k3(x) :- h(x).
# inert: never bangs because we can't prove a common ancestor
cc m1(x) :- g(x), t(y).
# bangs for all elements of g(x) and t(x), but does not merge.
cc m2(x) :- g(x).
cc m2(x) :- t(x).
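The f/h/k3 part of the examples above can be simulated with plain callbacks (a speculative sketch, with the "monadic counter" realized as an explicit per-bang token, which is my own encoding): each external bang of g mints a fresh token, derived continuations forward it, and the unioned k3 rules deduplicate per token so merged control flow bangs only once.

```python
# "Monadic delta relations" as callbacks: a fresh token per external bang,
# and per-token dedup in k3 so unioned rules merge control flow.
import itertools

token = itertools.count()
bangs = []        # (relation, row) log for inspection
seen_k3 = set()   # k3's per-token dedup state

def k3(x, t):
    if t not in seen_k3:   # merge the f- and h-paths of the same g bang
        seen_k3.add(t)
        bangs.append(("k3", x))

def f(x, t):
    bangs.append(("f", x))
    k3(x, t)

def h(x, t):
    bangs.append(("h", x))
    k3(x, t)

def g(x):
    t = next(token)  # fresh monadic counter value per external bang
    # note: the rules' predicates are checked by the caller, not the callee
    if x >= 0:
        f(x, t)
    if x < 10:
        h(x, t)

g(5)    # both f and h bang, but k3 bangs only once
g(-3)   # only h bangs; k3 still bangs once
assert bangs.count(("k3", 5)) == 1
assert bangs == [("f", 5), ("k3", 5), ("h", 5), ("h", -3), ("k3", -3)]
```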

What is interesting about delta relations is that when we translate them to continuations, the predicates are moved to the caller, rather than being checked in the callee; this allows us to alias the entry point of many rules, permits deducing single bangs for unioned rules, and hence lets us model merging control flow.

A term like f(x), g(y) for two call-continuations means that we define a dominating relationship between the two paths; f and g must have a common dominator, which includes f dominating g or vice versa.

Since we know now how continuations relate to datalog, we can derive how continuations as first class arguments are applied:

# modeling a simple function with call-continuations as first class values
cc cont(y) :- pow2(x, cont), y = x * x.
# using pow2 to populate a continuation
cc pow2(x, g) :- f(x).

# same flow, but with pow2 inlined
cc g(y) :- f(x), y = x * x.
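Translating these cc rules into plain callbacks (my own rendering of the rules above): pow2 receives its continuation as a first-class argument, and inlining it recovers the direct rule g(y) :- f(x), y = x * x.

```python
# pow2 takes a continuation as a first-class value; f populates it with g.
results = []

# cc cont(y) :- pow2(x, cont), y = x * x.
def pow2(x, cont):
    cont(x * x)

# cc pow2(x, g) :- f(x).  -- when f bangs, call pow2 with continuation g
def f(x):
    pow2(x, g)

def g(y):
    results.append(y)

f(3)
assert results == [9]
```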

Flow rules can emulate currying:

# each fact is a call
cc g(1, 2).
cc g(3, 4).
# calls both f due to the hidden implicit monad
cc f(1) :- g(a, b).
cc f(2) :- g(a, b).
# will not be forwarded to q because it is not dominated by g.
cc f(3).
# equivalent to:
#     q(1, 2, 1)
#     q(1, 2, 2)
#     q(3, 4, 1)
#     q(3, 4, 2)
cc q(a, b, c) :- g(a, b), f(c).
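An illustrative simulation of the currying example (the encoding of domination as an in-flight g row is my assumption): each g fact is a call, both f rules fire within that propagation, and q joins the dominating g row with every f bang it covers; a bare f(3) carries no g row and never reaches q.

```python
# Flow rules emulating currying: q(a, b, c) :- g(a, b), f(c).
q_rows = []

def q(a, b, c):
    q_rows.append((a, b, c))

def f(c, g_row):
    # f is only forwarded to q when a dominating g row is in flight
    if g_row is not None:
        q(g_row[0], g_row[1], c)

def g(a, b):
    # both f rules fire for every g bang ("hidden implicit monad")
    f(1, (a, b))
    f(2, (a, b))

g(1, 2)       # each fact is a call
g(3, 4)
f(3, None)    # bare f(3): not dominated by g, so q never fires for it
assert q_rows == [(1, 2, 1), (1, 2, 2), (3, 4, 1), (3, 4, 2)]
```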

!f(_, ...) could also become a well-defined operation, where, analogously to stratification, we require that all positive paths of f have been proven discontinued; effectively we select the else or default path through negation.

To expand domination rules, f(x), !g(_) would require the same domination rules but apply only when f has been called and g hasn't; this excludes the possibility that g dominates f, but still allows f and g to share a caller, or f to dominate g.
