Skip to content

Instantly share code, notes, and snippets.

@nthery
Last active February 24, 2022 14:25
Show Gist options
  • Save nthery/2a95f2347a3ed2c1ad219f878baf8814 to your computer and use it in GitHub Desktop.
Save nthery/2a95f2347a3ed2c1ad219f878baf8814 to your computer and use it in GitHub Desktop.
// Break down linear IR into basic blocks and construct CFG.
// Just a naive program to teach myself datalog and Souffle.
.type Address <: number
// A sequential instruction, i.e. one that does not change control flow.
// Don't bother describing operands as we're only interested in control flow here.
// (ALU, load, store)
.decl seq(address:Address)
// An unconditional jump.
.decl jmp(address:Address, dest:Address)
// A conditional jump.
.decl cjmp(address:Address, ifTrue:Address, ifFalse:Address)
// A small program in this IR.
jmp(0, 1).
seq(1).
seq(2).
cjmp(3,1,7).
seq(4).
jmp(5, 6).
jmp(6, 1).
seq(7).
// A basic block terminator.
.decl terminator(address:Address)
terminator(a) :- jmp(a, _).
terminator(a) :- cjmp(a, _, _).
// An instruction is either a terminator or not.
.decl insn(address:Address)
insn(a) :- terminator(a); seq(a).
// A basic block.
.decl basic_block(first:Address, last:Address)
basic_block(f,l) :- max_sequence_without_terminator(f,l-1), terminator(l).
basic_block(f,l) :- max_sequence_without_terminator(f,l), !insn(l+1).
basic_block(f,l) :- f = l, (terminator(f-1); !insn(f-1)), terminator(l).
// A non-maximal sequence of instructions without branches.
.decl sequence_without_terminator(first:Address, last:Address)
sequence_without_terminator(f,l) :- f = l, seq(f).
sequence_without_terminator(f,l) :- seq(f), sequence_without_terminator(f+1,l).
// A maximal sequence of instructions without branches.
.decl max_sequence_without_terminator(first:Address, last:Address)
max_sequence_without_terminator(f,l) :- !seq(f-1), sequence_without_terminator(f, l).
// Control flow graph.
// Successor of basic block [first, last].
.decl succ(first:Address, last:Address, succ:Address)
succ(f, l, s) :- basic_block(f, l), branch_to(l, s).
// Extract destination address of branch instruction.
.decl branch_to(a:Address, to:Address)
branch_to(a, to) :- jmp(a, to); cjmp(a, to, _); cjmp(a, _, to).
.output succ
// Output is:
// 0 0 1
// 1 3 1
// 1 3 7
// 4 5 6
// 6 6 1
@julienhenry
Copy link

Very good!
I agree that your solution seems correct. I also back up Quentin's comments.

Just a few comments regarding performance. Some of the rules may be costly if we were doing this analysis on a large program.

The relation sequence_without_terminator is recursive and the following rule is recursive:

sequence_without_terminator(f,l) :- seq(f), sequence_without_terminator(f+1,l).

Let's have a look at the RAM generated by Soufflé for this rule (option --show=transformed-ram when invoking Soufflé) :

    QUERY
     IF ((NOT ISEMPTY(seq)) AND (NOT ISEMPTY(@delta_sequence_without_terminator)))
      FOR t0 IN seq
       FOR t1 IN @delta_sequence_without_terminator ON INDEX t1.0 = (t0.0+NUMBER(1))
        IF (NOT (t0.0,t1.1) IN sequence_without_terminator)
         INSERT (t0.0, t1.1) INTO @new_sequence_without_terminator
    END QUERY

This query is invoked within a fixpoint loop. Here, at each loop iteration:

  • we iterate over all tuples in seq (may be quite large for a program with thousands of instructions)
    • in a nested loop, iterate over all tuples in @delta_sequence_without_terminator that satisfy the property t1.0 = t0.0 + 1.

I you read about what we call "semi-naive evaluation", @delta_sequence_without_terminator is supposed to contain ontly the tuples that have been inserted in sequence_without_terminator at the previous fixpoint-loop iteration. This means @delta_* will always remain quite small.

So, intuitively, it would probably be more efficient to swap the above query and do:

sequence_without_terminator(f,l) :- sequence_without_terminator(f+1,l), seq(f).

Let's have a look at the resulting RAM:

    QUERY
     IF ((NOT ISEMPTY(@delta_sequence_without_terminator)) AND (NOT ISEMPTY(seq)))
      FOR t0 IN @delta_sequence_without_terminator
       FOR t1 IN seq
        IF ((t0.0 = (t1.0+NUMBER(1))) AND (NOT (t1.0,t0.1) IN sequence_without_terminator))
         INSERT (t1.0, t0.1) INTO @new_sequence_without_terminator
    END QUERY

Here, we iterate on the @delta_sequence_without_terminator tuples first, which is good. However, we introduced another problem: the nested iteration on seq iterates over all tuples in seq instead of just those satisfying the property t0.0 = t1.0+1 (we see there is no ON INDEX join as we had before). Here we seem to hit a limitation of Soufflé, which was unable to figure out that it could only iterate on a subrange of seq...

A way to fix this, would be to create another relation:

.decl next(a:Address, n:Address)
next(a-1, a) :- insn(a), a > 0.

And use it to avoid having to write "f+1" in our previous rule:

sequence_without_terminator(f,l) :- sequence_without_terminator(f_plus_one,l), next(f, f_plus_one), seq(f).

The resulting query is :

    QUERY
     IF (((NOT ISEMPTY(seq)) AND (NOT ISEMPTY(@delta_sequence_without_terminator))) AND (NOT ISEMPTY(next)))
      FOR t0 IN @delta_sequence_without_terminator
       FOR t1 IN next ON INDEX t1.1 = t0.0
        IF ((t1.0) IN seq AND (NOT (t1.0,t0.1) IN sequence_without_terminator))
         INSERT (t1.0, t0.1) INTO @new_sequence_without_terminator
    END QUERY

Here, as you can see, we iterate only on the new tuples t0 in @delta_sequence_without_terminator,
then only on the tuples t1 in next such that t1.1 = t0.0 (there will be only one such tuple t1 per tuple t0), and then we just have a test (t1.0) in seq that avoids iterating on the seq relation at all.

Finally, with this next relation, we can improve some of your other rules that would also benefit from a better indexing:

example:
basic_block(f,l) :- max_sequence_without_terminator(f,l-1), terminator(l).
would become:
basic_block(f,l) :- max_sequence_without_terminator(f,l_minus_one), next(l_minus_one, l), terminator(l).

etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment