-
-
Save nthery/2a95f2347a3ed2c1ad219f878baf8814 to your computer and use it in GitHub Desktop.
// Break down linear IR into basic blocks and construct CFG. | |
// Just a naive program to teach myself datalog and Souffle. | |
.type Address <: number | |
// A sequential instruction, i.e. one that does not change control flow. | |
// Don't bother describing operands as we're only interested in control flow here. | |
// (ALU, load, store) | |
.decl seq(address:Address) | |
// An unconditional jump. | |
.decl jmp(address:Address, dest:Address) | |
// A conditional jump. | |
.decl cjmp(address:Address, ifTrue:Address, ifFalse:Address) | |
// A small program in this IR. | |
jmp(0, 1). | |
seq(1). | |
seq(2). | |
cjmp(3,1,7). | |
seq(4). | |
jmp(5, 6). | |
jmp(6, 1). | |
seq(7). | |
// A basic block terminator. | |
.decl terminator(address:Address) | |
terminator(a) :- jmp(a, _). | |
terminator(a) :- cjmp(a, _, _). | |
// An instruction is either a terminator or not. | |
.decl insn(address:Address) | |
insn(a) :- terminator(a); seq(a). | |
// A basic block. | |
.decl basic_block(first:Address, last:Address) | |
basic_block(f,l) :- max_sequence_without_terminator(f,l-1), terminator(l). | |
basic_block(f,l) :- max_sequence_without_terminator(f,l), !insn(l+1). | |
basic_block(f,l) :- f = l, (terminator(f-1); !insn(f-1)), terminator(l). | |
// A non-maximal sequence of instructions without branches. | |
.decl sequence_without_terminator(first:Address, last:Address) | |
sequence_without_terminator(f,l) :- f = l, seq(f). | |
sequence_without_terminator(f,l) :- seq(f), sequence_without_terminator(f+1,l). | |
// A maximal sequence of instructions without branches. | |
.decl max_sequence_without_terminator(first:Address, last:Address) | |
max_sequence_without_terminator(f,l) :- !seq(f-1), sequence_without_terminator(f, l). | |
// Control flow graph. | |
// Successor of basic block [first, last]. | |
.decl succ(first:Address, last:Address, succ:Address) | |
succ(f, l, s) :- basic_block(f, l), branch_to(l, s). | |
// Extract destination address of branch instruction. | |
.decl branch_to(a:Address, to:Address) | |
branch_to(a, to) :- jmp(a, to); cjmp(a, to, _); cjmp(a, _, to). | |
.output succ | |
// Output is: | |
// 0 0 1 | |
// 1 3 1 | |
// 1 3 7 | |
// 4 5 6 | |
// 6 6 1 | |
Very good!
I agree that your solution seems correct. I also back up Quentin's comments.
Just a few comments regarding performance. Some of the rules may be costly if we were doing this analysis on a large program.
The relation sequence_without_terminator is recursive and the following rule is recursive:
sequence_without_terminator(f,l) :- seq(f), sequence_without_terminator(f+1,l).
Let's have a look at the RAM generated by Soufflé for this rule (option --show=transformed-ram when invoking Soufflé) :
QUERY
IF ((NOT ISEMPTY(seq)) AND (NOT ISEMPTY(@delta_sequence_without_terminator)))
FOR t0 IN seq
FOR t1 IN @delta_sequence_without_terminator ON INDEX t1.0 = (t0.0+NUMBER(1))
IF (NOT (t0.0,t1.1) IN sequence_without_terminator)
INSERT (t0.0, t1.1) INTO @new_sequence_without_terminator
END QUERY
This query is invoked within a fixpoint loop. Here, at each loop iteration:
- we iterate over all tuples in seq (may be quite large for a program with thousands of instructions)
- in a nested loop, iterate over all tuples in @delta_sequence_without_terminator that satisfy the property t1.0 = t0.0 + 1.
I you read about what we call "semi-naive evaluation", @delta_sequence_without_terminator is supposed to contain ontly the tuples that have been inserted in sequence_without_terminator at the previous fixpoint-loop iteration. This means @delta_* will always remain quite small.
So, intuitively, it would probably be more efficient to swap the above query and do:
sequence_without_terminator(f,l) :- sequence_without_terminator(f+1,l), seq(f).
Let's have a look at the resulting RAM:
QUERY
IF ((NOT ISEMPTY(@delta_sequence_without_terminator)) AND (NOT ISEMPTY(seq)))
FOR t0 IN @delta_sequence_without_terminator
FOR t1 IN seq
IF ((t0.0 = (t1.0+NUMBER(1))) AND (NOT (t1.0,t0.1) IN sequence_without_terminator))
INSERT (t1.0, t0.1) INTO @new_sequence_without_terminator
END QUERY
Here, we iterate on the @delta_sequence_without_terminator tuples first, which is good. However, we introduced another problem: the nested iteration on seq iterates over all tuples in seq instead of just those satisfying the property t0.0 = t1.0+1 (we see there is no ON INDEX join as we had before). Here we seem to hit a limitation of Soufflé, which was unable to figure out that it could only iterate on a subrange of seq...
A way to fix this, would be to create another relation:
.decl next(a:Address, n:Address)
next(a-1, a) :- insn(a), a > 0.
And use it to avoid having to write "f+1" in our previous rule:
sequence_without_terminator(f,l) :- sequence_without_terminator(f_plus_one,l), next(f, f_plus_one), seq(f).
The resulting query is :
QUERY
IF (((NOT ISEMPTY(seq)) AND (NOT ISEMPTY(@delta_sequence_without_terminator))) AND (NOT ISEMPTY(next)))
FOR t0 IN @delta_sequence_without_terminator
FOR t1 IN next ON INDEX t1.1 = t0.0
IF ((t1.0) IN seq AND (NOT (t1.0,t0.1) IN sequence_without_terminator))
INSERT (t1.0, t0.1) INTO @new_sequence_without_terminator
END QUERY
Here, as you can see, we iterate only on the new tuples t0 in @delta_sequence_without_terminator,
then only on the tuples t1 in next such that t1.1 = t0.0 (there will be only one such tuple t1 per tuple t0), and then we just have a test (t1.0) in seq
that avoids iterating on the seq relation at all.
Finally, with this next
relation, we can improve some of your other rules that would also benefit from a better indexing:
example:
basic_block(f,l) :- max_sequence_without_terminator(f,l-1), terminator(l).
would become:
basic_block(f,l) :- max_sequence_without_terminator(f,l_minus_one), next(l_minus_one, l), terminator(l).
etc.
Great!
There are many solutions, and this one looks perfectly valid.
basic_block(f,l) :- f = l, [...]
sincef = l
you are allowed to use a single variablebasic_block(a,a) :- [...]
.Similarly
sequence_without_terminator(a,a) :- seq(a).
My intuition is that adding a special exit terminator at the end of the program would simplify the rules.
If I had to do the same exercise, I would maybe go this way (quite similar):
exit(a)
terminator instruction that would always be the last instruction.terminator(a) :- exit(a).
non_maximal_bb(a,a) :- terminator(a).
that boostraps on each terminator.non_maximal_bb(f-1,l) :- !terminator(f-1), non_maximal_bb(f,l).
that grows backward.basic_block(f,l) :- non_maximal_bb(0,l).
basic_block(f,l) :- terminator(f-1), non_maximal_bb(f,l).