Last active
February 24, 2022 14:25
-
-
Save nthery/2a95f2347a3ed2c1ad219f878baf8814 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Break down linear IR into basic blocks and construct CFG. | |
// Just a naive program to teach myself datalog and Souffle. | |
.type Address <: number | |
// A sequential instruction, i.e. one that does not change control flow. | |
// Don't bother describing operands as we're only interested in control flow here. | |
// (ALU, load, store) | |
.decl seq(address:Address) | |
// An unconditional jump. | |
.decl jmp(address:Address, dest:Address) | |
// A conditional jump. | |
.decl cjmp(address:Address, ifTrue:Address, ifFalse:Address) | |
// A small program in this IR. | |
jmp(0, 1). | |
seq(1). | |
seq(2). | |
cjmp(3,1,7). | |
seq(4). | |
jmp(5, 6). | |
jmp(6, 1). | |
seq(7). | |
// A basic block terminator. | |
.decl terminator(address:Address) | |
terminator(a) :- jmp(a, _). | |
terminator(a) :- cjmp(a, _, _). | |
// An instruction is either a terminator or not. | |
.decl insn(address:Address) | |
insn(a) :- terminator(a); seq(a). | |
// A basic block. | |
.decl basic_block(first:Address, last:Address) | |
basic_block(f,l) :- max_sequence_without_terminator(f,l-1), terminator(l). | |
basic_block(f,l) :- max_sequence_without_terminator(f,l), !insn(l+1). | |
basic_block(f,l) :- f = l, (terminator(f-1); !insn(f-1)), terminator(l). | |
// A non-maximal sequence of instructions without branches. | |
.decl sequence_without_terminator(first:Address, last:Address) | |
sequence_without_terminator(f,l) :- f = l, seq(f). | |
sequence_without_terminator(f,l) :- seq(f), sequence_without_terminator(f+1,l). | |
// A maximal sequence of instructions without branches. | |
.decl max_sequence_without_terminator(first:Address, last:Address) | |
max_sequence_without_terminator(f,l) :- !seq(f-1), sequence_without_terminator(f, l). | |
// Control flow graph. | |
// Successor of basic block [first, last]. | |
.decl succ(first:Address, last:Address, succ:Address) | |
succ(f, l, s) :- basic_block(f, l), branch_to(l, s). | |
// Extract destination address of branch instruction. | |
.decl branch_to(a:Address, to:Address) | |
branch_to(a, to) :- jmp(a, to); cjmp(a, to, _); cjmp(a, _, to). | |
.output succ | |
// Output is: | |
// 0 0 1 | |
// 1 3 1 | |
// 1 3 7 | |
// 4 5 6 | |
// 6 6 1 | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Very good!
I agree that your solution seems correct. I also back up Quentin's comments.
Just a few comments regarding performance. Some of the rules may be costly if we were doing this analysis on a large program.
The relation sequence_without_terminator is recursive and the following rule is recursive:
Let's have a look at the RAM generated by Soufflé for this rule (option --show=transformed-ram when invoking Soufflé) :
This query is invoked within a fixpoint loop. Here, at each loop iteration:
I you read about what we call "semi-naive evaluation", @delta_sequence_without_terminator is supposed to contain ontly the tuples that have been inserted in sequence_without_terminator at the previous fixpoint-loop iteration. This means @delta_* will always remain quite small.
So, intuitively, it would probably be more efficient to swap the above query and do:
Let's have a look at the resulting RAM:
Here, we iterate on the @delta_sequence_without_terminator tuples first, which is good. However, we introduced another problem: the nested iteration on seq iterates over all tuples in seq instead of just those satisfying the property t0.0 = t1.0+1 (we see there is no ON INDEX join as we had before). Here we seem to hit a limitation of Soufflé, which was unable to figure out that it could only iterate on a subrange of seq...
A way to fix this, would be to create another relation:
And use it to avoid having to write "f+1" in our previous rule:
The resulting query is :
Here, as you can see, we iterate only on the new tuples t0 in @delta_sequence_without_terminator,
then only on the tuples t1 in next such that t1.1 = t0.0 (there will be only one such tuple t1 per tuple t0), and then we just have a test
(t1.0) in seq
that avoids iterating on the seq relation at all.Finally, with this
next
relation, we can improve some of your other rules that would also benefit from a better indexing:example:
basic_block(f,l) :- max_sequence_without_terminator(f,l-1), terminator(l).
would become:
basic_block(f,l) :- max_sequence_without_terminator(f,l_minus_one), next(l_minus_one, l), terminator(l).
etc.