Skip to content

Instantly share code, notes, and snippets.

@HACKE-RC
Created November 4, 2023 16:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save HACKE-RC/4517148a64b81c516f9b748a23e41623 to your computer and use it in GitHub Desktop.
Save HACKE-RC/4517148a64b81c516f9b748a23e41623 to your computer and use it in GitHub Desktop.

Satisfiability Modulo Theories (SMT) solvers are one of the most interesting topics to learn about in Computer Science. When you first discover them, they feel surprising and magical. You might already have heard of or seen someone use a SMT solver like Z3 for solving CTF challenges or Program Analysis and by the end of this video, you'll have a good grasp of all the required knowledge to get started and use Z3.

I will not use any complicated mathematics to explain these solvers and will deal with only required theory and examples.

SAT Solvers

SMT solvers leverage the powers of another type of solvers called Boolean Satisfiability Problem solvers. As the name suggests, they have something to do with Boolean variables.

These solvers basically take a Boolean expressions and output whether there are possible values for the Boolean variables which result in the expression being true (or satisfiable). When there are no such variables SAT solver outputs unsatisifable.

If the expression is satisfiable then the SAT solver can also output the values for the variables which satisfy the expression.

Relation with SMT solvers

Satisfiability Modulo Theory (SMT) solvers essentially combine the powers of SAT solvers and some other solvers. SMT solvers like SAT are able to find not only the satisfiability but also the satisfying inputs for Boolean expressions but they are not limited to just Boolean expressions. SMT solvers are capable of other inputs such as integer, arrays and arithmetic operations.

Terminologies

There are few terms that you'll need to know when navigating through the smt solver territory. Concrete values: Concrete values are simply constant values. For example 5 is a concrete value. It's that simple. Symbolic values: Symbolic values are like the unknown variables which you have when dealing with an algebraic expression. These are have are used to represent values which are not yet known. Example: 3x + 2y = 1, in this expression x and y are symbolic values.

Symbolic Execution

Symbolic Execution is a technique which essentially reduces the conditions inside given program into mathematical equations and tries to solve them using SMT and SAT solvers.

Instead of just theoretical explanation, let us look at an example in order to understand the the essence of Symbolic Execution. Consider the following C program:

int main(){
	int j = getUserInput();
	int k = j * 2;

	if (k > 0){
		if (k * 2 == 8){
			printf("Correct input!\n");
		}
		else{
			exit(-1);
		}
	}
	else{
		exit(-1)	
	}
}

When compiled and executed normally, this program would take an concrete integer input (lets say 7) from the user and evaluate and run into the path which is satisfiable which would result in the program calling the exit function in this case. However, when run with symbolic execution:

  • j will be assigned a symbolic value (e.g. γ)
  • k will be assigned γ * 2
  • At the if statement, the symbolic execution engine will remember the condition as a constraint (i.e. γ * 2 > 0) and execute the if branch and at the same time remember the else branch and add it's constraint (γ * 2 < 0) and execute that too symbolically with a copy of the program's current state just like what the if branch has except with a different constraint.
    • The path that the if branch takes has another if condition whose constraint (γ * 2 * 2 == 8) is again remembered along with the existing constraint (γ * 2 > 0) and also symbolically executes the else branch at the same time with the opposite constraints.
      • The if branch then proceeds and executes the code which essentially leads the program to exit normally after printing "Correct Input!", the symbolic execution engine then solves the remembered constraints [γ * 2 > 0, y * 2 * == 8] which results in a concrete value and remembers the paths it leads to.
    • The path that the else branch takes simply exits so the concrete value to this path is solved and remembered.
  • The path that the else branch takes leads to an exit after which the symbolic execution engine solves the constraints and get's the concrete value which will satisfy the constraints (i.e. γ * 2 < 0) and remembers the path it leads to. After the execution is complete, the symbolic execution engine will output all the possible inputs or the requested inputs and tell where they will lead.

Let us first label all the code paths so it will be easier for us to understand:

int main(){
	int j = getUserInput();
	int k = j * 2;

	if (k > 0){
	path_if_1:
		if (k * 2 == 8){
		path_if_2:
			printf("Correct input!\n");
		}
		else{
		path_else_2:
			exit(-1);
		}
	}
	else{
	path_else_1:
		exit(-1)	
	}
}

Here's what we assume the Symbolic Execution engine will tell us: ![![/#^Table1]]And now, we know of all the possible inputs and the path they will lead to and can input a specific value and get to the desired path.

I'm sure by now you'll be thinking that it took a single integer input and what about more complex programs which take string inputs, etc., well that is also possible but it would've been quite hard to express in a video so I'm not covering those here but those values can be found as you'll see later in the post.

99% fail problem

All of us have seen those social media posts where it's a math puzzle type question which states that 99% people fail at solving them, I'm not sure about the source of this statistic but what I'm sure about is that you'll be capable of solving those problems in seconds after learning about z3, which is what we'll do in this part of the video and learn how this relates with symbolic execution later. ![[Question 1.png]] This is one such graphic with a question (I redesigned the original problem so it looks nice), if we use symbols, we can represent the problem like this:

square * square + circle = 16
triangle * triangle * triangle = 27
triangle * square = 6

square * circle * triangle = ?

Upon reading the problem question, we know the following things for sure:

  • There are 3 unknown variables square, triangle and circle.
  • There are total 3 known concrete values of the expressions of these 3 unknown variables.
  • All three unknown variables hold integer values.

These three known concrete values of the expressions of these unknown values are essentially the constraints required to reach the required values for square, circle and triangle. If you do not understand this right now, you'll get it soon.

Code example with z3

To get started with z3, install it with the following command:

pip install z3_solver

Now, import everything from z3 to get started:

from z3 import *

Let me bring the problem question here so you don't have to scroll.

square * square + circle = 16
triangle * triangle * triangle = 27
triangle * square = 6

square * circle * triangle = ?

From our previous analysis, we know that all three unknown variables hold integer values, so we'll define all three of these as Ints:

from z3 import *

square = Int("square")
circle = Int("circle")
triangle = Int("triangle")

# Alternatively you can define all of them in one line
# square, circle, triangle = Ints("square, circle, triangle")

Now, we'll have to create a solver object to which we will add all of our constraints:

from z3 import *

square = Int("square")
circle = Int("circle")
triangle = Int("triangle")

solver = Solver()

Let us now define our first constraint, which is square * square + circle = 16:

from z3 import *

...

solver = Solver()
solver.add(square * square + circle == 16) # z3 requires us to use '=' for showing equality.

Simple, right? Now add the rest of the constraints:

from z3 import *

...

solver = Solver()
solver.add(square * square + circle == 16)
solver.add(triangle * triangle * triangle == 27)
solver.add(triangle * square == 6)

Now after defining all the constraints, the next for us is to check whether these set of equations (or constraints) are satisfiable or not, which can be done by calling the check method on the solver object after defining the constraints:

from z3 import *

...

solver.add(square * square + circle == 16)
solver.add(triangle * triangle * triangle == 27)
solver.add(triangle * square == 6)

# sat stands for sastisfiable, meaning that the set of constraints are satisfiable
if solver.check() == sat:
	# do stuff	

After calling the check method, we call the model method to retrieve a satisfying model which we can later use to get the values of the unknown variables:

from z3 import *

...

solver.add(square * square + circle == 16)
solver.add(triangle * triangle * triangle == 27)
solver.add(triangle * square == 6)

# sat stands for sastisfiable, meaning that the set of constraints are satisfiable
if solver.check() == sat:
	m = solver.model()

If you want to keep things easier, you can simply print m and it'll return the values for square, circle and triangle.

from z3 import *

...

# sat stands for sastisfiable, meaning that the set of constraints are satisfiable
if solver.check() == sat:
	m = solver.model()
	print(m)

This will output the values which satisfy our constraint, which are:

[circle = 12, triangle = 3, square = 2]

Now you could manually do solve question with just these values or write code which can do it itself:

manually:

square * circle * triangle = ?
2 * 12 * 3 = 72

The other way is this:

from z3 import *

...

# sat stands for sastisfiable, meaning that the set of constraints are satisfiable
if solver.check() == sat:
	m = solver.model()

	# eval method returns the numbers with the type z3.z3.IntNumRef
	# as_long method is used to convert that type to int
	square_value = m.eval(square).as_long()
	circle_value = m.eval(circle).as_long()
	triangle_value = m.eval(triangle).as_long()

	result = square_value * circle_value * triangle_value
	print("The answer is: ", result)

That's it, it wasn't the smallest of the explanation but was meant for people with any level of experience with z3 to understand it. The full code can be found here.

Now, look at this piece of code:

#include <stdio.h>  
  
void win(){  
	printf("You win!\n");
	exit(0)
}  
  
void lose(){  
	printf("You lose!\n");
	exit(-1)
}  
  
int check(int square, int circle, int triangle){  
	if (square * square + circle == 16){  
		if (triangle * triangle * triangle == 27){  
			if (triangle * square == 6){  
				win();  
			}  
		}  
	}  
	lose();
}  
  
int main(void){  
	int square;  
	int circle;  
	int triangle;  
	  
	printf("Enter the value of square: ");  
	scanf("%d", &square);  
	  
	printf("Enter the value of circle: ");  
	scanf("%d", &circle);  
	  
	printf("Enter the value of triangle: ");  
	scanf("%d", &triangle);  
	  
	check(square, circle, triangle);  
	return 0;  
}

Looks familiar? Well, this is the problem but framed as a C program where the objective is to get the program to call the win function. Obviously, we can get the valid inputs for this program from the same script as before. And this is how you'll write scripts - by first reading the decompiled or source code of the program and then figuring out all the constraints (or conditions) that are needed to be satisfied in order to reach a specific path.

Now that we've gone through this one, you can surely try another simple problem that I found today on Twitter(Elon's x): https://twitter.com/gunsnrosesgirl3/status/1687158247881392137 Solution here

Example 2

Let's try another example from a recent challenge from amateurs ctf, it's name was "volcano".

  • Given file: volcano
  • Description: Inspired by recent “traumatic” events. Here's the decompilation of the main function: ![[Pasted image 20230805190602.png]] So, the program first asks for a integer input (llu stands for long long unsigned) and then the sub_0_012BB function to check for something. Let's rename and look into the function to see what it's doing: ![[Pasted image 20230805191106.png]] Looks like it's just checking for some conditions... or constraints? These constraints can be easily defined through z3, so let's do that, here's what it'll result in:
import z3  
  
# 64 bit bitvector (includes printable/non-printable, all characaters)
inp1 = z3.BitVec('inp1', 64)

s = z3.Solver()  
# conditions based on checks  
s.add((inp1 & 1) == 0)  
s.add(inp1 % 3 == 2)  
s.add(inp1 % 5 == 1)  
s.add(inp1 % 7 == 3)  
s.add(inp1 % 0x6D == 55)

Now, let's see what the rest of the code does and keep updating our script: ![[Pasted image 20230805191300.png]] It's clear that the program takes another such integer input (lets call it inp2) and then calls a function, let's also look up this function: ![[Pasted image 20230805191358.png]] Nothing quite complex, it seems to be looping a1 times and then doing some some boolean operations - these can be easily reimplemented in another Python or C. Let's do it in python and add it to our script:

import z3
...

def check_volcano(a1):  
	v2 = 0  
	while a1:  
		v2 += a1 & 1 
		# >>== is same as: var = var >> 1 
		a1 = a1 >> 1  

	# just rewrote it more cleanly
	return 0x10 < v2 <= 0x1A

Perfect! Let's look further to see what the program does: ![[Pasted image 20230805201853.png]] Another input is taken (call it inp3) and then it is checked whether the and of this input and 1 is not equal to zero and the number itself is not 1, if that is true the input is then put into a cluster of functions whose output determines whether the input is correct or not. Alright, let's have a look into them one by one: ![[Pasted image 20230805191841.png]] This is another simple function, the way it's called, it simply checks if the two numbers are the same. So, now we know that both of our inputs (inp1 and inp2) have to be the same. ![[Pasted image 20230805192027.png]]

After this, another function is called with both inp1 and inp2 and then checked in same way: ![[Pasted image 20230805192057.png]] Again, nothing crazy, easily rewriteable: ![[Pasted image 20230805192140.png]] To my understanding, it is also just checking the equality of those two numbers. Let's move on. This last function is now called with three inputs, the variable which holds the constant 0x1337 (v10), inp2 and v9, which is the third input and it's compared with the call to the same function with just inp1 in place of inp2: ![[Pasted image 20230805192412.png]] This one is also an easy one: ![[Pasted image 20230805192500.png]] Since, both inp1 and inp2 are the same and the only constraint to inp3 we can avoid this function too as it too seems to be doing an check related to the equality of the inputs. Here's the final script:

import z3  
  
# 64 bit input  
inp1 = z3.BitVec('inp1', 64)  
  
  
# function translated from decompiled c code  
def volcano_check(a1):  
	v2 = 0  
	while a1:  
		v2 += a1 & 1  
		a1 = a1 >> 1  
		  
	return 0x10 < v2 <= 0x1A  
  
  
s = z3.Solver()  
# conditions based on checks  
s.add((inp1 & 1) == 0)  
s.add(inp1 % 3 == 2)  
s.add(inp1 % 5 == 1)  
s.add(inp1 % 7 == 3)  
s.add(inp1 % 0x6D == 55)  

#while there are valid solutions
while s.check() == z3.sat:  
	inp1_solution = int(s.model()[inp1].as_long())  
	if volcano_check(inp1_solution):  
		# both inputs need to be equal
		print("input 1 & 2: ", inp1_solution)  
		
		# i & 1 != 0	
		print("input 3: ", 3)  
		break  
  
	s.add(inp1 != s.model()[inp1])

Running this script, we get the value for input 1 and 2: 389970145857386 and input 3 is any number whose & with 1 is not zero e.g. 3. ![[Pasted image 20230805210220.png]] And here we go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment