Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save shunghsiyu/a63e08e6231553d1abdece4aef29f70e to your computer and use it in GitHub Desktop.
Save shunghsiyu/a63e08e6231553d1abdece4aef29f70e to your computer and use it in GitHub Desktop.
Fix description of tnum.mask, and use C-style comment
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "3d3c57b0",
"metadata": {},
"source": [
"# Model Checking (a very small part) of BPF Verifer\n",
"\n",
"by [Shung-Hsi Yu](mailto:shung-hsi.yu@suse.com) ([@shunghsiyu](https://twitter.com/shunghsiyu)) as part of [SUSE Hack Week 21](https://hackweek.opensuse.org/21/projects/model-checking-the-bpf-verifier)"
]
},
{
"cell_type": "markdown",
"id": "8629a81b",
"metadata": {},
"source": [
"## Motivation\n",
"\n",
"BPF is a Linux kernel feature that allows users to supply code that will run in kernel context, which is very useful for customizing networking, tracing, security feature that the Linux kernel provides.\n",
"\n",
"Since the user-supplied code run in the kernel context, it is very important that the code does not crash the kernel or access data it is not suppose to read. This is enforced by the BPF verifier that checks user supplied code, and bugs in the verifier has lead to privilege escalation vulnerabilities in the past (e.g. [CVE-2021-3490](https://www.graplsecurity.com/post/kernel-pwning-with-ebpf-a-love-story)).\n",
"\n",
"Thus it is critical to ensure that the BPF verifier is bug-free (as much as possible)."
]
},
{
"cell_type": "markdown",
"id": "71543bbf",
"metadata": {},
"source": [
"## Value Tracking\n",
"\n",
"One part of BPF verifier that's heavily used is for value tracking, below is an example of why value tracking is need.\n",
"\n",
"```c\n",
"int arr[2];\n",
"\n",
"// Is access() ever called with parameter `i` greater than 2?\n",
"int access(int i) {\n",
" return arr[i]\n",
"}\n",
"```\n",
"\n",
"In the example above, it's that value of parameter `i` that we want to track; and to track it's value, the BPF verifier uses something called a tristate number."
]
},
{
"cell_type": "markdown",
"id": "711b0009",
"metadata": {},
"source": [
"## Tristate Numbers\n",
"\n",
"It's easier to see tristate number (aka [tnum in the Linux kernel](https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c)) in action than to explain it with words, so below are a few example of tristate number in action:\n",
"\n",
"For example, if we're sure that variable `i` can only be `1`, than here how we represent it; not much different from the binary representation of `1`.\n",
"\n",
"$$\\require{color}$$\n",
"\n",
"$$\\{ 1 \\} = \\{ {01}_2 \\} \\rightarrow \\text{01}_t$$\n",
"\n",
"Now here where thing gets more interesting.\n",
"\n",
"If, however, we know that variable `i` can either be `1` or `3`, we represent the **higher bit with a question mark**, meaning that it is unknown. This is because the **higer bit of `1` and `3` can differ**. On the other hand, the lower bit of both `1` and `3` is set; so the lower bit is known.\n",
"\n",
"$$\\{ 1, 3 \\} = \\{ {\\textcolor{red}{0}1}_2, {\\textcolor{red}{1}1}_2 \\} \\rightarrow {\\textcolor{red}{?}1}_t$$\n",
" \n",
"Similarly, or situation where variable `i` can be either `0` or `2`, the higher bit is known and the lower bit is unknown, again. The only difference this time is that we know the lower bit will be unset.\n",
"\n",
"$$\\{ 0, 2 \\} = \\{ {\\textcolor{red}{0}0}_2, {\\textcolor{red}{1}0}_2 \\} \\rightarrow {\\textcolor{red}{?}0}_t$$\n",
"\n",
"And here's a few more examples. (Limited to numbers that can be represented in 2 bits)\n",
"\n",
"$$\\begin{align}\n",
"\\{ 0 \\} &= \\{ {00}_2 \\} &\\rightarrow \\text{00}_t \\\\\n",
"\\{ 0, 1 \\} &= \\{ {0\\textcolor{red}{0}}_2, {0\\textcolor{red}{1}}_2 \\} &\\rightarrow {0\\textcolor{red}{?}}_t \\\\\n",
"\\{ 1, 2 \\} &= \\{ {\\textcolor{red}{01}}_2, {\\textcolor{red}{10}}_2 \\} &\\rightarrow {\\textcolor{red}{??}}_t \\\\\n",
"\\{ 2, 3 \\} &= \\{ {1\\textcolor{red}{0}}_2, {1\\textcolor{red}{1}}_2 \\} &\\rightarrow {1\\textcolor{red}{?}}_t \\\\\n",
"\\{ 1, 2, 3 \\} &= \\{ {\\textcolor{red}{01}}_2, {\\textcolor{red}{10}}_2, {\\textcolor{red}{11}}_2 \\} &\\rightarrow {\\textcolor{red}{??}}_t\\\\\n",
"\\{ 0, 1, 2, 3 \\} &= \\{ {\\textcolor{red}{00}}_2, {\\textcolor{red}{01}}_2, {\\textcolor{red}{10}}_2, {\\textcolor{red}{11}}_2 \\} &\\rightarrow {\\textcolor{red}{??}}_t\\\\\n",
"\\end{align}$$\n",
"\n",
"You can see some combination of number are just harder to track. For example, trying to track $\\{ 1, 2 \\}$ precisely is not possible, you end up with at tristate number that implies you're tracking $\\{ 0, 1, 2, 3 \\}$ instead. "
]
},
{
"cell_type": "markdown",
"id": "1a535bd6",
"metadata": {},
"source": [
"## Representing Tristate Number\n",
"\n",
"The Linux kerne's definition for tristate number can be found in [include/linux/tnum.h](https://github.com/torvalds/linux/blob/v5.18/include/linux/tnum.h), which uses two 64-bit unsigned integer for each tristate number of 64-bit.\n",
"```c\n",
"struct tnum {\n",
"\tu64 value; /* what the value are for the known bits */\n",
"\tu64 mask; /* which bits are unknown */\n",
" /* note: a bit cannot be set in both the value and the mask */\n",
"};\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "f52949f9",
"metadata": {},
"source": [
"## Addition\n",
"\n",
"Like the binary numbers used in computers, tristate numbers can undergo arithmetic operations as well. Let's take addition as an example.\n",
"\n",
"$$\\require{color}\\begin{align}\n",
"\\{ 1 \\} + \\{ 0, 1 \\} = \\left\\lbrace\\begin{array}\\\\1+0\\\\1+1\\end{array}\\right\\rbrace &= \\{ 1, 2 \\} \\\\\n",
"\\{ {01}_2 \\} + \\{ {00}_2, {01}_2 \\} = \\left\\lbrace\\begin{array}\\\\{01}_2+{00}_2\\\\{01}_2+{01}_2\\end{array}\\right\\rbrace &= \\{ {\\textcolor{red}{01}}_2, {\\textcolor{red}{10}}_2 \\} \\\\\n",
"{01}_t + {0?}_t & = {\\textcolor{red}{??}}_t \\\\\n",
"\\end{align}$$\n",
"\n",
"You can find Linux kernel's tristate number addition implementation in [kernel/bpf/tnum.c](https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c#L62-L72), which looks like this.\n",
"\n",
"```c\n",
"struct tnum tnum_add(struct tnum a, struct tnum b)\n",
"{\n",
"\tu64 sm, sv, sigma, chi, mu;\n",
"\n",
"\tsm = a.mask + b.mask;\n",
"\tsv = a.value + b.value;\n",
"\tsigma = sm + sv;\n",
"\tchi = sigma ^ sv;\n",
"\tmu = chi | a.mask | b.mask;\n",
"\treturn TNUM(sv & ~mu, mu);\n",
"}\n",
"```\n",
"\n",
"Which doesn't really make that much sense to me. How do I even **know that it works**?"
]
},
{
"cell_type": "markdown",
"id": "a87b729a",
"metadata": {},
"source": [
"## Bugs?\n",
"\n",
"The most intuitive and straight forward way is to **write unit tests**, you check whether $\\{ 0 \\} + \\{ 0, 1 \\}$ indeed gives you back $\\{ 0, 1 \\}$, and repeat the same check for a few other different cases.\n",
"\n",
"If all the unit test passes, they you can be rather confident that `tnum_add()` above works.\n",
"\n",
"```c\n",
"struct t1 = tnum_const(0);\n",
"struct t2 = tnum_range(0, 1);\n",
"assert(tnum_add(t1, t2) == tnum_range(0, 1));\n",
"```\n",
"\n",
"However, you don't know if there are any edge cases hidden somewhere that may lead to bug, and checking **all possible cases** is too costly to implement and run, as there are $3^{64}$ tristate number (there a 3 possible state for each bit, either, `0`, `1`, or `?`; and there's 64 bits), which means to test all possible addition combination, you will need to go through $3^{64} \\times 3^{64} = 3^{128}$ (as addition takes two tristate numbers).\n",
"\n",
"Suppose you only need 1 second to write each test cases, doing so will still require $3.7 \\times 10^{53}$ years, good luck with that!\n",
"\n",
"Instead, we could achieve something similar with **model checking**."
]
},
{
"cell_type": "markdown",
"id": "b9e9ecb6",
"metadata": {},
"source": [
"## Model Checking\n",
"\n",
"I think compared to explaining how model checking works, it's understand it after you see it in action.\n",
"\n",
"(If you've used automated testing tools such as [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) or [QuickCheck](https://hackage.haskell.org/package/QuickCheck) before, below will look somewhat familiar)"
]
},
{
"cell_type": "markdown",
"id": "693ae7d5",
"metadata": {},
"source": [
"First will need to build a **model of how tristate number works**, or at least the part that we want to prove; most notably we need to translate the `tnum_add()` from C into something the [model checker](https://github.com/Z3Prover/z3/) understands.\n",
"\n",
"Feel free to **skip over the model in the cell below**, just knowing that it's a rough translation of the [tristate number implementation used in the Linux kernel](https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c) is enough.\n",
"\n",
"If you really, really want to understand what's going on, I'd recommend reading [some introduction on the Z3 solver](https://nbviewer.org/github/philzook58/z3_tutorial/blob/master/Z3%20Tutorial.ipynb) first."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dacd6b4f",
"metadata": {},
"outputs": [],
"source": [
"# WARNING! this will run the pip command on the system to install the z3 package\n",
"!pip install z3-solver"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6633ec12",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from uuid import uuid4\n",
"from z3 import BitVec, BitVecRef, BitVecVal\n",
"\n",
"SIZE = 64 # Working with 64-bit integers\n",
"\n",
"class Tnum:\n",
" \"\"\"A model of tristate number use in Linux kernel's BPF verifier.\n",
" https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c\n",
" \"\"\"\n",
" val: BitVecRef\n",
" mask: BitVecRef\n",
"\n",
" def __init__(self, val=None, mask=None):\n",
" uid = uuid4() # Ensure that the BitVec are uniq, required by the Z3 solver\n",
" self.val = BitVec(f'Tnum-val-{uid}', bv=SIZE) if val is None else val\n",
" self.mask = BitVec(f'Tnum-mask-{uid}', bv=SIZE) if mask is None else mask\n",
" \n",
" def contains(self, bitvec: BitVecRef):\n",
" # Simplified version of tnum_in()\n",
" # https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c#L167-L173\n",
" return (~self.mask & bitvec) == self.val\n",
" \n",
" def wellformed(self):\n",
" # Bit cannot be set in both val and mask, such tnum is not valid\n",
" return self.val & self.mask == BitVecVal(0, bv=SIZE)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "71e4926e",
"metadata": {},
"outputs": [],
"source": [
"# The function that we want to check\n",
"def tnum_add(a: Tnum, b: Tnum):\n",
" # Unmodified tnum_add()\n",
" # https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c#L62-L72\n",
" sm = a.mask + b.mask\n",
" sv = a.val + b.val\n",
" sigma = sm + sv\n",
" chi = sigma ^ sv\n",
" mu = chi | a.mask | b.mask\n",
" return Tnum(sv & ~mu, mu)"
]
},
{
"cell_type": "markdown",
"id": "d7039a2e",
"metadata": {},
"source": [
"We start by saying that there's two tristate numbers, `t1` and `t2`, as well as two integers, `x` and `y`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a11b9b81",
"metadata": {},
"outputs": [],
"source": [
"t1 = Tnum()\n",
"t2 = Tnum()\n",
"\n",
"x = BitVec('x', bv=SIZE)\n",
"y = BitVec('y', bv=SIZE)"
]
},
{
"cell_type": "markdown",
"id": "b6a4ecc8",
"metadata": {},
"source": [
"Now we build our premises. Which are the conditions that need to hold before we move to the actual check.\n",
"\n",
"Our premise is that:\n",
"- Both `t1` and `t2` are wellformed (i.e. valid) tristate numbers\n",
"- The set of numbers that `t1` can track contains `x`\n",
"- The set of numbers that `t2` can track contains `y`"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "10cbb26c",
"metadata": {},
"outputs": [],
"source": [
"from z3 import And, Implies, prove\n",
"\n",
"# Condition that needs to hold before we move forward to check tnum_add()\n",
"premises = And(\n",
" t1.wellformed(),\n",
" t2.wellformed(),\n",
" t1.contains(x),\n",
" t2.contains(y),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "987f5966",
"metadata": {},
"source": [
"And here the grand finale, ask the model checker to proved that `tnum_add(t1, t2)` will contain `x+y`, for **all possible combinations** of `t1`, `t2`, `x` and `y` (that matches our premises)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "edfbb724",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"proved\n"
]
}
],
"source": [
"from z3 import Implies, prove\n",
"\n",
"prove(\n",
" Implies(\n",
" # t1 and t2 is wellformed, x is in t1, and y is in t2\n",
" premises,\n",
" # Below is what we'd like to check\n",
" tnum_add(t1, t2).contains(x+y), # tnum_add() work as intended\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f544e794",
"metadata": {},
"source": [
"And that's it, I can sleep well at night knowing the `tnum_add()` work as intended!"
]
},
{
"cell_type": "markdown",
"id": "aca76a6a",
"metadata": {},
"source": [
"## Resources\n",
"\n",
"**This project is based on the existing work** of **[Sound, Precise, and Fast Abstract Interpretation with Tristate Numbers](https://arxiv.org/abs/2105.05398)** that use formal verification to prove besides addition, and multiplication, and division used for value tracking work as intended.\n",
"\n",
"If you think this project is interesting, I'd reccomend these as well:\n",
"- [Formal Methods for the Informal Engineer: Tutorial #1 - The Z3 Theorem Prover](https://www.youtube.com/watch?v=56IIrBZy9Rc) and its [accompanying notebook](https://nbviewer.org/github/philzook58/z3_tutorial/blob/master/Z3%20Tutorial.ipynb) is a great introduction into Z3\n",
" - Has a section specifically on [model checking](https://nbviewer.org/github/philzook58/z3_tutorial/blob/master/Z3%20Tutorial.ipynb#Symbolic-Execution-and-Bounded-Model-Checking)\n",
"- [Software Verification and Analysis Using Z3](https://research.nccgroup.com/2021/01/29/software-verification-and-analysis-using-z3/) a great example of using Z3 for model checking\n",
"- [[PATCH v5 net-next 00/12] bpf: rewrite value tracking in verifier](https://lore.kernel.org/lkml/ad840039-8d4a-b2a9-b2eb-a8f079926b53@solarflare.com/) - initial patch set that adds tristate number to the verifier"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@Werkov
Copy link

Werkov commented Aug 10, 2022

When I look at tnum_const function, the tnum.mask comment should read "which bits are unknown".

It'd be interesting to implement Tnum.popcount() (of the mask) and try proving sum.popcount() <= t1.popcount() + t2.popcount() + margin to see how tnum_add is efficient in conserving certainity.

(I hope that z3 prover doesn't use same tnums implementation internally ;-).)

@shunghsiyu
Copy link
Author

shunghsiyu commented Aug 11, 2022

When I look at tnum_const function, the tnum.mask comment should read "which bits are unknown".

@Werkov thanks for spotting the mistake! Now fixed.

It'd be interesting to implement Tnum.popcount() (of the mask) and try proving sum.popcount() <= t1.popcount() + t2.popcount() + margin to see how tnum_add is efficient in conserving certainity.

Very interesting indeed, though I wonder if I should eliminate cases where it overflows, otherwise a Tnum(val=-1, mask=0) + Tnum(val=0, mask=1) == Tnum(val=0, mask=-1) worst-case might kill all the fun.

Another ideal would be to brute-force through all the combinations at a lower size (i.e. bandwidth), maybe 16-bit instead of 64-bit, then plot some thing like a closure plot (using the posits paper as an example here)

multiplication closure of posits vs floating-point

In Sound, Precise, and Fast Abstract Interpretation with Tristate Numbers, they used brute-forcing to compare the precision between their proposed multiplication algorithm and the original tnum_mul() in the Linux kernel.

(I hope that z3 prover doesn't use same tnums implementation internally ;-).)

I hope not :P

@shunghsiyu
Copy link
Author

Another ideal would be to brute-force through all the combinations at a lower size (i.e. bandwidth), maybe 16-bit instead of 64-bit, then plot some thing like a closure plot (using the posits paper as an example here)

Scratch that... I cannot represent tnum on a single number line

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment