Using zkMIPS a client can outsource a program written using the MIPS instruction set, have it executed, and, on top of getting the result, receive an easily verifiable proof that this result is correct. The software component that creates this proof is called a zkVM.
The input to the zkVM is the execution trace of the program execution. Since a zkMIPS computation happens in steps, its overall state can be defined by the values of a finite list of variables. A valid computation is a sequence of states, from a well-defined initial state to some final state, in which each state transition represents a valid computation step. This sequence can be represented as a table whose columns represent the list of variables and whose rows represent each step of the computation. The following diagram shows a toy example.
SAME IMAGE AS EARLIER POST
So to prove that the result of a program is correct, one has to prove that all transitions from each state (line) to the next is valid, i.e. is consistent with the opcode of that particular instruction of that program step. And so, for each different instruction this consistency proof is different, resulting in an arduous task for zkVM software developers.
However, there is good news. Last week, Justin Thaler and Srinath Setty, together with Arasu Arun and Riad Wahby published a new, more efficient approach to perform these consisteny checks. They published two papers presenting two complementary systems: Jolt and Lasso. To understand their approach it is easiest to start with Jolt.
We all know that a Boolean function can be represented by a truth table:
x | y | z = x AND y |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
Instead of bits, we could use bytes, to get a table like
x | y | z = x AND y |
---|---|---|
00000000 | 0000000 | 00000000 |
00000000 | 0000001 | 00000001 |
11111111 | 1111110 | 11111110 |
11111111 | 1111111 | 11111111 |
Note that we have
If instead of bytes, we use 32 bit words, the overall table size will be
This number,
Under this assumption we can implement a (bitwise) AND between
Note that zk proofs are slightly different. We are not computing
Now the opcode we used in this example is AND. But it is clear that we can represent any other opcode in a table: bitwise OR, bitwise XOR, ADD, SUB etc. Each opcode will have a different corresponding table, of course.
The Jolt paper does not just discuss opcodes. It presents an abstract, but equivalent model of a CPU, and proves that each instruction of its Instruction Set Architecture (ISA) can be implemented using table lookups, assuming the existence of gigatables.
The details of Jolt are sometimes a bit tedious, but nothing highly abstract or mathematical. It has the flavor of CPU design, but not using the tools of an electrical engineer, but using the specific building blocks that the Lasso model offers.
To give an idea of the topics that Jolt discusses, here are a few examples:
-
To verify an AND opcode on two 64-bit words, one doesn't need a gigatable. Instead, one can split each word in 8 bytes, and apply 8 bitwise ANDs, on each pair of bytes. This works, because in an AND opcode the bit positions are indepent.
-
To verify an ADD opcode, one can take advantage of the addition in the finite field, provided that its size is larger than
$2^{64}$ . Also, one must deal with overflow exactly the way that the CPU does. -
Division is an interesting case. If
$z = DIV(x,y)$ , Jolt will not try to replicate this computation. Instead it will verify that$x = z \cdot y$ , and provide a proof thereof. More precisely, if also$w = MOD(x,y)$ , then Jolt will verify the quotient-with-remainder equation$x = z*y + w$ . -
In order to speed up verification, we are allowed to add as many extra variables (registers) as we like, as long as they don't interfere with the CPU's logic. BTW, floating point operations are not provided.
The Jolt paper discusses all instructions included in the CPU's ISA, providing a detailed description of how each instruction can be reduced to table lookups, some of them simple, others using gigatables.
The real hard part of this approach is the implementation of these gigatables. Most values in these tables will never be accessed, they exist only conceptually. How this ispossible is described in the companion Lasso paper; this will be the topic of our next post!