Skip to content

Instantly share code, notes, and snippets.

@rot256
Last active April 20, 2021 09:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rot256/174fd53c0aac8cf04ef9810e8a10b0c0 to your computer and use it in GitHub Desktop.
Save rot256/174fd53c0aac8cf04ef9810e8a10b0c0 to your computer and use it in GitHub Desktop.
Cross-repetition bit-slicing

Cross-repetition bit-slicing

Recall that in KKW all the players essentially execute the same operation in parallel, e.g. in the case of addition, they all add their shares.

The new observation is that this is obviously also the case if you consider the operation of all the players across all the repetitions.

Fix the number of players n = 8 and the ring as R = GF(2) (bits).

Then I can represent the shares of 8 players in 8 instances of KKW as a single 8 x 8 = 64-bit register:

S_x = | s_{1,1}, s_{1,2}, ..., s_{1,8} | s_{2,1} ... | ... | s_{8,1}, ..., s_{8,8} |

Addition

If I need to have the players add the wire "x" with the wire "y" across all 8 repetitions, I can do this by simply xoring the 64-bit register for wire "x" and "y". This will individually add/xor the share of each player in each repetition.

Reconstruction

Reconstruction can also be bit-sliced very efficiently Let S_x be the 64-bit integer:

Using a logical shift:

t = (S_x >> 4) ^ S_x)
t = (t >> 2) ^ t
t = (t >> 1) ^ t

Using a rotation:

t = (S_x ROL 4) ^ S_x)
t = (t ROL 2) ^ t
t = (t ROL 1) ^ t

Yields the reconstructed value in the least significant bit of each byte and repeated 8 times in each byte, respectively.

Multiplication

Beaver multiplication of shared values [x] and [y], given a Beaver triple [a], [b], [c] (with a * b = c) requires that the players compute:

[d] = [x] - [a]
[e] = [y] - [b]

[z] = [c] + [x] * e + [y] * d - e  * d

The first two are just addition (done using 64-bit XOR), done as above. The last requires:

  1. reconstructing d and e.
  2. computing e * d can be done using a single 64-bit AND.
  3. computing [x] * e and [y] * d can be done as follows: Recall that using the reconstruction (with ROL) of S_x we get a register t holding a 64-bit integer where the reconstructed bit is repeated in every byte. Hence [x] * e can be computed using a single bit-wise AND: S_x & t
[d] = [x] - [a] : 1 ins. (64-bit XOR)
[e] = [y] - [b] : 2 ins. (64-bit XOR)

Recons e        : 6 ins. (using ROL ensuring the result is repeated across all bits)
Recons d        : 6 ins. (using ROL ensuring the result is repeated across all bits)

[x] * e         : 1 ins. (using AND)
[y] * d -       : 1 ins. (using AND)
e  * d          : 2 ins. (using AND and masking AND 0x010101...01 to only take one copy of the result)
[z] = [c] + [x] * e + [y] * d - e  * d : 3 ins. (XOR)

Meaning a total of ~ 22 AMD64 cycles per multiplication across 8 repetions, ignoring cryptographic ops, cache misses etc.

Efficiency

Since we are in running 8 repetions in parallel, we are in effect "doing 8 times fewer repetions" by packing them in batches of 8. Since we are doing ~250 repetions atm. this gets down to ~32 a significant speedup.

Ignoring the cost of invoking the PRG and looking up wires, using the technique above would mean we can prove:

  • Mult gate: using ~ 700 cycles on AMD64.
  • Addition gate: using ~ 32 cycles on AMD64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment