Skip to content

Instantly share code, notes, and snippets.

@davehull
Last active March 30, 2022 14:15
Show Gist options
  • Save davehull/7ab911720d895a6dae18 to your computer and use it in GitHub Desktop.
Save davehull/7ab911720d895a6dae18 to your computer and use it in GitHub Desktop.
eatoin shrdlu: XOR Encryption and Hamming Distance
I've been playing around with Matasano Crypto Challenges for my own edification.
It's been fun and insightful. I've learned a number of new things and enjoyed
doing so. If you're a mediocre programmer like me and have an interest in crypto,
I highly recommend checking out the challenges -- http://cryptopals.com/.
A few of the exercises in set 1 have you playing around with XOR for encryption.
You create a script that can brute force single key decryption and if you're
ambitious you'll write a function that will examine letter frequencies of the
output and score the results, returning the one that is most likely to be
English. I wrote multiple scoring functions for this, one that counts English
letter frequencies, one that counts English bigram frequencies, one for trigram
frequencies and lastly one that calculates Shannon Entropy. The output of these
functions all make up the composite score.
Later you have to write a XOR brute force decryption program that first applies
Hamming Distances at the bit level, which equates to calculating the Index of
Coincidence over the ciphertext, all in an effort to determine the size of the
XOR key used to encrypt the data. Here I stumbled a bit. First I didn't pay
careful enough attention and missed that the given ciphertext was base64 encoded.
I also didn't get the math quite right for building my byte arrays on which the
Hamming Distance would be calculated. But I eventually resolved those isses
after a day of banging my head on the desk trying to figure out what the issue
was.
Now I have a XOR brute force tool that can reliably calculate key size... well,
to an extent. These calculations are about probabilities not certainties, but
one thing I noticed right away was that longer keys are easier to calculate
with greater certainty, while shorter keys are more difficult. Here's some output
from my script that will show what I mean. I found it fascinating, but I've also
not been getting sufficient sleep of late -- thank you Matasano.
First, let's set a variable for our plaintext:
$data = "Dennis Fisher talks with Dan Kaminsky about the VENOM bug, the value of
virtual machine escapes, why everyone wants to make every bug the worst one of
all time or just a bunch of hype and what the Avengers have to do with
vulnerability disclosure."
Next, let's set our key:
$key = "eatoin shrdlu"
$key.length
13
Let's XOR encrypt our plaintext with the key and get $cipherText:
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key
$cipherText (line breaks added)
21041a01001d003501010c090745151503021d000401060c4c31040f54240803491d1b191d4c14
070e011b491a4816482421223a2841161a0e4200070017441a140914114f060800050100101914
0941190e0a06491d0d52011f160411111c454e571b1152011a1017181b010c4e57120606174c01
0a41190e020b00161e171615550714134f1d0645531f1d161f01450e1a0a49014653091e084c01
0c0c114f061c00191d01104c14450301010a06001c0e520c1505004115010d4e571b090644181d
004135190c0047161a014404141304541b064e441c48050d181d45170103070b52120a1b080501
1c4110061a0d4c1c1b0716095b
Let's run this through the XOR brute forcer to see what it says about the key
size:
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize
KeySize AvgDist
------- -------
13 2.74660633484163
39 2.78461538461538
26 2.81730769230769
33 2.8989898989899
40 2.905
25 2.93
38 2.94210526315789
34 2.94607843137255
35 2.95714285714286
29 2.98522167487685
37 2.99459459459459
27 3
28 3
30 3.00952380952381
31 3.01075268817204
23 3.01449275362319
8 3.01724137931034
18 3.02314814814815
22 3.03181818181818
36 3.03888888888889
20 3.04090909090909
10 3.04347826086957
24 3.0462962962963
21 3.05238095238095
14 3.05357142857143
32 3.07291666666667
12 3.10087719298246
19 3.10526315789474
15 3.10666666666667
17 3.15837104072398
16 3.21875
11 3.23376623376623
9 3.23931623931624
5 3.25416666666667
4 3.42916666666667
7 3.44957983193277
6 3.525
3 3.82716049382716
2 4.24590163934426
Beautiful! The result correctly calculates that the key size is 13 bytes, but
watch what happens with smaller keys.
$key = "eatoin shrdl"
$key.length
12
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize
KeySize AvgDist
------- -------
24 2.58333333333333
36 2.66111111111111
38 2.85789473684211
40 2.915
19 2.95215311004785
29 2.97536945812808
22 3.00909090909091
16 3.02232142857143
23 3.03381642512077
18 3.03703703703704
37 3.03783783783784
14 3.05357142857143
30 3.05714285714286
32 3.0625
27 3.06481481481481
12 3.06578947368421
28 3.06632653061224
25 3.08
33 3.08585858585859
13 3.1131221719457
9 3.11965811965812
31 3.12365591397849
11 3.13419913419913
35 3.13809523809524
34 3.15196078431373
20 3.15909090909091
26 3.16826923076923
39 3.18974358974359
17 3.19004524886878
21 3.1952380952381
10 3.2
15 3.24444444444444
8 3.26293103448276
5 3.35416666666667
6 3.425
7 3.50840336134454
4 3.56666666666667
3 3.9917695473251
2 4.61475409836066
Our top result is not 12, but notice the top two results are multiples
of 12 and 12 is significantly higher in the rankings than any key
lengths less than 12. Let's try a shorter key:
$key = "eatoin shrd"
$key.length
11
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize
KeySize AvgDist
------- -------
33 2.69191919191919
22 2.70909090909091
11 2.81385281385281
18 2.97222222222222
38 2.98421052631579
23 3.0048309178744
28 3.06632653061224
15 3.07111111111111
39 3.07179487179487
24 3.0787037037037
16 3.09821428571429
31 3.10215053763441
37 3.11351351351351
32 3.11458333333333
36 3.11666666666667
40 3.12
17 3.12669683257919
13 3.12669683257919
26 3.12980769230769
35 3.13809523809524
21 3.13809523809524
34 3.15686274509804
20 3.15909090909091
29 3.17733990147783
25 3.18
30 3.18571428571429
12 3.18859649122807
27 3.19444444444444
14 3.20089285714286
10 3.20434782608696
8 3.27155172413793
19 3.27272727272727
9 3.2948717948718
7 3.47058823529412
6 3.59583333333333
5 3.62083333333333
4 3.65416666666667
3 4.12757201646091
2 4.20491803278689
Again our correct key size is not the top result, but the two top
results are multiples of the correct result and the correct result
follows immediately. What of a smaller key size?
$key = "eatoin sh"
$key.length
9
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize
KeySize AvgDist
------- -------
36 2.66111111111111
27 2.75925925925926
31 2.88709677419355
9 2.8974358974359
32 2.91666666666667
38 2.92105263157895
23 2.93719806763285
22 2.95909090909091
29 2.97044334975369
18 2.99074074074074
40 3.025
37 3.02702702702703
14 3.05803571428571
30 3.07619047619048
28 3.0765306122449
20 3.08181818181818
35 3.09047619047619
34 3.10294117647059
17 3.10407239819005
26 3.11538461538462
16 3.11607142857143
15 3.12
13 3.13122171945701
25 3.16
10 3.17391304347826
21 3.2
39 3.20512820512821
33 3.20707070707071
19 3.21052631578947
11 3.22077922077922
24 3.25925925925926
5 3.37083333333333
7 3.39915966386555
12 3.41228070175439
8 3.42672413793103
6 3.45416666666667
4 3.57916666666667
3 3.98765432098765
2 4.45491803278689
Again, not the top result, but the top two are multiples and the
correct result follows closely. This pattern continues for small
keys.
$key = "eatoin s"
$key.length
8
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize
KeySize AvgDist
------- -------
24 2.58333333333333
40 2.68
32 2.72916666666667
16 2.9375
36 2.98333333333333
39 2.98974358974359
20 3.00454545454545
21 3.00952380952381
30 3.01428571428571
34 3.01960784313725
33 3.02020202020202
28 3.04081632653061
8 3.04741379310345
12 3.08333333333333
38 3.11052631578947
29 3.14285714285714
31 3.16129032258065
25 3.17
27 3.17592592592593
37 3.18918918918919
18 3.19444444444444
23 3.19806763285024
22 3.22272727272727
19 3.24401913875598
14 3.25446428571429
35 3.26190476190476
17 3.28054298642534
11 3.28571428571429
15 3.29777777777778
13 3.30769230769231
26 3.31730769230769
9 3.32905982905983
10 3.35652173913044
6 3.36666666666667
7 3.59663865546219
4 3.61666666666667
5 3.6375
3 3.98353909465021
2 4.68852459016393
$key = "eato"
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize
KeySize AvgDist
------- -------
24 2.58333333333333
28 2.64285714285714
36 2.66111111111111
40 2.68
32 2.72916666666667
20 2.79545454545455
30 2.85714285714286
34 2.88725490196078
16 2.9375
29 2.94581280788177
38 2.94736842105263
39 2.94871794871795
26 2.96153846153846
22 3.00909090909091
35 3.01428571428571
33 3.02020202020202
14 3.02232142857143
27 3.02777777777778
8 3.04741379310345
12 3.06578947368421
31 3.0752688172043
13 3.09049773755656
15 3.09333333333333
21 3.09523809523809
37 3.11891891891892
25 3.12
23 3.1207729468599
18 3.14814814814815
17 3.17647058823529
11 3.17748917748918
10 3.2
19 3.21531100478469
6 3.25
9 3.28632478632479
4 3.31666666666667
7 3.31932773109244
5 3.57083333333333
3 3.77366255144033
2 4.33196721311475
What's the point of posting this? I guess the take away is that if
you're trying to brute force XOR keys by first determining the key
size, you can't automatically take the top result to be the correct
key size. Even Matasano's instructions state that this is about
probability not certainty, but the interesting property that they
don't discuss, probably because they want people to notice on their
own, is that for short keys, the top results may be multiples of
the actual key size, so be mindful of that pattern.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment