neizod/break_nsa.rst

## break_nsa.rst

      
    Raw
  

              break_nsa.rst
            
          
    Break the NSA Encryption

Challenge on Gist from this Facebook Post provide an interesting question on cracking the NSA encryption. Doesn't this hook you enough? Let's roll. :D
Note, you can also test the validity of logic in this file with the command:
$ python3 -m doctest break_nsa.rst --verbose
Preamble

This section is just a math function/constant defined for easy to use later.
Even though, you should not proceed until you know how it works? Why it necessary?
>>> from collections import Counter
>>> from itertools import product as cartesian_product

>>> def singleton(f):
...     return f()

>>> @singleton
... def primes(bound=10**6):
...     sieve = [True] * bound
...     sieve[0] = sieve[1] = False
...     cur = 0
...     while cur < bound:
...         if sieve[cur]:
...             for idx in range(2*cur, bound, cur):
...                 sieve[idx] = False
...         cur += 1
...     return [idx for idx, is_prime in enumerate(sieve) if is_prime]

>>> def factorized(number):
...     if number == 1:
...         yield 1
...     for prime in primes:
...         if prime**2 > number:
...             break
...         while number % prime == 0:
...             yield prime
...             number //= prime
...     if number > 1:
...         if prime**2 > number:
...             yield number
...         else:
...             raise ValueError('Number too big to be factorized.')

>>> def product(numbers, initial=1):
...     for number in numbers:
...         initial *= number
...     return initial

>>> def divisors(factors):
...     factor_set = ({p**i for i in range(k+1)} for p, k in Counter(factors).items())
...     return sorted(product(c) for c in cartesian_product(*factor_set))
Introduction

Let There Be Let


Message C is a cipher text that we (sneaky) see it (naughty you :P).
Message P is a plain text that we want to find.
E is a encrypt function that translate P -> C, in other word E(P) = C.
D is a decrypt function that works in reversed way of E. D(E(P)) = P.
K is a key to make function E and D stronger. It can be use in addition to those function as E(P, K) = C.

What We Already Known

Message C is what we got, it is a number repr in base 10 of length 609 chars.
>>> C = 434384569820012709749978085023147407174684824178941182826833799495931410023794845922767533429746537016995520506439457550763575993604402054742042654701475990703513534158579743446096171193503041071008550601683001839024513922875537448251544812606790879783442587227277601837740236112564627038091170167413493638174350319534049389495771259593223970367601200671312435588244337256711215093040169407379877379400242759675821842258110296156050808143302683071999772732555638568114446076107646435540182476596386155629693774180289540430199704239923354089531930534807431109632462125230336414045829798557815327441670222304200
We precisely know how E works: Let message P has length of N chars (index of each char in P starts from 1 to N for convenient). Then C can be computed with:
E(P, K) = ord(P1) K^n + ord(P2) K^(n-1) + ord(P3) K^(n-2) + ... + ord(PN) K^1
Where ord is a function that maps ASCII char to integer.
>>> def E(P, K):
...    C = 0
...    for p in P:
...        C += ord(p)
...        C *= K
...    return C
From here, we can easily see that the decrypt algorithm is:
>>> def D(C, K):
...     P = ''
...     while C:
...         C //= K
...         P = chr(C%K) + P
...         C -= C % K
...     return P
Speculation

Key range

Assume original message P has length of 32 chars. Now, for simplicity of concrete example, let just assume that all ord(p) in P is 1. By analyzing the encryption function, we see that:
K^32 + K^31 + K^30 + ... + K^1 = C
Since K^32 is very much larger than Sum K^i for i from 1 to 31, we may say that:
K = root(C, base=32)
Approximate result from this roughly calculation is:
K = 10470 * 10**15
Note: this is not the final answer, just compute it to see how large the number can be.
Assume original message P is also has format of MD5, hence only ASCII char that can represent hex allowed. Which tells us that valid char range is 0-9, a-f, and A-F. These translated with ord function to the values of 48-57, 97-102, and 65-70, respectively. Consider worst case with the lower a-f, each ord(p) from P can be ranged from 48-102.
Assume every char in P is '0', its ord value is 48, and Ku is an upper bound of the key:
48 Ku^32 + 48 Ku^31 + ... 48 Ku^1 = C
Ku = root(C/48, base=32)
>>> Ku = 9280 * 10**15
Assume every char in P is 'f', its ord value is 102, and Kl is a lower bound of the key:
102 Kl^32 + 102 Kl^31 + ... 102 Kl^1 = C
Kl = root(C/102, base=32)
>>> Kl = 9060 * 10**15
So basically, if you select a key K that is not satisfy with Kl <= K <= Ku. Then we will never got the answer because E(P, K) != C for every possible P.
Possible Keys

Take a look at the encrypt function, we see that:
E(P, K) = ord(P1) K^n + ord(P2) K^(n-1) + ... + ord(Pn) K^1
        = (ord(P1) K^(n-1) + ord(P2) K^(n-2) + ... + ord(Pn) K^0) K
This may be the weakest link of the function that allow us to attack. It tell us that K must be divisor of C. In other word C % K == 0.
We need prime number ups to the largest possible K, Ku = 9.280x10^18. Unfortunately, computation that much primes is impossible nowadays. The best we can do here is to find as much factors as possible.
>>> it = factorized(C)
>>> factors = []
>>> try:
...     while True:
...         factors += [next(it)]
... except ValueError:
...     pass

>>> factors
[2, 2, 2, 3, 5, 5, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 41, 106903]
The remaining number may be a large final prime, or may be it can be factorized further. We can ignore this number, as long as every factors from it is larger than Ku (we don't know whether this hold, but let's say it do). In case you wonder, that giant monstrous number is:
1198162896844627574951254304167077865137441451269873420210823707907632078858692758046877232069029631575824857410083756805246859463940310819141386606731750942905572044682630074951510603667141665966096621398351556391207139334515227387293097732317039812158636295421558919175442502050687175932172864893273289106253699350521598092326361993640396143041026264695563620680179489564427544445897926837177366506825782997292755798068057150593688470132473887043210442755344189289139842977367860902098509084703400379089529999185583555055563981243234518434795150565878268478914709009755243001770283983741
Next is to find all of the divisors of C, for example:

2 is a divisor of C, since 2 is one of its factors.
2^3 = 8 is a divisor of C, since 2 occur 3 times in the factors.
2^2 x 13 x 41 = 2132 is also a divisor of C, same idea as above.

Then we will filter only divisors that falls in the key range.
>>> Ks = [k for k in divisors(factors) if Kl < k < Ku]

>>> Ks
[9063554107792192905]
Luckily enough, only one possible key shows up here! So now we'll take that key as the key to the answer. (If this key failed, then more brain power to be put in this :p)
Result and Discussion

Unfold to Plain Text

Now we have a key K, it's time to retrieve original message P.
>>> K = Ks[0]
>>> D(C, K)
'e0d00b9f337d357c6faa2f8ceae4a60d'
The Real Answer

Try hashing every words in your dictionary, if you want to have some more fun (and make you feel like you just conquer the hacker's world). Otherwise just ask rainbow table anywhere on the internet.
>>> from hashlib import md5
>>> md5('cryptography'.encode()).hexdigest() == D(C, K)
True
Summary

By best practice, the key must be large prime in order to be uncrackable. However, we're very lucky that 1.) the key is a composite number, 2.) we know the length of the original plain text 3.) the character space used in plain text is very narrow. This flaws allow us to break that code in no time, though some aspect of the methodology may looks weird and unscalable. Thanks for a good puzzle. ;)