Skip to content

Instantly share code, notes, and snippets.

@emboss
Created January 11, 2012 22:40
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save emboss/1597215 to your computer and use it in GitHub Desktop.
Save emboss/1597215 to your computer and use it in GitHub Desktop.
Hashed key as IV vs. KDF
I looked into http://grothoff.org/christian/esed.pdf when I found that
they produce a RIPEMD-160 hash to generate a key from 128 bits and take
the rest for the IV.
You could use a similar approach to generate key and IV where the IV is
independent (somewhat) of the key by using a non-salted key derivation
function that is normally used in Diffie-Hellman-like Key Exchange
protocols. They are used to generate arbitrary-length output from an
initial fixed-size output. (see the KDFs in http://www.di-mgt.com.au/cryptoKDFs.html)
The salt is not needed in our case, since the underlying data (the
file hash) probably provides enough entropy/randomness.
So to keep the key out of IV generation you could generate for example
48 bytes, 32 to be used for the key and the rest would serve as the IV.
The "shared secret" input for the KDF would simply be the SHA-256 hash
of the file, computed like before.
Although I think your scheme is fine, I could think of one possibility
how somebody could exploit the fact that the key is involved in
IV generation: They could use an attack as described in
http://www.cs.berkeley.edu/~daw/my-posts/key-as-iv-broken - don't know
if it's actually feasible in this scenario, but let's assume it were -
then the attacker could obtain at least the plain IV, which right now is
the hashed key.
Now since the hash is not salted, it would probably be easier to brute-force
the SHA-256 hash than to brute-force the AES key itself, because verifying
the hash is the faster operation. That's why I think from a purist view the
"KDF thing" might make sense, but then again, brute-forcing SHA-256 is not
a trivial task, either :)
But enough nit-picking, congrats on this cool project!!
@tarcieri
Copy link

I will look through KDF in depth more, but just at a cursory glance, I had another idea. Right now

key = hash(plaintext)
iv = hash(key)

What if instead?

key = hash(fixed_salt1 + plaintext)
iv = hash(fixed_salt2 + plaintext)

where the salts are e.g.

fixed_salt1: "\0" * 32
fixed_salt2: "\255" * 32

@emboss
Copy link
Author

emboss commented Jan 12, 2012

Yes, I had the same idea :). It's also a good solution, but it reminded me of password schemes - there the argument is that with a fixed global salt it's still feasible for resourceful attackers to precompute enough hashes in order to mount a succesful attack. But on the other hand, there you also have the reduced entropy of passwords, so it's a different situation again. But then I thought of the KDF - it has the additional benefit of keeping things unpredictable and per user.
That's what I liked about the current IV, it has the same properties. It's only that I always get paranoid when I use the key in places where it wasn't intended to. It's not that I could say it's bad, because besides the artificial thing I described above I have no real evidence, but I always try to be as defensive as possible with the "secret", trying to expose it as seldom as possible. I don't know, maybe nahi has some thoughts on it?

@tarcieri
Copy link

What if an additional proof-of-effort component were added to prevent rainbow table-style attacks on small files?

key = bcrypt(hash(fixed_salt1 + plaintext))
iv = bcrypt(hash(fixed_salt2 + plaintext))

I suppose there's the issue of where to derive the bcrypt salt then :(

I will give KDF a more thorough look

@emboss
Copy link
Author

emboss commented Jan 12, 2012

Yes, it's better because it puts more load on the attacker, but the salt is still a fixed value :)

I like the KDF thing because it reminds me of what you do in public key crypto key exchanges, where two parties derive a key from a shared, fixed-length secret, so I felt this was pretty similar to that situation. But no guarantees unless thousands of cryptographers look at it and approve of it :P

@emboss
Copy link
Author

emboss commented Jan 12, 2012

If you want to look at KDFs further, I can recommend NIST's introduction in http://csrc.nist.gov/publications/nistpubs/800-56A/SP800-56A_Revision1_Mar08-2007.pdf. The RSA PKCS#5 paper is also pretty good, although PBKDF2 introduced there requires a salt, but they have a lot of good background info.

@tarcieri
Copy link

I'm confused how KDF helps in a way that the method I outlined will not. Both of these approaches depend entirely on the entropy in the file in question. Shouldn't that make them equally susceptible to a rainbow-table style attack?

The more I think about this, the less I think using the system to exchange small files containing highly sensitive information is even interesting. Small, password/key-sized files can be directly exchanged using any number of secure mechanisms (e.g. pgp/gpg, OTR). The Cryptosphere is going to be doing this all the time by exchanging symmetric keys via asymmetric key crypto, and I'd like to provide built-in social mechanisms that allow users to securely exchange keys to files/directory structures with each other.

That said, even if this isn't a practical use case, we can still try to envision a potential attack on low-entropy files and try to design a system which would defend against it:

  • Bob wants to send Alice a password stored as a file in the Cryptosphere
  • Bob computes the key/iv for the file using whatever key derivation algorithm
  • Bob encrypts the file, computes the public hash, and makes the file available to the system
  • Bob, as part of the Cryptosphere's normal operation, decides to share this file with another peer, let's call him Steve
  • Steve obtains the public ID of the file, which is hash(block_cipher(key_derivation(plaintext), iv_derivation(plaintext), plaintext))

Steve happens to have access to a large GPU farm and has already built a rainbow table by iterating through all short byte sequences, calculating the public ID using the algorithm above, and creating a large associative map of all public IDs to their corresponding plaintext.

Steve is able to take the public ID for the file he obtained from Bob, look it up in the rainbow table, and discover the corresponding plaintext.

To me, this is the real threat. However, if we were to incorporate a proof-of-work function into the process of deriving the original key and IV, this would severely hamper the ability for Steve to be able to generate rainbow tables. The amount of work done by the proof of work function could be incredibly significant... imagine something like BCrypt but with the work factor turned up much, much higher than a typical password.

Let's say we use BCrypt with a work factor that ensures the amount of computation required to successfully encrypt a file takes approximately 1 second on a modern computer (or 0.5 seconds * 2 as we'd be bcrypting both the key and the iv), and that Steve wants to build a rainbow table of all plaintext files which are 8 characters or smaller. It would take Steve...

http://www.wolframalpha.com/input/?i=2%5E64+seconds

584.9 billion years JUST to do the BCrypt calculations in order to derive the public IDs for all possible 8-character plaintexts (in the rather unlikely scenario he continues to use the same computer for 584.9 billion years)

There's still the issue of deriving the BCrypt salt, and perhaps I could use KDF for that, but to me the real solution here is to incorporate a proof-of-work problem when generating the key and iv used to encrypt the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment