@RealityRipple
Created January 19, 2020 08:39

Additions to AES Encryption in Zip Files

I. Foreword

AES is considered a safe encryption algorithm. However, the way AES is applied in the Zip file standard (see https://www.winzip.com/win/en/aes_info.html) is sorely lacking.
The issues with the implementation are not in the execution of AES-CTR itself, but with the PBKDF2 constants used by all existing compatible archiving software. Below is my attempt to improve upon the existing Zip-AES standard as best as possible without interfering with the parent Zip standard or with software that reads or writes Zip files.

II. New AES Encrypted file storage format

A. File Format

All existing fields are identical to the current file storage format, and the two new fields follow the existing (little-endian) convention.

Size (bytes)   Content
Variable       Salt value
2              Password verification value
Variable       Encrypted file data
1              PBKDF2 Hash Type
8              PBKDF2 Rounds
10             Authentication code

B. PBKDF2 Hash Type

The Hash Type value is a single byte that can be any number between 0 and 255... eventually. For now, only the following types are defined:

  1. HMAC-SHA-1
  2. HMAC-SHA-256
  3. HMAC-SHA-384
  4. HMAC-SHA-512

The Hash Type should only be used for PBKDF2; the Authentication code should remain as HMAC-SHA-1 in all cases, as it is a matter of fidelity, not security. This also allows existing software the possibility of being able to validate the Authentication code, if desired (see Section III, part C: "Compatibility").

This value is stored unencrypted.
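
As a rough illustration of how a reader might use the Hash Type byte, here is a minimal Python sketch. It assumes the list numbers above (1 through 4) are the stored byte values, and that, as in the existing Zip-AES scheme, PBKDF2 produces an encryption key, an authentication key, and the 2-byte password verification value in that order. The names `HASH_TYPES` and `derive_key_material` are hypothetical and not part of any standard.

```python
import hashlib
from typing import Tuple

# Hash Type byte -> hashlib digest name.
# Assumption: the list numbers (1-4) are the values stored in the file.
HASH_TYPES = {
    1: "sha1",
    2: "sha256",
    3: "sha384",
    4: "sha512",
}


def derive_key_material(password: bytes, salt: bytes, hash_type: int,
                        rounds: int, key_bytes: int) -> Tuple[bytes, bytes, bytes]:
    """Return (encryption key, authentication key, password verification value)."""
    if hash_type not in HASH_TYPES:
        raise ValueError(f"undefined PBKDF2 Hash Type: {hash_type}")
    # Assumption: the derived material is 2 * key_bytes + 2 bytes long,
    # mirroring the layout used by existing Zip-AES implementations.
    material = hashlib.pbkdf2_hmac(HASH_TYPES[hash_type], password, salt,
                                   rounds, dklen=2 * key_bytes + 2)
    return (material[:key_bytes],
            material[key_bytes:2 * key_bytes],
            material[2 * key_bytes:])
```

For AES-256, `key_bytes` would be 32; the Authentication code itself would still be computed with HMAC-SHA-1 regardless of the Hash Type chosen here.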

C. PBKDF2 Rounds

This value, quite simply, is the number of rounds to be used in PBKDF2, expressed as a little-endian 64-bit unsigned integer. It is recommended that this value be greater than 4,000 at the very least, if not closer to 100,000 (as of early 2020). Since this value sticks to the file, not the program, a decent suggestion would be to shoot for as many rounds as you can within a user-friendly timeframe.
For example, you could try generating a PBKDF2 key using n rounds, then divide n by the time it took to calculate the key (in milliseconds) and multiply that by 1000. The result would be a decent average number of rounds per second (per encrypted file, not per Zip file) that would scale upward with faster hardware without requiring code changes. If a full second of the application being locked is too long for your purposes, you can multiply n/time by whatever number of milliseconds you find appealing, of course.

This value is stored unencrypted.
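
The calibration idea described in this section could be sketched roughly as follows in Python. The function name `calibrate_rounds`, the probe size, and the dummy password and salt are all illustrative assumptions; the 4,000 floor simply reuses the minimum recommended above, and the upper clamp keeps the result within the 64-bit field.

```python
import hashlib
import time


def calibrate_rounds(target_ms: float = 1000.0, probe_rounds: int = 20_000,
                     hash_name: str = "sha256") -> int:
    """Estimate how many PBKDF2 rounds fit into target_ms on this machine."""
    start = time.perf_counter()
    # Dummy inputs: only the elapsed time matters, not the derived key.
    hashlib.pbkdf2_hmac(hash_name, b"calibration password", b"\x00" * 16,
                        probe_rounds, dklen=66)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # n / time gives rounds per millisecond; scale that to the time budget.
    rounds = int(probe_rounds / max(elapsed_ms, 1e-6) * target_ms)
    # Never go below the suggested minimum or above the 64-bit field.
    return min(max(rounds, 4_000), 2**64 - 1)
```

Calling `calibrate_rounds(target_ms=250.0)` would aim for roughly a quarter of a second per encrypted file instead of a full second. A single probe is noisy, so a real implementation might average several runs.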

III. Notes

A. Rationale

The two elements added by this modification to the standard are the HMAC Hash Type and the iteration count for PBKDF2. In the existing standard, these values are, of course, SHA-1 and 1000, respectively. The biggest security flaw is that these values are constants: not because they should be secret, but because they should be adjustable to keep up with changes in hardware capability. It may even be advisable in the future to include a variable for the key generation type rather than only allowing PBKDF2, as it seems likely that other standards using similar inputs will become available as technology advances further.
In this decade, however, PBKDF2 is still the best option, provided it gets a few small improvements. Those improvements become possible through the PBKDF2 Hash Type and Rounds values added above, at a cost of 9 bytes. Note that a full 64-bit value has been provided for the number of rounds, even though it is unlikely to ever be filled. I have no clue how long PBKDF2 will last, and even if I did, guessing at the number of iterations considered "safe" at any given time would be a fool's errand, especially if new hashing algorithms are implemented. I've also left 251 unassigned values for those potential other Hash Types to use. If the time comes that all of those are used, then someone really got a little too liberal with which hash types to include.

B. Extra Space Usage

These additions will only increase a ZIP file by 9 bytes per included file, which is a decently small overhead given the amount of added security. The forward-compatible nature of the changes also means that any future changes will take no extra storage space beyond these 9 bytes per file. Finally, the number works very well with the possible Salt lengths, as any leftover size in the content beyond the known sizes must be greater than 16 even in the case of AES-128.

C. Compatibility

Storing these extra elements in the "compressed data" AES header means that the changes will not break or inhibit existing archive tools. While tools that follow the existing standard will not be able to decrypt files encrypted with this format, they will be able to add their own files to archives that include these modifications without anything getting in the way. Additionally, since the general flags and compression type value are unchanged, the files will be detected as AES-encrypted just like any normal AES-encrypted file would be within the archiver. Hopefully, placing the extra data between the file data and the final 10-byte MAC means that a parser could even gloss over the extra 9 bytes if it happens to look for "the final 10 bytes" instead of "the next 10 bytes", and even verify the MAC.
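
To make the layout concrete, here is a minimal Python sketch that assembles and splits one stored entry by reading the trailing fields from the end, in the "final 10 bytes" spirit described above. The names `AesEntry`, `build_entry`, and `split_entry` are hypothetical, and the caller is assumed to already know the Salt length from the AES strength recorded for the entry.

```python
import struct
from typing import NamedTuple

MAC_LEN = 10                    # Authentication code
TRAILER = struct.Struct("<BQ")  # 1-byte Hash Type + 8-byte little-endian Rounds


class AesEntry(NamedTuple):
    salt: bytes
    password_check: bytes
    ciphertext: bytes
    hash_type: int
    rounds: int
    mac: bytes


def build_entry(salt: bytes, password_check: bytes, ciphertext: bytes,
                hash_type: int, rounds: int, mac: bytes) -> bytes:
    """Concatenate the fields in the order given in Section II, part A."""
    if len(password_check) != 2 or len(mac) != MAC_LEN:
        raise ValueError("unexpected field length")
    return salt + password_check + ciphertext + TRAILER.pack(hash_type, rounds) + mac


def split_entry(data: bytes, salt_len: int) -> AesEntry:
    """Split a stored entry, locating the 9 new bytes relative to the end."""
    tail = TRAILER.size + MAC_LEN  # 9 + 10 bytes
    if len(data) < salt_len + 2 + tail:
        raise ValueError("entry too short for the extended layout")
    mac = data[-MAC_LEN:]
    hash_type, rounds = TRAILER.unpack(data[-tail:-MAC_LEN])
    return AesEntry(salt=data[:salt_len],
                    password_check=data[salt_len:salt_len + 2],
                    ciphertext=data[salt_len + 2:-tail],
                    hash_type=hash_type,
                    rounds=rounds,
                    mac=mac)
```

Because the MAC is still the last 10 bytes, `data[-MAC_LEN:]` finds it whether or not the reader knows about the two new fields.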

D. Parting Words

If you would like to implement these changes, consider them public domain. No credit or license of any type is attached here.
If you wish this would become standardized, join the club. Redbubble can make us all T-shirts.

@RealityRipple

Can you expand on why the PBKDF2 constants are an issue? Are you claiming that this issue makes Zip insecure when using AES-256?

The encryption method (such as AES-256) is irrelevant. This is to prevent brute-force attacks and rainbow tables that target PBKDF2 and its output.

When PBKDF2 was released over two decades ago, 1000 rounds of SHA-1 was considered sufficient. The way it works is simple: whatever password you enter is hashed with SHA-1 1000 times. Knowing this, you can feed the n most common passwords into a PBKDF2 generator and build a rainbow table in a very short time. You can then feed that one rainbow table into an AES function and crack any number of Zip files in a matter of minutes. With a dynamic round count, a cracker needs a unique rainbow table for each round count, and each table takes longer to generate: if you use "however many rounds a modern computer can do in 1 second", then each of the n passwords takes 1 second to hash into a table entry, so that single table takes n seconds to build, as opposed to tens, hundreds, or even thousands of passwords per second.

SHA-1 itself is also an issue: no matter how long or short your password is, the key material is derived from 20-byte SHA-1 outputs. A brute-force system can just iterate through every permutation of those 20 bytes to crack an encryption without touching the PBKDF2 side of things at all. You can do your own research into how long that would take a modern supercomputer to run through. Using SHA-256, or better, SHA-512, you can increase that to 32 or 64 bytes, which is exponentially more difficult to iterate through, not to mention making password collisions much less likely.

Whether or not there are any active security vulnerabilities, the very paradigm of the existing ZIP file encryption is considered unsafe, poorly informed, and archaic by modern security standards. Making the algorithm and round count dynamic resolves all these issues with minimal overhead.
