@RealityRipple
Created January 19, 2020 08:39

Additions to AES Encryption in Zip Files

I. Foreword

AES is considered a safe encryption algorithm. However, the way AES is applied in the Zip file standard (see https://www.winzip.com/win/en/aes_info.html) is sorely lacking.
The issues are not in the execution of AES-CTR itself, but in the PBKDF2 constants used by all existing compatible archiving software. Below is my attempt to improve upon the existing Zip-AES standard as best as possible without interfering with the parent Zip standard or with software that reads or writes Zip files.

II. New AES Encrypted file storage format

A. File Format

All fields are identical to the existing file storage format, and the two new fields follow the existing (little-endian) standard.

Size (bytes)   Content
Variable       Salt value
2              Password verification value
Variable       Encrypted file data
1              PBKDF2 Hash Type
8              PBKDF2 Rounds
10             Authentication code

B. PBKDF2 Hash Type

The Hash Type value is a single byte that can be any number between 0 and 255... eventually. For now, only the following types are defined:

  1. HMAC-SHA-1
  2. HMAC-SHA-256
  3. HMAC-SHA-384
  4. HMAC-SHA-512

The Hash Type should only be used for PBKDF2; the Authentication code should remain as HMAC-SHA-1 in all cases, as it is a matter of fidelity, not security. This also allows existing software the possibility of being able to validate the Authentication code, if desired (see Section III, part C: "Compatibility").

This value is stored unencrypted.
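
As a rough illustration of how a reader might act on the Hash Type byte, here is a minimal Python sketch. The names `PBKDF2_HASH_TYPES` and `derive_key_material` are hypothetical, not part of any standard. It also assumes the new format keeps the key layout from the WinZip AES description linked in the Foreword (encryption key, then authentication key, then the 2-byte password verification value), which this document does not state explicitly.

```python
import hashlib

# Hash Type byte -> hashlib digest name, following the numbered list above.
# Any other value is currently undefined.
PBKDF2_HASH_TYPES = {
    1: "sha1",
    2: "sha256",
    3: "sha384",
    4: "sha512",
}


def derive_key_material(password, salt, hash_type, rounds, key_len):
    """Run PBKDF2 with the stored Hash Type and Rounds values.

    key_len is the AES key size in bytes (16, 24 or 32).
    Assumption: the derived bytes are split as in the existing WinZip
    AES scheme: encryption key, authentication key, 2-byte verifier.
    """
    try:
        digest = PBKDF2_HASH_TYPES[hash_type]
    except KeyError:
        raise ValueError(f"undefined PBKDF2 Hash Type: {hash_type}") from None

    material = hashlib.pbkdf2_hmac(
        digest, password, salt, rounds, dklen=2 * key_len + 2
    )
    enc_key = material[:key_len]
    auth_key = material[key_len:2 * key_len]
    verifier = material[2 * key_len:]
    return enc_key, auth_key, verifier
```

With Hash Type 1 and 1000 rounds this should reduce to the constants the existing format uses, which makes it a convenient sanity check against files written by current tools.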

C. PBKDF2 Rounds

This value, quite simply, is the number of rounds to be used in PBKDF2, expressed as a little-endian 64-bit unsigned integer. It is recommended that this value be greater than 4,000 at the very least, if not closer to 100,000 (as of early 2020). Since this value sticks to the file, not the program, a decent suggestion would be to shoot for as many rounds as you can within a user-friendly timeframe.
For example, you could try generating a PBKDF2 key using n rounds, then divide n by the time it took to calculate the key (in milliseconds) and multiply that by 1000. The result would be a decent average number of rounds per second (per encrypted file, not per Zip file) that would scale upward with faster hardware without requiring code changes. If a full second of the application being locked is too long for your purposes, you can multiply n/time by whatever number of milliseconds you find appealing, of course.

This value is stored unencrypted.
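
The timing approach described above could be sketched roughly like this in Python. The function name `calibrate_rounds`, the probe size, and the 4,000-round floor used as a default are illustrative assumptions, not requirements of the format.

```python
import hashlib
import os
import time


def calibrate_rounds(target_ms=1000.0, probe_rounds=10_000,
                     digest="sha256", minimum=4_000):
    """Estimate how many PBKDF2 rounds fit into target_ms on this machine.

    Times one PBKDF2 run of probe_rounds, then scales linearly:
    rounds = probe_rounds / elapsed_ms * target_ms.
    """
    password = b"calibration password"   # throwaway input, only used for timing
    salt = os.urandom(16)

    start = time.perf_counter()
    hashlib.pbkdf2_hmac(digest, password, salt, probe_rounds)
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    # Guard against a zero reading on very coarse timers.
    elapsed_ms = max(elapsed_ms, 0.001)

    rounds = int(probe_rounds / elapsed_ms * target_ms)
    # Never go below the floor, even on slow hardware or short targets.
    return max(rounds, minimum)
```

Calling `calibrate_rounds(target_ms=250)` would aim for about a quarter of a second per encrypted file. The result varies from machine to machine and run to run, which is fine, since the chosen value is written into the file alongside the data.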

III. Notes

A. Rationale

The two elements included in this modification to the standard are the HMAC Hash Type and iteration count for PBKDF2. In the existing standard, these values are, of course, SHA-1 and 1,000, respectively. However, the biggest security flaw in this is that these values are constants. That is a flaw not because they should be unknown, but because they should be modifiable to keep up with the changes in hardware capability. It may be advisable in the future to even include a variable for the key generation type rather than only allowing PBKDF2, as it seems likely that other possible standards using similar inputs will become available as technology advances further.
In this decade, however, PBKDF2 is still the best option, if it gets a few little improvements. Those improvements become possible through the PBKDF2 Hash Type and Rounds values added above in 9 bytes. Note that a full 64-bit value has been provided for the number of rounds, even though it is unlikely to ever be needed. I have no clue how long PBKDF2 will last, and even if I did, guessing at the number of iterations considered "safe" at any given time would be a fool's errand, especially if new hashing algorithms are implemented. I've also provided 251 free spaces for those potential other Hash Types to use. If the time comes that all of those are used, then someone really got a little too liberal with which hash types to include.

B. Extra Space Usage

These additions will only increase a ZIP file by 9 bytes per included file, which is decently small overhead given the amount of added security. The forward-compatible nature of the changes also means that any future changes will take no extra storage space beyond these 9 bytes per file. Finally, the number works very well with the possible Salt lengths, as any leftover size in the content beyond the known sizes must be greater than 16 even in the case of AES-128.

C. Compatibility

Storing these extra elements in the "compressed data" AES header means that the changes will not break or inhibit existing archive tools. While tools that follow the existing standard will not be able to decrypt files encrypted with this format, they will be able to add their own files to archives that do include these modifications without getting in the way. Additionally, because the general flags and compression type value are unchanged, the files will be detected as AES-encrypted just like a normal AES-encrypted file would be within the archiver. Hopefully, placing the optional data between the file data and the final 10-byte MAC means that a parser could even gloss over the extra 9 bytes if it happens to look for "the final 10 bytes" instead of "the next 10 bytes", and even verify the MAC.
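
To make the "final 10 bytes" idea concrete, here is a small Python sketch that splits one entry's stored data by working backward from the end. The function name `split_entry` and the dictionary keys are hypothetical. The salt length is passed in by the caller; per the WinZip AES description it depends on the key size (8, 12 or 16 bytes), which is taken here as an assumption rather than something this document defines.

```python
import struct

MAC_LEN = 10                      # Authentication code, always last
TRAILER = struct.Struct("<BQ")    # 1-byte Hash Type + 8-byte little-endian Rounds


def split_entry(blob, salt_len):
    """Split one entry's stored data into the fields of the new layout."""
    min_len = salt_len + 2 + TRAILER.size + MAC_LEN
    if len(blob) < min_len:
        raise ValueError("entry too short for the extended AES layout")

    # Work from the end: the MAC is the final 10 bytes,
    # and the 9 new bytes sit directly before it.
    mac = blob[-MAC_LEN:]
    trailer_start = len(blob) - MAC_LEN - TRAILER.size
    hash_type, rounds = TRAILER.unpack(blob[trailer_start:trailer_start + TRAILER.size])

    return {
        "salt": blob[:salt_len],
        "verifier": blob[salt_len:salt_len + 2],
        "data": blob[salt_len + 2:trailer_start],
        "hash_type": hash_type,
        "rounds": rounds,
        "mac": mac,
    }
```

A reader that only understands the existing format and takes the MAC from the end of the entry would find the same 10 bytes, but would see the 9 trailer bytes as part of the encrypted data.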

D. Parting Words

If you would like to implement these changes, consider them public domain. No credit or license of any type is attached here.
If you wish this would become standardized, join the club. Redbubble can make us all T-shirts.
