FFlate is currently the fastest, smallest, and most effective JavaScript compressor/decompressor; to create it, I used a variety of modified and optimized algorithms.
The core of most popular compressed file formats is DEFLATE, or RFC1951. GZIP data, Zlib data, .zip
files, PNG images, and more all use some variant of DEFLATE under the hood. At its core, the DEFLATE format is actually not too complex: it's merely a combination of Huffman coding and LZ77 compression.
If you don't understand either of these concepts, feel free to read the following subsections, or skip to Part 2 if you already know what these are and just want to get to implementation details.
Computers think of basically everything as numbers. The smallest unit of information in a computer is a single bit, which can only be either a 0 or a 1. When multiple bits are strung together, they can be interpreted as a base-2 (binary) number. Although one bit can only represent two numbers, two bits can represent four numbers: 00 means 0, 01 means 1, 10 means 2, 11 means 3. In general, the number of values a certain number of bits can describe is 2^(number of bits). Programmers long ago arbitrarily decided that instead of interacting directly with bits, they would deal with bytes, where one byte is 8 bits. Since 2^8 is 256, a byte can represent 256 values (any number from 0 to 255). The size of a file is determined by how many bytes (or kilobytes, megabytes, gigabytes...) are needed to contain it.
Let's shift to a different topic. Imagine you want to find a way to more efficiently represent a normal text file. By normal, I mean no emojis, no Unicode, just standard letters, numbers, punctuation, and basic symbols. These "normal" text files only use characters that are classified as printable ASCII. ASCII, or the American Standard Code for Information Interchange, assigns the numbers 0 to 127 to common characters. Of those 128 characters, only 95 are human-readable, or printable. However, in practice every ASCII character is stored in a full byte.
You may already see the inefficiency. 8 bits can represent up to 256 values, but we're only using 95 of them per character. How do we solve this? Simple: we just use 7 bits per character! 7 bits can represent 128 values, which is still more than enough for the 95 printable ASCII characters. Instead of taking 8-bit chunks when we read the file, we'll read 7-bit chunks, and do the same when encoding. Now 7 bytes, or 56 bits, hold 8 characters instead of 7, which means we shrink the file by 12.5% (one bit out of every eight).
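To make that concrete, here is a minimal sketch of 7-bit packing; the function and its name are purely illustrative and not part of any real format:

```ts
// Minimal sketch of 7-bit packing for printable ASCII (illustration only).
function pack7(text: string): Uint8Array {
  const out = new Uint8Array(Math.ceil((text.length * 7) / 8));
  let bitPos = 0;
  for (let i = 0; i < text.length; ++i) {
    const code = text.charCodeAt(i) & 0x7f; // only 7 bits matter for ASCII
    for (let b = 6; b >= 0; --b, ++bitPos) {
      // copy one bit at a time into the packed output
      if ((code >> b) & 1) out[bitPos >> 3] |= 1 << (7 - (bitPos & 7));
    }
  }
  return out;
}

// 8 ASCII characters normally take 8 bytes, but only 7 bytes packed
console.log(pack7('"normal"').length); // 7
```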
Now let's try to do better. This part can be tricky, so bear with me. What if we allowed different characters to use different numbers of bits? For example, what about using 6 bits to represent a common character like " " (space)? Since the space character appears so frequently, needing one less bit to represent it could noticeably reduce the file size. As long as we make sure the bit patterns for each character are all different, this should work! But then we run into a problem:
Raw bits | Value |
---|---|
000000 | " " |
0000001 | "a" |
1000000 | "b" |
Suppose that when we're reading the file, we see 0000001000000 (that's 6 zero bits, a one bit, and 6 more zero bits). With our current system, even though all of our values are different, we can't tell if this is a 0000001 000000 ("a ") or a 000000 1000000 (" b"). This is where the real magic of Huffman coding comes into play. Huffman codes guarantee that this situation will never occur because the bits used to represent one of these characters will never be the start of the bits representing another character. That is, having 000000 and 0000001 as two separate characters is not allowed in Huffman coding, because 000000 is the same as the first six bits of 0000001.
In other words, we can look at each bit and go down a path (in computer science known as a tree) to find the value. Imagine that going down the left side (or branch) of the following "tree" means that you encountered a one, but going down the right means that you encountered a zero.
You can see that whenever you reach the end of a path (a "leaf" of the tree), you can be sure exactly which character was encoded; the earlier ambiguity can never occur because characters only ever appear at leaves. This is known as a Huffman tree. Note that an inherent consequence is that some characters will take more bits than they previously needed, but if we make the most common characters use fewer bits and the least common ones use more, we can potentially reduce the file size by another couple of percent.
So, wrapping this all together: we can represent the most common characters in our text file, like the space, with fewer bits than less common characters, using a tree that tells us, as we read each bit, which character was encoded.
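To make the prefix-free property concrete, here is a tiny decoding sketch. The code table is hypothetical (not a real Huffman tree for English); the point is that because no code is a prefix of another, the moment the accumulated bits match an entry, the character is unambiguous:

```ts
// A hypothetical prefix-free code: no code is a prefix of another,
// and the shortest codes go to the most common characters.
const codes: Record<string, string> = {
  '00': ' ', '01': 'e', '100': 'a', '101': 't', '110': 'h', '111': 's'
};

// Decode a bit string by extending the current prefix until it matches a code.
function decode(bits: string): string {
  let out = '', cur = '';
  for (const bit of bits) {
    cur += bit;
    if (cur in codes) {
      out += codes[cur]; // reached a leaf: emit the character and start over
      cur = '';
    }
  }
  return out;
}

console.log(decode('101110010011101100')); // "the sea"
```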
Last thing: it's important to realize that Huffman codes can apply to more than text. Implementation-wise, we create a mapping of Huffman codes to the 8-bit (byte) values they represent rather than to ASCII letters. This means that if an image contains many bright colors, the Huffman code for the byte value 255 will be short (e.g. 5 bits), while the codes for darker byte values (e.g. 0) will be long (e.g. 10 bits).
Let's revisit our printable ASCII text file. If it's English text, there are probably a bunch of duplicate words, or even duplicate phrases. What if instead of encoding the phrase each time, we just referenced the previous time we had that phrase?
Look at that previous paragraph. We repeated long words multiple times: "text", "duplicate", "phrase". So instead of writing the words out in full, we can replace the repetitions with references to the actual word. We do this by writing out how many bytes before we refer to and how long the reference is. So, here's the above text with LZ77 encoding:
Let's revisit our printable ASCII text file. If it's English (27,4), there are probably a bunch of duplicate words, or even (25,9) phrases. What if instead of encoding the (41,5) each time, we just referenced the previous time we had that (67,5)?
Each parenthesized pair is (# of bytes to go back, # of bytes to copy). For example, (27,4) means "copy 4 bytes starting from 27 bytes ago". In reality, we won't actually use parentheses and digits to represent the references; instead, we will write the numbers as raw bits, so a reference usually takes around 3 bytes. This is why we don't encode the multiple occurrences of "we": at 2 bytes, a reference would be larger than just writing out the bytes in full.
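Decoding these references is simple; here's a toy sketch (the token format here is made up for illustration — real DEFLATE packs references into bits, as described below):

```ts
// Toy LZ77 decoder: tokens are either literal characters or
// [distance, length] back-references like the pairs in the example above.
type Token = string | [number, number];

function lz77Decode(tokens: Token[]): string {
  let out = '';
  for (const t of tokens) {
    if (typeof t === 'string') {
      out += t; // literal
    } else {
      const [dist, len] = t;
      // Copy `len` bytes starting `dist` bytes back. Copying one byte at a
      // time even lets a reference overlap its own output.
      for (let i = 0; i < len; ++i) out += out[out.length - dist];
    }
  }
  return out;
}

console.log(lz77Decode(['a', 'b', 'c', [3, 6]])); // "abcabcabc"
```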
LZ77 is responsible for the majority of the compression that occurs in DEFLATE; indeed, it would likely be almost as efficient as DEFLATE if the DEFLATE authors hadn't invented a brilliant way to integrate LZ77 and Huffman coding with each other rather than simply use both individually.
The real brilliance of DEFLATE is the way it makes LZ77 references more efficient. A traditional implementation of LZ77 might use 9 bits to represent a standard byte and 17 bits to represent a reference. The decoder reads one bit; if it sees a zero, it takes the next 8 bits as a literal byte (not a reference). If it sees a one, it reads the next 12 bits as the number of bytes to go back and the 4 bits after that as the length to copy.
As you may have noticed, Huffman codes can represent an arbitrary number of values (which I will now refer to as symbols). DEFLATE uses a Huffman alphabet whose symbols run from 0 to 285. Values 0-255 are raw bytes (literals), 256 marks the end of a block, and values 257 to 285 map to lengths to copy (one part of LZ77 encoding). If a length is encountered rather than a literal, the decoder uses a different Huffman tree to determine the distance (the other part of LZ77).
Although the above may seem overly complex, the actual algorithm is even more complicated; it involves "extra bits" that follow certain Huffman codes, which are treated as raw numbers rather than Huffman codes and are added to the base length or distance that the preceding Huffman code represented.
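As a rough sketch of how a decoder handles this (the helpers `decodeSym` and `readBits` are assumed, and the tables are truncated; the full base/extra-bit tables are in RFC1951):

```ts
// Sketch of how a DEFLATE decoder interprets one literal/length symbol.
// Tables excerpted from RFC1951 (only the first few entries shown).
const LEN_BASE  = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13 /* ... up to 258 */];
const LEN_EXTRA = [0, 0, 0, 0, 0, 0, 0, 0,  1,  1 /* ... */];

function readLitLen(
  decodeSym: (tree: 'litlen' | 'dist') => number, // reads a Huffman code from the named tree
  readBits: (n: number) => number                 // reads n raw "extra" bits
): { type: 'literal'; byte: number } | { type: 'end' } | { type: 'copy'; len: number } {
  const sym = decodeSym('litlen');
  if (sym < 256) return { type: 'literal', byte: sym }; // raw byte
  if (sym === 256) return { type: 'end' };              // end of block
  const i = sym - 257;
  // Length = table base + however many extra bits this symbol requires.
  // The distance that follows is decoded the same way from its own tree.
  return { type: 'copy', len: LEN_BASE[i] + readBits(LEN_EXTRA[i]) };
}
```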
This can be extremely confusing, but for the sake of understanding, you just need to know that Huffman coding and LZ77 on their own only reduce file size by relatively small amounts; if you encoded a file with Huffman codes and then with LZ77 separately, the compression would be much worse than DEFLATE's. The primary reason DEFLATE works is that it encodes the LZ77 references themselves with Huffman codes.
To understand the rest of this document, I would highly recommend Googling information on Huffman coding and LZ77 (and how it differs from LZSS), then reading the following documents:
- RFC1951 (IETF) - focused on theory
- RFC1951 (W3) - focused on implementation
- UZIP.js - optional; hard to read, but the base of FFlate
I couldn't have created `fflate` without the efforts of previous library authors. Here are the most influential ones.
The de facto library for DEFLATE-based compression and decompression is Zlib. Written in C in 1995, it is used in an incredible number of programs today, including (but certainly not limited to) the Linux kernel, the Git version control system, OpenSSL, the Apache HTTP server, and Chromium. Although efficient methods of decompression can be inferred from reading the RFC alone, fast compression is much harder. Zlib offered an actual implementation that achieves the best (or at least close to the best) possible DEFLATE compression in a short amount of time.
The two new ideas Zlib offered were the concept of a compression level and a sliding window of previous 3-byte sequences. The compression level mainly determines the maximum number of LZ77 matches (i.e. the number of previous 3-byte sequences that match the next three bytes) that the compressor will check before settling on the longest match. For example, take the following input:
ham hamburger ham ha hamburg hamb hamburger
If the level is low, the compressor will only check a few previous matches of "ham" before giving up and choosing the longest match it has found so far, while higher compression levels allow more checks. In the above example, a lower compression level might check only the last 3 matches, so the compressor will only find "hamburg" when matching the final "hamburger"; raising the compression level might bump the number of checks to 4, allowing the compressor to find the earlier "hamburger", which is a longer match.
The sliding window essentially describes how we actually find the previous matches. There is one array called `head` that maps a 24-bit number (i.e. the next 3 bytes to compress) to the most recent byte index in the input stream at which that same 24-bit number appeared. In the hamburger example, index 7168360 (the numerical representation of ASCII "ham") of the `head` array will contain the value 29, which is the most recent index that contained "ham". The second array, `prev`, maps each index mod 32768 to the immediately preceding index that had the same 3-byte value. (Taking the index mod 32768 is necessary to limit the size of this array, and it is acceptable because DEFLATE does not allow references that go back further than 32768 bytes.)

What this means is that it's possible to check every valid preceding index that started with the same 3 bytes as the current index. The most recent such index is `head[3-byte value]`, the one before that is `prev[head[3-byte value]]`, the one before that is `prev[prev[head[3-byte value]]]`, and so on. Once the index being checked falls more than 32768 bytes behind the current index, or the maximum number of previous matches to check (set by the compression level) has been exceeded, the loop breaks and the compressor emits a reference to the longest match it found.
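Here's a simplified sketch of that loop; the names and structure are mine (not zlib's), and `head`/`prev` are assumed to be initialized to -1:

```ts
// Simplified sketch of the head/prev chain walk described above.
const WINDOW = 32768;

function longestMatch(data: Uint8Array, pos: number,
                      head: Int32Array,  // 2^24 entries, indexed by 3-byte value
                      prev: Int32Array,  // 32768 entries, indexed by position mod 32768
                      maxChain: number): [number, number] {
  const key = data[pos] | (data[pos + 1] << 8) | (data[pos + 2] << 16);
  let cand = head[key];                 // most recent position with these 3 bytes
  let bestLen = 0, bestDist = 0;
  while (cand >= 0 && pos - cand <= WINDOW && maxChain-- > 0) {
    let len = 0;                        // measure how far this candidate matches
    while (pos + len < data.length && data[cand + len] === data[pos + len]) ++len;
    if (len > bestLen) { bestLen = len; bestDist = pos - cand; }
    cand = prev[cand & (WINDOW - 1)];   // follow the chain to an older position
  }
  // Record the current position so future searches can find it
  prev[pos & (WINDOW - 1)] = head[key];
  head[key] = pos;
  return [bestDist, bestLen];
}
```

Note that indexing `head` by the full 24-bit value requires a 16-million-entry table; shrinking that with a 16-bit hash is exactly one of the UZIP.js optimizations described below.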
Pako is essentially Zlib rewritten in JavaScript almost line for line. It introduces nothing new, but it is a reasonably fast library and certainly an excellent starting point.
Of note is that unlike most of its competitors, Pako supports streaming. However, if you're willing to give up streaming support, accept the extra bundle size and load time, and can afford to support fewer browsers, using a WASM port of `zlib` is probably the best option.
UZIP.js was the only truly high-performance JavaScript (de)compression library before FFlate. Its logic and general code structure were the primary motivation for FFlate. It includes three major innovations that were fundamental to FFlate.
First, it introduced the idea of reverse bit-mapping for more efficient encoding and decoding. Typically, to read a Huffman code, one reads bits one at a time and continually checks whether a leaf of the tree has been reached. Since the maximum length of a Huffman code in DEFLATE is predefined (e.g. 15 bits for a length code), UZIP.js instead builds a table mapping every possible Huffman code followed by every possible bit suffix (which will be ignored) to the symbol those bits encode and the number of bits in the original code. Effectively, this reduces one of the most common operations, reading or writing a Huffman code, from linear time to constant time (O(n) to O(1)), which grants a dramatic improvement in performance.
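Here's a sketch of the table-building idea (not UZIP.js's actual code; a real DEFLATE decoder also has to handle the bit ordering of the stream, which this ignores):

```ts
// Build a lookup table indexed by the next `maxBits` bits of input. Every
// possible suffix after a code maps to the same entry, so decoding becomes
// a single table read instead of a bit-by-bit tree walk.
function buildLookup(codes: { symbol: number; bits: string }[], maxBits: number) {
  const table = new Uint32Array(1 << maxBits); // packs symbol and code length
  for (const { symbol, bits } of codes) {
    const pad = maxBits - bits.length;
    const base = parseInt(bits, 2) << pad;     // the code followed by 0...0
    for (let suffix = 0; suffix < 1 << pad; ++suffix) {
      // symbol in the high bits, code length in the low 4 bits
      table[base | suffix] = (symbol << 4) | bits.length;
    }
  }
  return table;
}

// Decoding: peek maxBits bits, look them up, then consume only `used` bits.
// const entry = table[peekBits(maxBits)];
// const symbol = entry >> 4, used = entry & 15;
```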
Second, it builds an optimal Huffman tree without using a heap, relying instead on a one-time sort and a greedy algorithm that finds the best node to add next. If you don't know what this means, the important part is that it takes much less code and is slightly faster than the alternatives. It doesn't save that much time, but it saves quite a bit of bundle size.
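If you're curious, one classic way to do this is the "two-queue" method: sort the leaves once, then repeatedly merge the two smallest nodes, which are always at the front of one of two queues. The sketch below shows that idea and is not UZIP.js's exact code:

```ts
// Two-queue Huffman construction: after a one-time sort, an optimal tree
// can be built greedily with no heap.
interface HuffNode { freq: number; symbol?: number; left?: HuffNode; right?: HuffNode }

function buildTree(freqs: number[]): HuffNode {
  // One-time sort of the leaves by frequency
  const leaves: HuffNode[] = freqs
    .map((freq, symbol) => ({ freq, symbol }))
    .filter(n => n.freq > 0)
    .sort((a, b) => a.freq - b.freq);
  const merged: HuffNode[] = [];
  let li = 0, mi = 0;
  // The smallest remaining node is always at the front of one of the queues
  const next = (): HuffNode =>
    mi < merged.length && (li >= leaves.length || merged[mi].freq <= leaves[li].freq)
      ? merged[mi++]
      : leaves[li++];
  while (leaves.length - li + merged.length - mi > 1) {
    const a = next(), b = next();
    merged.push({ freq: a.freq + b.freq, left: a, right: b }); // greedy merge
  }
  return next();
}
```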
Most importantly, it used an extremely efficient LZ77 match-finding algorithm. Matching is by far the most heavyweight operation; it alone accounts for over half the compressor's running time, so UZIP.js applies several optimizations. It stores a 16-bit "hash" of each 24-bit prefix, so its lookup table needs only 64K entries while still supporting 3-byte prefixes. It also verifies that the hash of the current 3-byte prefix was actually seen at the most recent `prev` entry before entering the search loop (the difference is probably only a few CPU instructions each time, but it adds up over billions of iterations). Lastly, every time it finds a match, it looks for the rarest 3-byte subsequence within that match (i.e. the one whose `prev[subsequenceIndex]` is furthest back).
Together, these optimizations made UZIP.js undeniably better than Pako for non-streaming compression and decompression.
So what did I innovate in FFlate?
Pako uses a C-style state container class, while UZIP.js stores state in global variables. Neither of these is optimal for JavaScript. Custom JavaScript classes are unfortunately far less efficient than C-style classes; passing raw variables to functions is typically faster than using custom objects (note that this is not the case for built-in classes, which are typically implemented in native code). The UZIP.js strategy for state is just as inefficient as Pako's, but it's made worse by the fact that everything is stored in a global `UZIP` variable, so tree-shaking for smaller bundles is impossible.
FFlate stores only static constants in global variables and keeps everything else in local variables passed between the compression/decompression methods. This allows for fast garbage collection and better performance. Moreover, the use of mostly local variables, along with ES Module exports, enables tree-shaking, which the other libraries lack.
Since I was writing my code based off of UZIP.js, I took every opportunity to optimize performance and/or bundle size. I found a variety of unused variables and functions (which couldn't be tree-shaken out because of the CommonJS exports), redundant or overcomplicated code, single-use functions, and unnecessary calculations where a constant would do. I also spotted performance opportunities, such as replacing `.slice` (which copies memory, O(n)) with `.subarray` (which reuses memory, O(1)) where possible, or replacing dynamic JavaScript arrays with typed arrays. By removing or optimizing these issues and merely translating the code into modern TypeScript, I achieved marginally faster compression and decompression at a much smaller bundle size.
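A quick illustration of the `.slice` versus `.subarray` difference:

```ts
const buf = new Uint8Array(1_000_000);

// .slice allocates a new buffer and copies the data: O(n)
const copy = buf.slice(0, 500_000);

// .subarray returns a view over the same memory: O(1), no copy
const view = buf.subarray(0, 500_000);

view[0] = 42;
console.log(buf[0]);  // 42 - the view aliases the original buffer
console.log(copy[0]); // 0  - the copy does not
```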
The way I really managed to stand out over UZIP.js for compression specifically was with a new 3-byte hashing function. I let users adjust the number of bits in each hash, which doubles the memory needed for the `head` table with each increment but improves performance. The default, which most users will use, is chosen dynamically based on the size of the input data. Moreover, I used a smarter hashing algorithm that uses XOR instead of OR (yielding more unique values) and spaces each byte an equal distance apart (e.g. 5 bits apart for a 16-bit hash, 7 bits apart for a 21-bit hash).
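The shape of that hash looks roughly like this; the shift amounts below are derived from the spacing described above, while the exact constants and bounds handling live in fflate's source:

```ts
// Rough shape of the 3-byte hash: XOR the bytes together, spaced an equal
// distance apart within the hash width (illustrative, not fflate's exact code).
function hash3(d: Uint8Array, i: number, hashBits: number): number {
  const spacing = Math.floor(hashBits / 3); // e.g. 5 for a 16-bit hash, 7 for 21-bit
  return (d[i] ^ (d[i + 1] << spacing) ^ (d[i + 2] << (2 * spacing))) &
         ((1 << hashBits) - 1);
}
```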
My optimizations make FFlate superior to arguably every other JavaScript compression library. Indeed, it can compete with and even beat the native `zlib` module in Node.js (which is written in C) on real-world data, in both performance and compression ratio. Try testing it on large files, such as those from this compression testing dataset, to see for yourself. Note that random bytes, such as those from `crypto.randomBytes`, will never appear in an actual use case and will make the C bindings look much faster. In addition, decompression is simply faster in C because fewer optimizations are possible on the JavaScript side.
A few random ideas for the future:
- Look for more micro-optimizations
- Add variable-length hashing for further optimization (e.g. 5-byte overlap XOR'd down to 3 bytes in the same `prev` table; once the compressor finds a certain length, it only looks for that plus one) - note that this may make the previous hashing method's rarest-prefix finding difficult