A primer on Gust's .e encoding

What follows is a discussion of the scrambling and compression algorithms used by Gust for the encoding/decoding of their .e files.

This basically explains what gust_enc from gust_tools does, and why the utility relies on reading a set of scrambling seeds from gust_enc.json that are unique to each game.

For additional details, you are strongly invited to look at the gust_enc.c source as well as gust_enc.json.

The scramblers

As one can easily guess from looking at both the file size and the data, a .e file is basically a compressed version of the non-encoded file, on top of which scrambling is applied.

Now, after working my way through the whole process, it does look like Gust were a bit paranoid about someone being able to figure out how to descramble a file by simply looking at it, because they scramble the thing 4 times in all, through 3 different methods.

The base scrambling formula

This is where the fun begins

Each of the scrambling algorithms derive their "randomness" from the following formula:

seed1 = seed0 * seed1 + SEED_INCREMENT

where seed0 is usually the large prime number 999,977,929 (or 0x3b9a73c9 in hex -- I guess whoever devised the algorithm only had a list of prime numbers going up to one billion) and SEED_INCREMENT is 12041 (or 0x2f09 in hex), which is another prime.

Then seed1 is usually initialized with a different 16-bit prime number for each scrambler, with the formula called over and over again, to get as many semi-random seed1 values as necessary. This unlimited set of seed1 values can then be used as the base elements from which actual scrambling is enacted.
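As a minimal sketch (with names of my own choosing, not Gust's), the generator boils down to this:

```c
#include <stdint.h>

#define SEED_INCREMENT 0x2f09        /* 12041, prime */
#define DEFAULT_SEED0  0x3b9a73c9    /* 999,977,929, prime */

/* Advance the generator and return the next semi-random value.
 * The arithmetic relies on 32-bit unsigned wraparound. */
static uint32_t next_seed(uint32_t seed0, uint32_t *seed1)
{
    *seed1 = seed0 * *seed1 + SEED_INCREMENT;
    return *seed1;
}
```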

Now, let's talk about each individual scrambler, which I named after their principal mode of operation:

The bit scrambler

This one is being applied both to the start and to the end of the payload, most likely in an effort to make it more difficult to "guess" information from data that is fairly static in the first place.

The way it acts is as follows. Within a specific "slice" size, the seed1 values generated above are going to be used to exchange semi-random pairs of bits (not bytes). To do so, the algorithm first allocates a table listing all the individual bit positions of the slice, which I am going to call the "index table". This means that, if the slice is, say, 0x100 bytes in length, then you'll start by building a table of 8 x 0x100 elements that lists each individual number from 0x000 to 0x7ff.

Then, if you compute the modulo of seed1 against the current size of the table above (the length of this table will decrease as we remove the values we have used from it), you get a bit index within the slice that has not yet been picked, which you can then copy into another table that I'll call the "scrambling table", also of size 0x800, which is the one that will ultimately serve as the basis for the bit swap.

Of course, every time you pick a value at the position indicated by the modulo, that value should not be used again, so it needs to be removed from the first table, which is easily accomplished by shifting the rest of the table down one element (through memmove()).

For instance, let's say you are working with the ridiculously small slice of 1 byte. The first thing you do is build a table of bit positions with the values [0, 1, 2, 3, 4, 5, 6, 7]. If, for instance, the result of the first computation of seed1 % 8 is 2, then you take the element at position 2 (which happens to also be the value 2 at this stage, though the value and the index won't always match as we progress) and move it into the scrambling table. This means that your index table becomes [0, 1, 3, 4, 5, 6, 7] (it is now only 7 values long, since it has lost one element) and the scrambling table is [2] (only one value for now). Then you call on the formula again, get a new result for seed1 % 7, and, if the modulo you get is say 5, you end up with the index and scrambling tables [0, 1, 3, 4, 5, 7] and [2, 6]. From there, you continue generating new seed1 values and calculating seed1 % 6, seed1 % 5, etc., until you have removed all the elements from the index table and built a full scrambling table. Do that and you'll get a fairly random looking list of bit positions in your scrambling table.

From there, simply go through the scrambling table two by two, to get two index values that you can use to exchange two semi-random bits within the slice, and repeat the whole process (scrambling table generation and bit swap) for as many slices as you need. Eventually then, you'll have a chunk of data where every single individual bit position has been swapped with another.
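If you prefer code to prose, here is a minimal sketch of the above, under my own naming (the real implementation is in gust_enc.c; in particular, the bit order within a byte is an assumption on my part):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Build the scrambling table for one slice of 'slice_bits' bit positions,
 * picking semi-random entries from a shrinking index table. */
static void build_scrambling_table(uint32_t *scrambling_table, uint32_t slice_bits,
                                   uint32_t seed0, uint32_t *seed1)
{
    uint32_t *index_table = malloc(slice_bits * sizeof(uint32_t));
    if (index_table == NULL)
        return;

    /* List every individual bit position of the slice. */
    for (uint32_t i = 0; i < slice_bits; i++)
        index_table[i] = i;

    for (uint32_t remaining = slice_bits; remaining > 0; remaining--) {
        *seed1 = seed0 * *seed1 + 0x2f09;
        uint32_t pos = *seed1 % remaining;
        scrambling_table[slice_bits - remaining] = index_table[pos];
        /* Remove the used entry by shifting the tail down one element. */
        memmove(&index_table[pos], &index_table[pos + 1],
                (remaining - 1 - pos) * sizeof(uint32_t));
    }
    free(index_table);
}

/* Swap bits pairwise, as directed by the scrambling table. */
static void swap_bits(uint8_t *slice, const uint32_t *table, uint32_t slice_bits)
{
    for (uint32_t i = 0; i + 1 < slice_bits; i += 2) {
        uint32_t a = table[i], b = table[i + 1];
        uint8_t bit_a = (slice[a / 8] >> (a % 8)) & 1;
        uint8_t bit_b = (slice[b / 8] >> (b % 8)) & 1;
        slice[a / 8] = (uint8_t)((slice[a / 8] & ~(1u << (a % 8))) | (bit_b << (a % 8)));
        slice[b / 8] = (uint8_t)((slice[b / 8] & ~(1u << (b % 8))) | (bit_a << (b % 8)));
    }
}
```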

Provided that the formula you used to swap bits is random-looking enough (it is here), that sure oughta confuse anybody trying to look at your data...

Stupid sexy scrambler...

The rotating scrambler

As opposed to the above, this one is a byte scrambler that is applied to the whole payload.

For each new computation of seed1, it basically XOR's each byte of the payload with (seed1 >> 16) & 0xff. However, since this mode of operation is fairly basic, and in order to make things a bit more difficult to figure out, it also alternates between 3 seeds for seed1. In other words, seed1 is really seed1[3], and once a certain amount of bytes have been XOR'ed with seed1[i] (this amount is different for each seed), the scrambler switches to using seed1[(i+1)%3].

That is the reason why, if you look at gust_enc.json, you'll see that, besides the main seeds, we need to provide a table and a length section, each with 3 entries, as these are the initial values and lengths used by the rotating scrambler.
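Here is a sketch of my reading of the above (whether a new seed1 is computed for every single byte is an assumption; gust_enc.c is authoritative):

```c
#include <stdint.h>
#include <stddef.h>

/* XOR the payload with a rotating set of 3 seeds, switching seeds after a
 * per-seed amount of bytes. 'seeds' and 'lengths' map to the seeds.table
 * and seeds.length entries of gust_enc.json. */
static void rotating_scramble(uint8_t *buf, size_t size, uint32_t seed0,
                              uint32_t seeds[3], const uint32_t lengths[3])
{
    int i = 0;                /* index of the active seed */
    uint32_t processed = 0;   /* bytes XOR'ed with the active seed */

    for (size_t pos = 0; pos < size; pos++) {
        seeds[i] = seed0 * seeds[i] + 0x2f09;
        buf[pos] ^= (seeds[i] >> 16) & 0xff;
        if (++processed >= lengths[i]) {
            processed = 0;
            i = (i + 1) % 3;  /* switch to the next seed */
        }
    }
}
```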

The fenced scrambler

Finally, the last scrambler, which is applied to both the payload and an extra footer (that only contains 3 checksums), is what I call the "fenced" scrambler.

As opposed to the previous one, this one only uses a single initial value for seed1, and it also scrambles 16-bit words rather than 8-bit bytes.

To do so, it starts by adding x = (seed1 >> 16) & 0x7fff to every single word of the payload (of course with an updated seed1 for each step). In addition, depending on whether the modulo of x with fence * 2 is higher or lower than fence, it may also XOR the payload word with x.

So, depending on whether you are inside or outside the fence, and in yet another semi-random manner, you basically get 2 different scrambling methods being applied.
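In sketch form (the order of the add versus the XOR, and which side of the fence triggers the XOR, are assumptions of mine; check gust_enc.c for the specifics):

```c
#include <stdint.h>
#include <stddef.h>

/* Scramble 16-bit words, with one of two operations applied depending on
 * which side of the "fence" the semi-random value falls. */
static void fenced_scramble(uint16_t *words, size_t nb_words,
                            uint32_t seed0, uint32_t seed1, uint16_t fence)
{
    for (size_t i = 0; i < nb_words; i++) {
        seed1 = seed0 * seed1 + 0x2f09;
        uint16_t x = (seed1 >> 16) & 0x7fff;
        if (x % (fence * 2) >= fence)   /* "outside" the fence */
            words[i] ^= x;
        words[i] += x;                  /* always applied, wraps mod 2^16 */
    }
}
```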

Putting it together

Now that we have covered the scramblers, we can document the whole process that takes a compressed payload (more about compression below) and generates its corresponding .e scrambled file:

  1. The bit scrambler is called for the first 0x800 bytes of the payload (or the full size of the payload if smaller than 0x800 bytes) using a "slice" of 0x80 bytes (or 1024 bit positions, that will be swapped for each slice). In our JSON file, this consumes one of the "seeds.main" values to set the initial seed1.
  2. A checksum is computed. Against what, and using which algorithm, I don't know, because it isn't validated by the game at all during decoding, so the disassembly doesn't provide any clues. Therefore, because it doesn't really matter, I just used an Adler32 of the unscrambled data (a standard implementation is sketched after this list). Still, I'd die a happy man if someone could shed some light on what Gust really uses to generate that value, because, after all the work I have done to understand this whole compression and scrambling process, I'd want to fill in the last blanks. I want to know!!!
    This checksum value is what is going to be used to derive seed0 (not seed1) for the next scrambler. But again, since that value is just read by the game application when descrambling, we could have hardcoded it to 0 if we wanted to, as it can literally be anything. I suppose if Gust ever decide to validate that last checksum, then the disassembly will reveal how it is really computed...
  3. The rotating scrambler is now applied to the payload. This consumes the 3 seeds from seeds.table in our JSON file, as well as the seeds.length table, along with the checksum value above (where checksum + 0x3b9a73c9 is used as seed0).
  4. The payload size is inflated to align to 16 bytes, then an extra 16-byte footer is added, and a 0xff byte marker is also set to indicate the end of the payload.
  5. Two more checksums are computed. These ones are validated during decoding, so we use the same algorithm to compute them, which is a trivial one. Along with the previous checksum, they are written as three 32-bit Big Endian values in the footer (which again, is the reason why the value of that first checksum we computed doesn't matter, as it is just read from the footer on descrambling).
  6. The fenced scrambler is now applied to the whole payload and the extra footer. This uses another of seeds.main along with the seeds.fence value from our JSON seeds file.
  7. The bit scrambler is called once more, but this time against the last 0x800 bytes of payload + footer, and with a slice of 0x100 bytes. This uses the last unused seeds.main value.
  8. Finally, a 16-byte non-scrambled Big Endian header is added, with a signature 0x00000002 and a "working_size" field, to produce the final .e file. I'll talk more about that working_size little bugger once I cover compression, because it sure caused me a lot of trouble...
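For reference, since step 2 mentions it, here is a standard Adler32 (the same algorithm zlib implements), which is what gets substituted for the unknown checksum:

```c
#include <stdint.h>
#include <stddef.h>

/* Standard Adler-32 checksum (RFC 1950). */
static uint32_t adler32(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521;   /* 65521 is the largest prime < 2^16 */
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}
```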

Just how many seeds do you need?

One does not simply reuse the same seeds between games

If you count properly, you see that, for each game, we need to know 10 values:

  • 3 main seeds (2 for the bit scramblers and 1 for the fenced scrambler)
  • 3 table seeds (for the rotating scrambler)
  • 3 lengths (that tell the rotating scrambler how many bytes it should process for each seed before switching to the next)
  • 1 fence value (for the fenced scrambler)

Apart from the lengths, which appear to be roughly the same for most of the executables I've seen, all of the other values are unique to each game, and, even if we do know that all of these seeds are 16-bit primes, because we're applying scrambler upon scrambler, trying to detect the seeds automatically, so that we don't have to provide them, would be a bit tricky. Therefore, for the time being, you can only encode/decode .e files for games where the seeds have been identified.

Also, whereas I could probably have automated the game selection for decoding (try seeds for game N, and if the data doesn't look right, try seeds for game N+1), considering that most people should be working on only a single game at once, this didn't look like a good use of my time. Therefore gust_enc requires the user to manually specify the game they want to encode/decode for, either through the -GAME_ID option, or by editing seeds_id in the .json.

The compression

Figuring out the scramblers is fine and all, but that's only half of the problem. The other half is the little matter of compression.

Now, I have searched, and searched, and searched some more (by the way, if you're interested in compression, I can only warmly recommend the Data Compression: The Complete Reference book, which is freely available in PDF form from locations such as this one), but I haven't been able to find a public compression algorithm from which Gust appears to have derived the one they used in their .e files.

Whereas the well-known deflate compression algorithm is what's being used for their compressed .elixir.gz files (hence the .gz extension), the one they picked for .e files doesn't seem to be based on anything public that I could find.

"Alien" compression

Enter Glaze

So here again, I've had to reinvent the wheel, starting with naming the damn thing, which I will henceforth call Glaze, for "Gust Lempel–Ziv" compression.

Once more, I'd die a happy man if someone could put an alternate public name to this thing (or, more importantly, point to a public source implementing it), but with the limited data I have, it doesn't look like anything anybody else is publicly using.

All I can say is that it's an LZ77/LZ78 dictionary-based algorithm, possibly an LZSS or LZFG derivative, that might have its origins in the cornucopia of Big Endian based LZ algorithms, such as LHA or LZH, that seem to have hit Japan in the late 80's and early 90's. But then again, it looks to me like this specific algorithm could also give LZ4 a run for its money, as it seems to be designed for decompression speed and it compresses to about the same ratio (which made me believe for a time that this could be LZ4 in Big Endian disguise, though of course it makes little sense to aim for speed when all you're going to do is decompress a bunch of relatively small XML files).

At any rate, this is not an overly complex method, but it sure has features that I'm not seeing in any other public compression algorithm out there, such as an independent length table, or even a header field that tells decompressors how much memory they should allocate to construct a bytecode table out of the bitstream.

The complete format of a Glaze file is as follows (the values provided as an example come from Sophie's GrowData.xml since that's the first .e file that's being loaded by this game). All the sizes are expressed in bytes and stored in Big Endian format.

decompressed_size  bitstream_size  bytecode_size  <...bitstream...>  dictionary_size  <...dictionary...>  length_table_size  <...length_table...>
0001e385           000022a7        00003ecc       3a ... af          00001bdf         ef ... 74           00000001           07
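Since the sections are sequential, with variable-length data in between, a struct can't map the layout directly, but walking it is trivial. A sketch, with names of my own choosing:

```c
#include <stdint.h>

/* Parsed view of a Glaze file -- all size fields are stored as Big Endian. */
typedef struct {
    uint32_t decompressed_size;
    uint32_t bitstream_size;
    uint32_t bytecode_size;       /* size of the table to allocate; no data stored */
    const uint8_t *bitstream;
    uint32_t dictionary_size;
    const uint8_t *dictionary;
    uint32_t length_table_size;
    const uint8_t *length_table;
} glaze_file;

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void parse_glaze(const uint8_t *buf, glaze_file *g)
{
    g->decompressed_size = read_be32(&buf[0]);
    g->bitstream_size    = read_be32(&buf[4]);
    g->bytecode_size     = read_be32(&buf[8]);
    g->bitstream         = &buf[12];
    const uint8_t *p     = g->bitstream + g->bitstream_size;
    g->dictionary_size   = read_be32(p);
    g->dictionary        = p + 4;
    p                    = g->dictionary + g->dictionary_size;
    g->length_table_size = read_be32(p);
    g->length_table      = p + 4;
}
```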

Decompressing a Glaze file

Decompressing a Glaze file is fairly straightforward: allocate a bytecode table of bytecode_size, convert the bitstream data to bytecodes (the core bytecodes are 3-bit, so they go from 0 to 7), and then process each bytecode according to its value to perform the relevant operation, such as copying a length of bytes from the dictionary, or duplicating an already existing sequence of bytes from the decompression buffer, using a specified length and distance.

If you need more details, please take a look at the unglaze() function in the gust_enc source.

"Compressing" a Glaze file

Is this LZ compression?

With regard to compressing, which we obviously need to do to recreate .e files, ideally we'd want to implement full-blown compression, with Huffman and/or a sliding window and/or whatever this compression algorithm uses.

But the thing is, we only care about recreating .e files for modding, so we don't actually care about reducing their size. Instead, since we have a bytecode (0x07) that translates to "read the next value (byte) from the length table, and use that value to copy value + 14 bytes, verbatim, from the dictionary", we can take a huge shortcut in our "compression" implementation: simply copy the whole source data, as is, into the dictionary section, then add as many 0x07 codes as we need to the bitstream, along with the same number of lengths in the length table, so as to tell the decompression algorithm that it should just use what's inside the dictionary as the decompressed data. This spares us implementing actual Lempel–Ziv, so that's just what we're going to do when encoding a file.
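The bookkeeping for that shortcut is about as simple as it sounds. A sketch (the packing of the 3-bit codes into the bitstream, and the tail-shorter-than-14-bytes case, are left out; see the gust_enc source for the complete treatment):

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_CHUNK (0xff + 14)   /* the largest copy a single 0x07 code allows */

/* Fill the length table for a verbatim "compression" of 'data_size' bytes
 * and return the number of 0x07 codes to emit in the bitstream. */
static size_t fake_compress_lengths(size_t data_size, uint8_t *length_table)
{
    size_t nb_codes = 0;
    while (data_size >= 14) {
        size_t chunk = (data_size > MAX_CHUNK) ? MAX_CHUNK : data_size;
        length_table[nb_codes++] = (uint8_t)(chunk - 14);  /* n means "copy n + 14" */
        data_size -= chunk;
    }
    return nb_codes;
}
```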

Confuse-a-hacker Limited

Confuse-a-cat Limited

Now (I told you I'd talk about that working_size field again), when you've made it this far as a hacker, you may reach this point and start patting yourself on the back for yet another job well done... only to find out that the XML files you re-encode happen to make the game crash most of the time. Whaaaa? But I did everything right! How can that be???

So here's what happened: You looked at a bunch of .e files, and deduced that the second field there was the size of the uncompressed file (which also made a lot of sense: the executable game might want to know how large a buffer it should allocate for the uncompressed file before it starts decoding and decompressing it).

Except, when you look at the actual disassembly of the game, you find a slightly different story. Namely, whereas that value is indeed used to allocate a buffer that serves as a working buffer for decompression, it so happens that, once the descrambling and decompression are almost complete, that buffer is memset to zero for as much as bitstream_size + dictionary_size + length_table_size + the length of all the various size fields + bytecode_size.

That last one is quite important because, if you happen to actually compress files and use the size of the decompressed data as your working size, then the amount of bytes that are going to be zeroed above is rarely going to be larger than the size of your working buffer: bitstream_size + dictionary_size + length_table_size + the length of all the various size fields is about the size of your compressed data, and that leaves you more than enough margin to also add bytecode_size bytes to your working buffer.

However, if, as we are doing, you aren't compressing the file at all, and blindly use the size of the uncompressed data (or the size of your slightly larger "compressed" Glaze data) as working_size, then the zeroing above, which is carried out without checking boundaries, is going to overflow your allocated working buffer by bytecode_size, since the bytecode table is not something that's present in the compressed file, but something that gets allocated separately, according to the bytecode_size field (which is the only bytecode data present in there).

In other words, you can think of the working buffer as something that should have enough space to hold the allocated bytecode table plus either the compressed version of the file or the full uncompressed version of the file, whichever of these two is larger. So, of course, when you're using a shortcut that results in your "compressed" file always being larger than its uncompressed counterpart, yet you still set working_size to the size of the uncompressed file, the executable starts to zero data above your buffer when it's done decompressing the data, and all kinds of bad things happen...
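Which boils down to the following sizing rule (my formulation of the deduction above, not Gust's code):

```c
#include <stddef.h>

/* The working buffer must fit the bytecode table plus the larger of the
 * compressed and uncompressed data, or the final memset() overflows it. */
static size_t safe_working_size(size_t compressed_size, size_t decompressed_size,
                                size_t bytecode_size)
{
    size_t larger = (compressed_size > decompressed_size)
                  ? compressed_size : decompressed_size;
    return larger + bytecode_size;
}
```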

Sweating bullets

Parting words

Well, that last little bugger (or rather, literal bug) sure kept me banging my head against the wall for a few hours, so thank you Gust for adding a completely unnecessary memset() call that would turn out to only ever be an annoyance for people crazy enough to hack your encoding and compression scheme...

But apart from that, I must say that this was an awesome detective game, even if there are pieces (the official name of that LZ compression algorithm, or how exactly that third checksum is computed) that are still missing.

So even if your games are a bit on the pricey side, Gust, I certainly don't regret a single cent I spent on them!

Addon (2020.08.05)

As much as I'd like to take credit for being the first one to come up with a decompressor (which I figured out independently), it appears that @pleonex did the same about a year before I did.

Darn it! I wish I had found that before I invested my time reverse engineering this whole thing. And a thousand curses to Google for not indexing GitHub!!! I mean, why the heck doesn't Google return any relevant hit for 0x3b9a73c9, when it's right there in the SiA source?

@shizukachan

This is cool.

After seeing the Chinese patch for A17/18, I figured the Chinese had figured out the encoding, but unfortunately any explanation they had would have been written in a language I couldn't understand.

Thanks for figuring it out and writing up this post!

@VitaSmith (Author) commented Oct 16, 2019

Do you have a link to the "Chinese patch"? If other people figured out the .e encoding but didn't publish their findings, that'd be quite a letdown...

Of course, my guess is that they just edited the executable so that it loads the .xml files directly instead of xml.e, which is fairly easy to do but less fun, and requires the use of a hacked executable...

@holly-hacker

I'm still confused about why there are both encoded and non-encoded files in the .pak files. For example, ryza3 has both /saves/recipeidea/recipemap_a.xml and /saves/recipeidea/recipemap_a.xml.e (and gust_enc decrypts it to a bit-perfect copy), but there is also /saves/recipeidea/recipetext.xml.e which cannot be decrypted using gust_enc and does not have a plaintext copy in the .pak file. Is there any info on this?
