Preface
I recently ran into a super old-school buffer overflow while fuzzing the
Yz1
archive (de)compression library [0].
The intention with this write-up is to go from a crash to code execution
in one of the file archival software that bundles the Yz1
library –
namely, IZArc
[2]. The target platform is Windows 10 64bit (although
both IZArc
and Yz1
are 32bit-only).
The analysis is made with Ghidra [6] coupled with the PHAROS OOAnalyzer
plugin [7]. My (sometimes horribly misleading) renaming of functions,
variables and structure/class members tend to be prefixed with x_
.
The image base for Yz1.dll
in our analysis is 0x10000000
and the
version is 0.30 (as is shipped with IZArc
, but newer versions of Yz1
are also vulnerable).
Introduction
Yz1
is an archaic compression format developed by YAMAZAKI at Binary
Technology. It was part of their DeepFreezer archiver software. [1]
Both of these components are closed-source and proprietary. However,
Yz1
is distributed as a shareware binary-only DLL and it’s bundled
with a few modern file-archivers – see e.g. IZArc
[2], ZipGenius
[3] and Explzh
[9].
The interface for Yz1
is somewhat interesting. There are a few
standalone functions that tries to verify that an archive is valid.
There are also functions for retrieving filenames and their metadata and
in an archive [4]. However, in order to compress or decompress files,
there is a single (public) function named Yz1
[4]:
int WINAPI Yz1(const HWND wnd, LPCSTR cmd, LPSTR buf, const DWORD siz);
Every argument other than cmd
can be NULL
or 0
for window-less use
where no feedback is to be received from the module itself.
The cmd
argument specifies what operation should be performed and with
what options [5]. Some of these arguments include:
c - Create archive
x - Expand archive
-cN - Check timestamp according to N
-iN - Silnce status output according to N
This makes working with the Yz1
API kind of like working with the
command-line interfaces of traditional (un)archivers like tar
or
zip
.
As mentioned, the focus in this write-up is on IZArc
[2] and its use
of the Yz1
library. The functionality in Yz1
that we’ll pay
attention to is the functionality that’s used by IZArc
.
The Yz1 header
The (for this write up) first relevant entrypoint, before the Yz1()
function is reached, is Yz1CheckArchive()
. IZArc
uses this function
to validate Yz1
archives before processing them.
The prototype looks like this [4]:
BOOL WINAPI Yz1CheckArchive(LPCSTR filename, const int mode);
The first argument is the filename of an archive to check. The second
argument is what mode to check. There are a number of checking modes
defined in Yz1.h
[10]:
CHECKARCHIVE_BASIC 1
[...]
CHECKARCHIVE_ALL 16
For any checking mode that isn’t CHECKARCHIVE_ALL
, the function will
return true
or false
depending on whether the archive is valid. A
mode of CHECKARCHIVE_ALL
introduces more return values, despite the
return type.
In any case, since our target program IZArc
seem to always invoke
Yz1CheckArchive
with a mode of CHECKARCHIVE_BASIC
, we can ignore the
other modes.
The Yz1CheckArchive()
function, as well as the generic Yz()
function
(in extraction mode) takes us to a class method with the following
signature:
int __thiscall YzFile_DecodeHeader(yzFileDecode *this, char *x_path);
This method is far too complex to distil in its entirety, but it performs a number of noteworthy operations. First off, it starts by reading a 0x14 byte header from the input file:
/*
* 1000e7b7
*/
x_size = _fread(&x_header,1,0x14,x_yz1File->fp);
For example, with an archive containing the following three files:
>>> from pathlib import Path
>>> for p in Path().glob("*.txt"):
... print(f"{p}: {p.stat().st_size:#x} bytes")
...
aaaa.txt: 0x25 bytes
bbbbbbbbbbbbbbbb.txt: 0x1d bytes
cccccccccccccccccccc.txt: 0x21 bytes
The header will look something like this:
$ hexdump -e '4/1 "%02X" "\n"' demo.yz1
# 0: Archive magic (yz01)
797A3031
# 1: Flags; used to, e.g., indicate whether the archive is password-protected.
30363030
# 2. ???
000000B2
# 3. The chunk of memory required to decode the filenames.
#
# > file_count * sizeof(DWORD) * 2 * len(all_filenames_incl_NUL)
#
# In our example, that is:
#
# > 3*4*2 + len("aaaa.txt\0bbbbbbbbbbbbbbbb.txt\0cccccccccccccccccccc.txt\0")
0000004F
# 4. File count
00000003
The reason for the additional two DWORDs per file for the third field in
the header is for metadata, as can be seen when
yzFileDecode::YzFile_DecodeHeader
allocates and decodes the filenames
into a chunk of that size:
# _malloc(this->x_totalFilenameSize)
0:000> bu YZ1!yzFileDecode::YzFile_DecodeHeader + 0x56b
0:000> g
...
Breakpoint 0 hit
...
0:000> dd esp L1
0070eae8 0000004f
0:000> p
eax=03955330
0:000> bu YZ1!yzFileDecode::YzFile_DecodeHeader + 0x643
0:000> g
...
Breakpoint 1 hit
...
0:000> dd 03955330 L20
03955330 baadf00d baadf00d baadf00d baadf00d
03955340 baadf00d baadf00d baadf00d baadf00d
03955350 baadf00d baadf00d baadf00d baadf00d
03955360 baadf00d baadf00d baadf00d baadf00d
03955370 baadf00d baadf00d baadf00d abeefeee
03955380 abababab feababab 00000000 00000000
03955390 1dfe6d47 2000c430 000b0001 000b0004
039553a0 000b0003 000b000b 000b000b 000b000b
0:000> p
0:000> dc 03955330
03955330 25000000 1d000000 21000000 a26bf15e ...%.......!^.k.
03955340 478ff05e 61616161 61616161 7478742e ^..Gaaaaaaaa.txt
03955350 62626200 62626262 62626262 62626262 .bbbbbbbbbbbbbbb
03955360 78742e62 63630074 63636363 63636363 b.txt.cccccccccc
03955370 63636363 63636363 742e6363 ab007478 cccccccccc.txt..
03955380 abababab feababab 00000000 00000000 ................
03955390 1dfe6d47 2000c430 000b0001 000b0004 Gm..0.. ........
039553a0 000b0003 000b000b 000b000b 000b000b ................
In the last memory display above, we see that the first three DWORDs
correspond with the file sizes in big-endian (0x25, 0x1d, 0x21). After
that are three DWORDs that I’m too lazy to figure out what they mean
(yes, there really are three – notice that the file named aaaa.txt
has 4 0x61). And finally are the NUL-separated filenames.
This chunk of memory is then processed in a method with the following signature:
yzDecHead *__thiscall yzDecHead(yzDecHead *this,
uchar *x_filenames, /* chunk dumped above */
long *x_fileCount, /* 0x3 */
yzFileEv *x_yzFileEv,
long *x_filenameSize, /* 0x4f */
bool *x_success);
The bounds for each filename is retrieved with the following C-ish code:
/*
* 0x1000d2fd
*/
DVar8 = *x_fileCount;
[...]
if ((uint)*x_filenameSize < DVar8 * 0xc) {
[...]
} else {
x_fileCount = (long *)(x_filenames + DVar8 * 8); // adjust for metadata
[...]
uVar9 = 0;
plVar7 = x_fileCount;
while ((plVar7 < x_filenames + *x_filenameSize &&
(*(uchar *)plVar7 != '\0'))) {
plVar7 = (long *)((int)plVar7 + 1);
uVar9 = uVar9 + 1;
}
[...]
With our example archive, *x_fileCount
is 3 and *x_filenameSize
is
0x4f. The reuse of x_fileCount
in the decompilation looks weird, but
x_filenames + DVar8 * 8
adjusts for the initial
x_fileCount * sizeof(DWORD) * 2
of metadata in the x_filenames
buffer.
As can be seen, it doesn’t matter how long any of the filenames are, so
long as a NUL-byte is encountered somewhere in the x_filenames
chunk
(otherwise we’d run into an out-of-bounds read).
Even so, Yz1
operates under the assumption that filenames are limited
to FNAME_MAX32
bytes. From the publically available Yz1.h
[10]:
#if !defined(FNAME_MAX32)
#define FNAME_MAX32 512
#define FNAME_MAX FNAME_MAX32
#else
#if !defined(FNAME_MAX)
#define FNAME_MAX 128
#endif
#endif
After yzFileDecode::YzFile_DecodeHeader
and yzDecHead::yzDecHead
has
decoded and processed the header and filenames, the filenames are stored
with their actual lengths for later use. This information is used when
extracting the archive and/or listing its files with this exported
structure and these functions:
typedef struct {
DWORD dwOriginalSize;
DWORD dwCompressedSize;
DWORD dwCRC;
UINT uFlag;
UINT uOSType;
WORD wRatio;
WORD wDate;
WORD wTime;
char szFileName[FNAME_MAX32 + 1];
char dummy1[3];
char szAttribute[8];
char szMode[8];
} INDIVIDUALINFO, FAR *LPINDIVIDUALINFO;
int Yz1FindFirst(HARC x_harc, LPCSTR x_pattern, LPINDIVIDUALINFO x_dst);
int Yz1FindNext(HARC x_harc, LPINDIVIDUALINFO x_dst);
Both of these functions invoke a method starting at 0x10002de0
that
enforce the FNAME_MAX32
(512/0x200) byte limit (sorry for the lack of
cleanup!):
[...]
/*
* LAB_10002f17
*/
if (*(uint *)(*(int *)(iVar3 + 0x10) + 0x14 + uVar2 * 0x1c) < 0x200) {
iVar3 = x_getPathInstance((cls_10002bc0 *)
(*(int *)(this->mbr_34 + 4) + 0xc),this->mbr_48);
/*
* NOTE: This is not important right now, but it will matter during
* exploitation. Filenames shorter than 0x10 bytes are stored
* inline at iVar3 + 4. Filenames GTE 0x10 are allocated a separate
* buffer whose address is stored at iVar3 + 4.
*/
if (*(uint *)(iVar3 + 0x18) < 0x10) {
x_filenameSrc = (char *)(iVar3 + 4);
}
else {
x_filenameSrc = *(char **)(iVar3 + 4);
}
x_filenameDst = x_dst->szFileName;
do {
x_chr = *x_filenameSrc;
*x_filenameDst = x_chr;
x_filenameSrc = x_filenameSrc + 1;
x_filenameDst = x_filenameDst + 1;
} while (x_chr != '\0');
}
else {
/*
* x_dst->szFileName = "too_long_file_name\0"
*/
*(undefined4 *)x_dst->szFileName = 0x5f6f6f74; /* _oot */
*(undefined4 *)(x_dst->szFileName + 4) = 0x676e6f6c; /* gnol */
*(undefined4 *)(x_dst->szFileName + 8) = 0x6c69665f; /* lif_ */
*(undefined4 *)(x_dst->szFileName + 0xc) = 0x616e5f65; /* an_e */
*(undefined2 *)(x_dst->szFileName + 0x10) = 0x656d; /* em */
x_dst->szFileName[0x12] = '\0';
}
[...]
A stack-based buffer overflow
Not all code paths pay attention to the recorded lengths of the
filenames. The one my fuzzer ran into is a function that starts at
0x10005080
. It sprintf(..., "expanding %s", ...)
with the file
currently being extracted for a logging message.
It’s kind of interesting too, because – similar to the snippet above –
the call to sprintf()
also checks whether the filename is inline (that
is, if its length is below 0x10). But it doesn’t check that the filename
is below FNAME_MAX32
.
/*
* 100055b5
*/
if (this_00->mbr_18 < 0x10) {
pDVar6 = &this_00->mbr_4;
}
else {
pDVar6 = (DWORD *)this_00->mbr_4;
}
_sprintf(&local_264,"expanding %s",pDVar6)
Mo’ bugs mo’ problems
In working to exploit the fuzzed bug in the last section, I ran into a situation where we had written N bytes on the stack before the first [RJC]OP gadget. However, after the first gadget we could only write a handful of subsequent gadgets. Otherwise, we’d run into another bug earlier in the extraction process.
Similar to the previous flaw, this flaw is caused by a stack overflow.
It happens in yzFileDecode::DecodeFile
. Ghidra produces a somewhat
wonky decompilation of this method, so the following C-ish code has been
rewritten for clarity (at the expense of not being an accurate
representation of its disassembly – although the important locations
are commented):
/*
* 1000eec0
*/
int yzFileDecode::DecodeFile(char *param_1, int *param_2)
{
int rc;
int duplicateCount = 0;
unsigned int i = 0;
char buf[XXX];
[...];
/*
* LAB_1000efa0
*/
do {
if (this->x_yzDecHead->filenames == NULL) {
[...];
}
/*
* 1000f170
*/
_sprintf(buf, "%s%s", this->x_dirname,
this->x_yzDecHead->x_filename[4 + i * 0x1c]);
rc = x_hasFile(buf);
if (rc) {
duplicateCount += 1;
}
} while (i < this->x_yzDecHead->x_fileCount); /* 1000f02e */
[...];
/*
* 1000f036
*/
if (duplicateCount > 0) {
x_overwriteWarning();
}
[...];
}
In the snippet above, each filename in the archive is checked for
existence on disk. If it already exist, a warning message may be
presented to the user (Yz1
only shows GUI messages if it’s been given
a HWND
).
As with the previous bug, the call to sprintf()
is unchecked. If a
path in 1000f170
is large enough, we’ll overflow the stack.
However, one major issue with this flaw is that the call to sprintf()
at 1000f170
prepends the extraction directory to the filename. This
complicates exploitation. It also makes it difficult to exploit the
first bug mentioned in this writeup, because whether this bug is
triggered before depends partly on something we can’t control (i.e.
where the user chooses to extract the archive).
With that in mind, one positive aspect of this bug is that we can
overwrite this->x_yzDecHead
and cause an invalid memory access in the
do-while()
conditional. This leads to quick control of execution if we
overwrite a SEH.
Sploitin’ like its the 00s
There are two important aspects of the decoding process of the archive header and its filenames:
- As mentioned above, filenames are separated by their terminating NUL-byte in the initial processing.
- The chunk referenced as
x_filenames
above will contain as much decoded data as is specified by the third DWORD in the archive header (excl. leading metadata).
As will be seen in the PoC, I haven’t bothered reverse engineering and reimplementing the (de|en)coding algorithm (presumably based on Huffman). However, it seems that the archive filenames and their content are adjacent each other such that:
- filename_0 - filename_1 - filename_2 - ... - content_of_filename_0 - content_of_filename_1 - content_of_filename_2 - ...
If we’d modify the third DWORD in the header (0x4F in the demonstrative
archive above) to a larger value, the buffer referenced as x_filenames
would not only contain the decoded filenames, but also (part of,
depending on the value) their decoded contents.
This means that we can use the Yz1
library itself to write our exploit
for the unchecked calls to sprintf()
. The general approach looks like:
- Create an archive with N files.
- Set a breakpoint before the filenames are encoded, but after the metadata has been constructed.
- Remove the terminating NUL-byte for one of the filenames (effectively concatenating them).
- Let the process finish.
- Increase the third DWORD in the header to a size that includes the length of all file content.
The result is that the decoding process will interpret file contents as
filenames. This gives us ample opportunity to create a source buffer
large enough to overflow the stack in the call to sprintf()
.
This can be accomplished with pykd
[8] – which is not only a plugin
for WinDbg
but also very usable as a standalone Python module for
automated debugging.
As for exploit mitigations, the changelog for IZArc
mentions that ASLR
and DEP was introduced in IZArc
version 4.3. However, that only
applies to the main executable and some plugins (presumably the
plugins for which the author has access to the source code).
With that said, only two of the shipped modules are non-rebased:
Tar32.dll
and cabinet5.dll
.
Anyway, after having removed the NUL
between two filenames in the
archive, the file contents that will later be interpreted as a filename
will contain the following:
- Enough data to overflow the stack (incl. SEH).
- A SEH gadget that adjusts
esp
and returns into our ROP sled. - Gadgets that prepares the stack with appropriate arguments for
VirtualAlloc()
- Gadget to invoke
VirtualAlloc()
by using its IAT slot inTar32.dll
. - Our shellcode.
Unfortunately, it’s difficult to write a reliable exploit due to the
extraction directory being prepended to our overflowing “filename”. The
approach taken in the PoC is to spray the SEH overwrite after adjusting
the initial bogus data in an attempt for the overwrite to land on an
appropriate DWORD boundary. The alignment is done in the interval
[0,4)
– i.e. len(path) % 4
(where path
includes the trailing
\
). So, there’s a 1 in 4 shot for success if the extraction path is
unpredictable.
Demonstration
Because the PoC uses Yz1.dll
and pykd
to create the payload, and
because Yz1.dll
is a 32-bit Windows-only module, the payload has to be
created on a Windows system with a 32-bit Python >=3.6.
Example:
> "C:\Program Files (x86)\Python38\python.exe" exploit.py \
--dll "C:\Program Files (x86)\IZArc\Yz1.dll" \
--output C:\Users\user\Downloads\archive.yz1 \
--align C:\Users\user\Downloads\archive
=> created: C:\Users\user\Downloads\archive.yz1
=> extraction path alignment: 0
Note that --align
can also be an integer [0, 4)
or left out
completely (in which case it’s derived from the --output
path).
References
- https://www.madobe.net/archiver/lib/yz1.html
- https://ja.wikipedia.org/wiki/DeepFreezer
- https://www.izarc.org/
- http://zipgenius.com/
- https://gist.github.com/illikainen/6f228c42b77c21c1e2954966b54179fc
- https://gist.github.com/illikainen/b33fbc933246981ce49d8d62aabd43cf
- https://ghidra-sre.org/
- https://github.com/cmu-sei/pharos
- https://githomelab.ru/pykd/pykd
- https://www.ponsoftware.com/en/
- https://gist.github.com/illikainen/16ce066720e58dffd8a80fffe877df14