Created
July 21, 2012 07:12
-
-
Save anonymous/3154939 to your computer and use it in GitHub Desktop.
Functional programming in C
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/* | |
Functional programming in C | |
``````````````````````````` | |
This is an example of how to use executable memory to get partial function | |
application (i.e. bind1st()) in C. (Well, this actually only compiles as C++ | |
since i'm using a varargs typedef, but there's no classes or templates.) | |
To proceed, we need to be comfortable with the cdecl and stdcall calling | |
conventions on x86, and just a little assembly. | |
In cdecl, function arguments are pushed onto the x86 stack from right to left, | |
and then control passes to the callee. The callee returns, and then the caller | |
restores the stack to the way it was before continuing on its way. The stdcall | |
convention is similar but the callee is responsible for restoring the stack | |
pointer to clean up after its own arguments. | |
It's a very minor difference and most of the time it's inconsequential, you | |
wouldn't even notice it. If you declare a function in C, it uses the cdecl | |
convention unless you specify otherwise. This is normal behaviour for most of | |
the linux world, but every Win32 API is declared stdcall. | |
There are some important differences, though: if you have a function with a | |
variable number of arguments - like printf() in the CRT - this would compile | |
to something like | |
; printf( format_str, arg[0] ) | |
push dword [ebp+8] | |
push format_str | |
call _printf | |
add esp, 8 | |
and printf could simply pop an argument off the stack every time it encounters | |
a % character in the format string. It doesn't explicitly count the number of | |
parameters, and therefore has no idea how to clean up after itself, so it | |
must be done as cdecl. There are ways around this - <cstdarg> manages it | |
somehow? - but that's the general idea. | |
In the reverse case, if you're using partially applied functions, then the | |
caller sees a function of a single argument, but the target function takes | |
multiple arguments. Unless we can magically unwind the stack some other way, | |
it seems like the callee is the only function that can safely restore the | |
stack, so we must use stdcall for both our target functions and their | |
partially applied forms. | |
Partial application involves storing an extra parameter and returning a C | |
function pointer. We really have to end up with a native C function that we | |
can call, one that's portable to other functions and can be thrown around | |
without much special attention. My approach here is to create a stub | |
function on the heap that pushes an extra function argument, and then jmp | |
toward the real target. | |
So, to begin, let's figure out executing assembly on the heap: | |
*/ | |
#include <cstdio> | |
#include <Windows.h> | |
void* xalloc(size_t len) { | |
void* ret = VirtualAlloc( | |
NULL, len, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE | |
); | |
if (! ret) { | |
fprintf(stderr, "Fatal: VirtualAlloc() failed\n"); | |
exit(1); | |
} | |
return ret; | |
} | |
void xfree(void* ptr) { | |
VirtualFree(ptr, 0, MEM_RELEASE); | |
} | |
/* | |
Those are Win32-specific calls, but it should be pretty easy to translate that | |
into linux or any other platform using mmap with the flags parameter set to | |
PROT_READ|PROT_WRITE|PROT_EXECUTE. | |
*/ | |
typedef void* __stdcall (*heapfn)(...); | |
heapfn make_heapfn(const char* asmcode, size_t len) { | |
void* ret = xalloc(len); | |
memcpy(ret, asmcode, len); | |
return (heapfn)ret; | |
} | |
/* | |
This function takes a string of (i'm assuming i386 here) opcodes and copies | |
them into executable memory on the heap. We have to pass the length separately | |
since C strings are null-terminated, but our assembly could contain a zero. | |
That's all we need, so let's try a simple function to make sure everything is | |
OK. In both cdecl and stdcall, the return value from a function is stored in | |
the eax register: | |
55 push ebp ; enter function | |
89E5 mov ebp, esp | |
B801000000 mov eax, 0x01 ; return 1 | |
C9 leave | |
C3 ret | |
*/ | |
void test_return() { | |
heapfn fncall = make_heapfn( | |
"\x55\x89\xE5\xB8\x01\x00\x00\x00\xC9\xC3", 10 | |
); | |
printf("Returns: %d\n", (int)fncall()); | |
xfree((void*)fncall); | |
} | |
/* | |
This prints "Returns: 1". | |
We can now implement our stub for partial application, but there's one thing | |
to watch out for - chaining multiple stubs together. The address to return to | |
is pushed onto the stack with `call' and popped with `ret', so we must be | |
careful in order for these to chain along correctly: | |
5B pop ebx | |
68xxxxxxxx push dword param1 | |
53 push ebx | |
68xxxxxxxx push dword fn_stdcall | |
C3 ret | |
Here, we take the return address off the stack, push our extra parameter after | |
the others, and put the correct return address on the stack again (this could | |
be done without clobbering the ebx register). The `jmp' mnemonic is mostly | |
used for relative jumps on x86, so the easiest way to go directly to a pointer | |
is, strangely enough, to push the target onto the stack and then return (or | |
if you're resigned to losing a register, mov eax ptr; jmp eax). | |
*/ | |
#define B1ST_BSET(s) \ | |
(char)((s)&0xFF),(char)(((s)>>8)&0xFF), \ | |
(char)(((s)>>16)&0xFF),(char)(((s)>>24)&0xFF) | |
heapfn bind1st(void* fn_stdcall, void* param1) { | |
char asmcode[13] = { | |
'\x5B', '\x68', B1ST_BSET((int)param1), | |
'\x53', '\x68', B1ST_BSET((int)fn_stdcall), '\xC3' | |
}; | |
return make_heapfn(asmcode, 13); | |
} | |
/* | |
And that's all there is to it. We can define some macros for convenience to | |
stop us adding typecasts everywhere, but if you want type safety this is a | |
ridiculous technique to use (in C++, there's a bind1st for std::function<> | |
objects in <functional> for when a C++11 lambda is inappropriate). | |
Here are those macros, and some sample target functions for binding - remember | |
the stdcall declaration! Mistakenly using a stdcall to call a cdecl function | |
only leaves garbage in the stack, but using a cdecl to call a stdcall function | |
removes the arguments twice from the stack and you will crash on return. | |
*/ | |
#define BIND(type, target, param) \ | |
(type __stdcall *(*)(...)) bind1st( (void*)target, (void*) param) | |
#define MKBIND(type, varname, target, param) \ | |
type __stdcall *(* varname)(...) = BIND(type, target, param) | |
int __stdcall doubleit(int x) { | |
return x*2; | |
} | |
int __stdcall add(int x, int y) { | |
return x + y; | |
} | |
int __stdcall add3(int x, int y, int z) { | |
return x + y + z; | |
} | |
void test_bind() { | |
heapfn myfnc = bind1st((void*)doubleit, (void*)2); | |
printf("Bound: %d\n", (int)myfnc()); | |
xfree((void*)myfnc); | |
heapfn myadd = bind1st((void*)add, (void*)5); | |
printf("Bound: %d\n", (int)myadd(1)); | |
xfree((void*)myadd); | |
// Alternatively, using our macros: | |
MKBIND(int, myx, add, 5); | |
printf("Bound: %d\n", myx(1)); | |
xfree((void*)myx); | |
// Chaining works as expected, although i'm leaking a function here | |
MKBIND(int, recur, BIND(int, add3, 3), 5); | |
printf("3+5+7 = %d\n", recur(7)); | |
} | |
int main(int, char**) { | |
test_return(); | |
test_bind(); | |
return 0; | |
} | |
/* | |
That's all! Partial function application in C. For C++. For 32-bit Windows. | |
Built with nasm, ndisasm and MinGW GCC 4.6.2. Please don't ever use this. | |
- Mason <mappu zero four at gmail.com>, July 2012 | |
*/ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment