@JustSlavic
Last active December 19, 2022 09:17
How to start a C program without CRT on 32bit ARM Linux

Start your C program without CRT on 32bit ARM Linux

Preamble

I always wanted to go deeper into programming, so when my neighbour showed me their Raspberry Pi, I immediately asked to borrow it. At the time I badly wanted to program, because I was left without my computer, and the Raspberry Pi turned out to be really handy!

I jumped right in and started learning ARM assembly. I wanted to run my program without the C runtime library. While searching the Internet for how to do that, I found that there are not many resources explaining what the process looks like, so I want to share what I found along the way.

The following article shows, step by step, how to get from the start of the process to the main function in C.

ARM assembly

First of all, we need to learn how to write and assemble assembly code. If you are scared of assembly language, don't be. I do not know much about it either, so don't worry.

Let's start with the simplest assembly:

.text
.global _start

_start:
    mov     r0, #0
    mov     r7, #1
    svc     #0

We first declare the .text section, the part of the executable where our code lives. The .global directive then tells the assembler to export the _start symbol so that the linker can find it. Finally comes the function itself. Notice that it is called _start, not main: on Linux the linker uses _start as the default entry point of a program, and in a normal C program it is the runtime's _start that eventually calls main.

All we do for now is exit the program. I found a list of Linux kernel syscalls, and in it the exit syscall.

syscall name    %r7     arg0 (%r0)
exit            0x01    int error_code

This table shows that to call exit we place error_code in the r0 register and the syscall number 0x01 in the r7 register, then execute the svc instruction, which performs the syscall.

Let's build this! I use the GNU assembler, which was pre-installed on my Raspberry Pi:

$ as start.s -o start.o
$ ld start.o -o start
$ chmod +x start
$ ./start
$ echo $?
0

If you change error_code (the value moved into the r0 register), the return code of the program changes accordingly. I would call it a win!
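One caveat when experimenting with larger values: on Linux the parent process, here the shell, only sees the low 8 bits of the exit status. The sketch below models that in C; the helper name status_seen_by_shell is hypothetical and not part of the original code.

```c
/* Hypothetical helper: the value a parent process such as the shell
   reports in $? for a given argument to the exit syscall. */
static int status_seen_by_shell(int error_code)
{
    return error_code & 0xff; /* only the low 8 bits are reported */
}
```

So moving 42 into r0 gives 42 in $?, while a value such as 256 would show up as 0.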

Hello, World!

The next step is printing something on the screen. For that we need another syscall, write. Let's look it up in the same table:

syscall name    %r7     arg0 (%r0)         arg1 (%r1)         arg2 (%r2)
write           0x04    unsigned int fd    const char *buf    size_t count

I hope you have already figured out how to invoke that syscall. But we don't yet know how to define a string and load its address into a register. With the help of the Compiler Explorer and the Internet, we can do it like this:

.text
.global _start

_start:
    mov     r0, #1
    ldr     r1, =string
    mov     r2, #14
    mov     r7, #4
    svc     #0
    
_exit:
    mov     r0, #0
    mov     r7, #1
    svc     #0
    
string:
.ascii "Hello, World!\n"

Here we have a new instruction, ldr, which in this form loads the address of our string into the r1 register. The size of the string goes into r2; I counted it, and it is indeed 14.
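If you would rather not count by hand, the length can be cross-checked in C. This is only a sketch; the names message and message_length are hypothetical and do not appear in the original code.

```c
#include <stddef.h>

/* The same text that the assembly listing passes to write. */
static const char message[] = "Hello, World!\n";

/* Number of bytes to write. sizeof also counts the terminating NUL
   that C appends to a string literal, so subtract one. */
static size_t message_length(void)
{
    return sizeof(message) - 1;
}
```

message_length() evaluates to 14, matching the value moved into r2. Note that .ascii, unlike a C string literal, does not append a terminating NUL.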

The .ascii directive lets us define an ASCII string right in the executable. Notice that I defined it in the .text section; that is probably not a good idea, but it works in this simple example, so I'll leave it like that.

You may also notice that I never call _exit explicitly: execution simply continues to the next instruction, so _exit starts executing right after _start, just as I want.
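For comparison only, here is roughly what the same write looks like from an ordinary hosted C program. This sketch assumes Linux with a libc that provides the generic syscall() wrapper, so unlike the assembly above it does depend on libc; the helper name raw_write is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issues the write syscall through libc's
   generic syscall() wrapper instead of a hand-written svc. */
static long raw_write(int fd, const char *buf, size_t count)
{
    /* SYS_write expands to the write syscall number of the target;
       the table above lists it as 0x04 for 32-bit ARM. */
    return syscall(SYS_write, fd, buf, count);
}
```

Calling raw_write(1, "Hello, World!\n", 14) prints the message and, on success, returns the number of bytes written, which is the value the kernel leaves in r0 after svc.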

Let's run our code!

$ as start.s -o start.o
$ ld start.o -o start
$ chmod +x start
$ ./start
Hello, World!
$ echo $?
0

Looks like everything is working! I couldn't be happier!

Linking main.c

How to call a function in ARM

Next, I am going to write a main function in C and call it. To do that, we have to learn how to call a function.

The ARM architecture has the instructions b (branch) and bl (branch with link). b is like jmp on x86: an unconditional jump to the target instruction. bl does the same, but also stores the address of the next instruction, the return address, in lr (the link register).

This is interesting because on x86 the instruction pointer cannot be read or written like an ordinary register, whereas on 32-bit ARM pc is one of the numbered registers (r15) and can be used as an operand of regular instructions. That allows you to manipulate directly which instruction is executed next.

For example, calling a function with bl foo stores the return address in lr, so to return, foo just executes mov pc, lr or bx lr.

Calling main

Let's try to call main. Here it is:

int main(void)
{
    return 0;
}

This program should execute normally, but it doesn't do anything useful. So let's expose the write syscall to C:

.text
.global _start
.global write

write:
    push    {r7}
    mov     r7, #4
    svc     #0
    pop     {r7}
    bx      lr

_start:
    bl      main
    
_exit:
    mov     r0, #0
    mov     r7, #1
    svc     #0

We have to declare the function in C so the compiler knows it exists, but that is straightforward. Also notice that I do not mov anything into registers r0, r1, and r2: the C compiler places the first three arguments there for me!

The compiler, however, may keep a value of its own in the r7 register (the ARM procedure call standard treats r7 as a callee-saved register), so we have to save r7 on the stack before overwriting it. The push {r7} instruction does exactly that, and when our work is done we restore its content with pop.

int write(int fd, char const *buffer, int count);

int main(void)
{
    write(1, "Hello, World!\n", 14);
    return 0;
}

Now let's try it:

$ as start.s -o start.o
$ gcc main.c -c -nostdlib -o main.o
$ ld start.o main.o -o main
$ chmod +x main
$ ./main
Hello, World!
$ echo $?
0

It works!!! How cool is that?

Passing argc, argv

If you know your C, you know that main does not have to be int main(void); on Linux it can also be declared as int main(int argc, char **argv, char **env);. So let's pass those arguments in.

Unfortunately, I didn't find anything about where those values live, so I had to roll up my sleeves and open gdb.

Memory inspection

First, let's recompile main.c with debug symbols and run gdb:

$ gcc main.c -c -g -nostdlib -o main.o
$ ld start.o main.o -o main
$ chmod +x main
$ gdb main
(gdb)> start
5          write(1, "Hello, World!\n", 14);
(gdb)>

I do not really know where to find the command-line arguments, but let's start by examining the registers:

(gdb)> info registers
r0          0x0             0
r1          0x0             0
r2          0x0             0
r3          0x0             0
r4          0x0             0
r5          0x0             0
r6          0x0             0
r7          0x0             0
r8          0x0             0
r9          0x0             0
r10         0x0             0
r11         0x7efff68c      2130704012
r12         0x0             0
sp          0x7efff688      0x7efff688
lr          0x10084         65688
pc          0x100a8         0x100a8 <main+8>
cpsr        0x10            16
fpscr       0x0             0
(gdb)>

I can't imagine where the arguments could be if not on the stack, so let's see what is in there. For that we use the sp register, which holds a pointer to the top of the call stack.

The gdb command x/16xw <address> eXamines memory: it prints 16 words (32-bit values) in hexadecimal format, starting at address.

(gdb)> x/16xw 0x7efff688
0x7efff688:     0x00000000      0x00010084      0x00000001      0x7efff7cc  
0x7efff698:     0x00000000      0x7efff7e6      0x7efff7f6      0x7efff805
0x7efff6a8:     0x7efff814      0x7efff82d      0x7efff837      0x7efff84c
0x7efff6b8:     0x7efff85b      0x7efff86a      0x7efff873      0x7efff880
(gdb)>

Here it is! Can you see it?

The first two values look like registers that main pushed on entry: the second one, 0x00010084, matches lr in the register dump above. After them we see 1, then an address, then 0, then a bunch of addresses. I bet these are argc, then argv[0], then the 0 that terminates argv, then the env pointers.

Let's examine the assumed argv[0] (x/32c <address> prints 32 characters starting at the address):

(gdb)> x/32c 0x7efff7cc
0x7efff7cc:     47 '/'  100 'h' 111 'o' 109 'm' 101 'e' 47 '/'  117 'u' 47 '/'
0x7efff7d4:     112 'p' 114 'r' 111 'o' 106 'j' 101 'e' 99 'c'  116 't' 115 's'
0x7efff7dc:     47 '/'  102 'f' 111 'o' 112 'p' 47 '/'  109 'm' 97 'a'  105 'i'
0x7efff7e4:     110 'n' 0 '\000'        83 'S'  72 'H'  69 'E'  76 'L'  76 'L'  61 '='
(gdb)>

Yes! That's it. Now we just load argc into r0 and a pointer to the argv array into r1. We can even see that the string SHELL=... starts at address 0x7efff7e6, which we already saw on the stack right after the 0.

We can easily fill r0 and r1 registers like that:

_start:
    ldr     r0, [sp]
    add     r1, sp, #4
    bl      main
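The two instructions before bl main can be read as plain C pointer arithmetic. The sketch below is only an illustration under the layout inferred from the gdb session (argc first, then the argv pointers); the names entry_args and read_entry_args are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names, not part of the original code. */
struct entry_args {
    int    argc;
    char **argv;
};

/* sp points at the word holding argc, as it does when _start begins. */
static struct entry_args read_entry_args(char **sp)
{
    struct entry_args args;
    args.argc = (int)(intptr_t)sp[0]; /* ldr r0, [sp]   */
    args.argv = sp + 1;               /* add r1, sp, #4 */
    return args;
}
```

On 32-bit ARM a pointer is 4 bytes wide, so sp + 1 in C is the same address as sp plus 4 in the assembly.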

To get the env argument, we need to skip all the argv[i] pointers and the terminating 0. Luckily, we already have the number of those pointers in r0, which holds argc. I use the mla (multiply-accumulate) instruction for that, computing r2 = r0 * r3 + r1, and then add 4 more bytes to step over the 0:

_start:
    ldr     r0, [sp]
    add     r1, sp, #4
    mov     r3, #4
    mla     r2, r0, r3, r1
    add     r2, #4
    bl      main
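The arithmetic can be checked against the gdb dump from the previous section. Inside main, sp was 0x7efff688 and two words had been pushed, so at _start argc sat at 0x7efff690 and argv began at 0x7efff694. The sketch below mirrors the mla and add pair with 32-bit addresses; the function name env_address is hypothetical.

```c
#include <stdint.h>

/* Mirrors the mla + add pair: r2 = r0 * 4 + r1, then r2 += 4.
   Addresses are 32-bit, as on the Raspberry Pi used in the article. */
static uint32_t env_address(uint32_t argc, uint32_t argv_address)
{
    uint32_t r2 = argc * 4u + argv_address; /* skip argc pointers      */
    return r2 + 4u;                         /* skip the terminating 0  */
}
```

With argc equal to 1 and argv at 0x7efff694 this gives 0x7efff69c, which is the slot in the dump holding 0x7efff7e6, the address of the SHELL= string.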

Now we can print both argv, and env in main function:

int write(int fd, char const *buffer, int count);

int string_size_no0(char const *s)
{
    int count = 0;
    while (s[count]) { count += 1; }
    return count;
}

int main(int argc, char **argv, char **env)
{
    for (int i = 0; i < argc; i++)
    {
        int n = string_size_no0(argv[i]);
        write(1, argv[i], n);
        write(1, "\n", 1);
    }
    
    for (int env_index = 0;; env_index++)
    {
        char *e = env[env_index];
        if (e == 0) break;
        int n = string_size_no0(e);
        write(1, e, n);
        write(1, "\n", 1);
    }
    
    return 0;
}

Now this should work.

Conclusion

I learned a lot during this weekend, and I hope you did too. All this code is in the public domain; you may use it, but I do not guarantee anything. It's just my experience that I wanted to share.

Build script:

PROJECT=hello_world

ASM=as
CC=gcc
LINKER=ld

function handle_errors {
    rc=$?
    if [[ "$rc" -ne 0 ]]
    then
        echo "ERROR (return code $rc)"
        exit $rc
    fi
}

$ASM start.s -o start.o
handle_errors

$CC main.c -c -nostdlib -Wall -Werror -o main.o
handle_errors

$LINKER start.o main.o -o $PROJECT
handle_errors

rm start.o
rm main.o

chmod +x $PROJECT

main.c:

int write(int fd, char const *buffer, int count);

int main(int argc, char **argv, char **env)
{
    write(1, "Hello, World!\n", 14);
    return 0;
}

start.s:

.text
.global _start
.global write

write:
    push    {r7}
    mov     r7, #4
    svc     #0
    pop     {r7}
    bx      lr

_start:
    ldr     r0, [sp, #0]
    add     r1, sp, #4
    mov     r3, #4
    mla     r2, r0, r3, r1
    add     r2, #4
    bl      main

_exit:
    movs    r0, #0
    movs    r7, #1
    svc     #0