@takkaria
Last active November 13, 2020 00:34
finn c pointers

I understand the intent of what you were doing, which was to return an array from ziffern_extrahieren() and copy the results of that array into the array in is_armstrong_number(). I'll point out the mistakes in what you wrote and explain how you do this in C.

Compiler errors

int *ziffern_extrahieren(int candidate, int laenge) {
	int *ziffern = malloc(...);
	...
	return *ziffern;
}

Here when you write return *ziffern you are returning what ziffern points to. The way C works, *ziffern is equivalent to writing ziffern[0]. So you're not returning an int *, you're returning an int. You'd want to write just return ziffern.

int *ergebnis = malloc(..);
*ergebnis = ziffern_extrahieren(...);

So here, what you have written is "set what ergebnis points to (ergebnis[0]) to the return value of ziffern_extrahieren()". But you can't do this, because the type of *ergebnis is int, and the return type of ziffern_extrahieren() is int *.

So now you see why people stopped using C!

Different approaches

1. memcpy

In C the way you write "take this bit of memory and copy it over to this other bit of memory" (which is the only way you can copy arrays in C apart from iterating over each element) is using memcpy():

void *memcpy(void *destination, const void *source, size_t length);

so

int *result = ziffern_extrahieren(candidate, laenge);
memcpy(ergebnis, result, laenge * sizeof(int));

Will fill ergebnis with the result of ziffern_extrahieren(). In full:

// Note this version returns a pointer to newly allocated memory
int *ziffern_extrahieren(int candidate, int laenge) {
	int *ziffern = malloc(laenge * sizeof(int));
	int power = 1;
	// Valid indices run from 0 to laenge - 1, so start at laenge - 1
	for (int i = laenge - 1; i >= 0; i--) {
		ziffern[i] = (candidate % (10 * power)) / power;
		power *= 10;
	}
	return ziffern;
}

bool is_armstrong_number(int candidate) {
	int laenge = laenge_berechnen(candidate);
	int *ergebnis = malloc(laenge * sizeof(int));
	int *result = ziffern_extrahieren(candidate, laenge);
	memcpy(ergebnis, result, laenge * sizeof(int));
	...
	return true;
}

There are two problems here still though. The first is that the above code has two memory leaks. Once memory is allocated, you need to free it. This can be fixed by adding in:

	free(result);
	free(ergebnis);

before the final return.

The second is more conceptual, and it's that you're allocating two different bits of memory to contain the same data and copying from one to the other. In C people generally avoid that using one of the other three approaches...

2. Allocate memory in is_armstrong_number() and pass it to ziffern_extrahieren() to be filled

This is a neater way of doing things. It's nice because the malloc() and the free() happen in the same function, so it's easier not to have a memory leak.

// Note this version doesn't return a value
void ziffern_extrahieren(int candidate, int *ziffern, int laenge) {
	int power = 1;
	for (int i = laenge - 1; i >= 0; i--) {
		ziffern[i] = (candidate % (10 * power)) / power;
		power *= 10;
	}
}

bool is_armstrong_number(int candidate) {
	int laenge = laenge_berechnen(candidate);
	int *ergebnis = malloc(laenge * sizeof(int));
	ziffern_extrahieren(candidate, ergebnis, laenge);
	...

	// Make sure you call free() before returning or you have a memory leak.
	free(ergebnis);
	return true;
}

3. Use variable length arrays

I don't know what C compiler you are using, but C99 compilers let you declare an array whose length is only known at run time (a "variable length array"). It's allocated on the stack, without using malloc. If you're using Microsoft's compiler then you can't do this because they still don't support this part of a 20 year old standard, but anywhere else the following is valid C!

// ziffern_extrahieren is the same as above

bool is_armstrong_number(int candidate) {
	int laenge = laenge_berechnen(candidate);
	int ergebnis[laenge];
	ziffern_extrahieren(candidate, ergebnis, laenge);
	...
	return true;
}

This is the difference between heap allocation and stack allocation:

  • Heap allocation (asking the OS for some memory please) means you're taking over the limited memory management that the C compiler does and asking for bits of memory yourself - and freeing them later (or forgetting to).

  • Stack allocation is what you get when you write variable declarations in your code - the C compiler allocates memory for these and frees it again when it's out of scope.

4. Allocate the memory in ziffern_extrahieren() and return it to is_armstrong_number()

This is like number 2.

int *ziffern_extrahieren(int candidate, int laenge) {
	int *ziffern = malloc(laenge * sizeof(int));
	int power = 1;
	for (int i = laenge - 1; i >= 0; i--) {
		ziffern[i] = (candidate % (10 * power)) / power;
		power *= 10;
	}
	return ziffern;
}

bool is_armstrong_number(int candidate) {
	int laenge = laenge_berechnen(candidate);
	int *ergebnis = ziffern_extrahieren(candidate, laenge);
	...
	free(ergebnis);
	return true;
}

This is sort of nicer than 2 because the function that needs the memory creates it, which means that you can't accidentally allocate the wrong amount of memory to pass into the function. But it does mean the function where the memory is allocated is not the same function as the one where it needs to be freed, which makes it easier to mess up and cause memory leaks.

@takkaria

Question from Finn:

just to make sure i got it right: variables are still accessible after leaving a function even when they aren't explicitly a return value or specified as a global variable? i mean, i can still access the variables declared inside a void function without a specific return value outside of it unless the memory is explicitly freed?

That is a great question! But to answer it we'll need to go on a bit of an adventure... into the nature of C variables, the memory model underneath them, and the secret of the equivalence of pointers and arrays.

The nature of C variables

A variable in C is a name for a memory location. So if I run:

#include <stdio.h>
#include <inttypes.h>

int main(void) {
	int value = 10203;

	printf("val = %d\n", value);
	printf("addr = %#" PRIxPTR "\n", (uintptr_t) &value);

	return 0;
}

I get:

val = 10203
addr = 0x7efe814c

This tells us that value is a name for memory address 0x7efe814c. (This will be different on different computers and each time you run the program. It's in hexadecimal because that's the conventional way to represent memory addresses.)

(What's going on with the rest of it? (uintptr_t) is a cast which means "interpret the thing on the right as a uintptr_t". uintptr_t is an unsigned integer type that has enough space to hold a reference to any location in memory. The PRIxPTR bit is the way you print that type in hexadecimal.)

The type of a variable tells us how much space it takes up in memory. This is what the sizeof operator tells us. On my computer, sizeof(int) == 4. That's 4 bytes, or 32 bits. (I'm using a 32 bit computer.) So value starts at address 0x7efe814c and takes up 4 bytes.

Looking at memory locations

Underneath the idea of a variable in C is the idea of memory locations. You can't really understand variables in C without understanding the memory model behind them. Here's a program which illustrates this.

#include <stdio.h>
#include <inttypes.h>
#include <limits.h>

int main(void) {
	// Let's use the largest possible signed integer this time.
	// This is found in limits.h.
	// I'm on a 32-bit OS so this is a 32-bit value.
	// Most computers are now 64 bits, probably including yours, so if
	// you run this program you'll get different results!
        int value = INT_MAX;

	// A bit of a detour: (maybe you know this already, in which case skip it :))
	// `int` is by default signed, that is, it can hold positive or negative numbers.
	// A 32 bit signed int uses its 32nd bit to store the sign (0 = +ve, 1 = -ve)
	// which leaves 31 bits available to store the number.
	//
	// This means the largest number that can be stored in a 32 bit signed integer
	// looks like:
	// - 01111111111111111111111111111111 in binary
	// - 0x7fffffff                       in hexadecimal
	// - 2147483647                       in decimal 

	// Print `value` in hexadecimal.  (hint: it's 0x7fffffff)
        printf("INT_MAX = %#x\n", value);

	// &value takes the memory address of `value`.
	// So `ptr` becomes a pointer to `value`.
        uint8_t *ptr = (uint8_t *) &value;

	// What's a uint8_t?  It's C's way of writing 'byte':
	//           u    nsigned
	//           int  eger with
	//           8    bits, and it's a
        //           _t   type
	//
	// So what we're saying is, we want to access `value` (which is 4 bytes long)
	// as if it's an array of bytes, using `ptr`.

	// The types of `ptr` and `&value` are different (`uint8_t *` vs `int *`).
	//
	// uint8_t *ptr = (uint8_t *) &value;
	//                ^^^^^^^^^^^
	//                     |
	// This bit is a 'cast'.  It says "interpret the thing on the right side of me
	// as if it was this other type".  We need it because the type of `&value` without
	// it is int *, and the compiler will complain about it.

	// So we go through each byte of `value` and print it.
	for (size_t i = 0; i < sizeof(int); i++) {
		printf("addr = %#" PRIxPTR ", val = %#x\n",
		       (uintptr_t) &ptr[i],  ptr[i]);
		//     ^ the memory          ^ the value
		//       location of           of ptr[i]
		//       ptr[i]
	}

        return 0;
}

For me this program prints:

INT_MAX = 0x7fffffff
addr = 0x7eaca144, val = 0xff
addr = 0x7eaca145, val = 0xff
addr = 0x7eaca146, val = 0xff
addr = 0x7eaca147, val = 0x7f

What happened?

We set up a variable containing 0x7fffffff. We then set up a byte-based access to the memory this variable was stored at using ptr, and printed each of the 4 bytes that made up the contents of the variable.

But we got 0xff 0xff 0xff 0x7f. Why? I won't go on a long diversion here, but 0x7fffffff is stored the "wrong way around" in memory. From the way we write numbers for humans - with the largest parts on the left and the smallest parts on the right - it makes no sense to write 0x7fffffff as four bytes of 0xff 0xff 0xff 0x7f. We'd expect the four bytes to be 0x7f 0xff 0xff 0xff. But computers are weird. This is called 'little-endianness' and it's just how most computers store numbers.

OK, but the real point of this example was to show how a variable in C is a name for a memory location. Both value and *ptr above are names for the same memory location. You can access it as an int, or as an array of bytes (uint8_ts).

This is really different from high level languages. I don't know Java, but in Python variables are references to instances of objects. You can't just take a number and access it as an array, they're different kinds of things. And the underlying memory model is hidden from you. In C more or less everything is just some way of accessing a memory location.

OK, so you're asking, how does this relate to how long variables are available for? Umm... Just wait while we get distracted by something else :)

Arrays and pointers

I'm going to go on a diversion to show how closely arrays and pointers are related in C. Which I hope will be useful.

In the following example I'm pulling in the assert() macro. This will abort the program with an error message if it's passed something that doesn't evaluate to true.

#include <limits.h>
#include <inttypes.h>
#include <assert.h>

int main(void) {
        uint8_t values[4] = { 0xff, 0xff, 0xff, 0x7f };
	uint8_t *ptr = values;
	//             ^ Notice here we're not writing &values!
	//
        // This is because `values` is already the name of a location in memory -
	// the beginning of the array.  So:
        assert(values[0] == *values);

	// In fact, array access in C is just a fancy way of dereferencing pointers:
        assert(values[1] == *(values+1));
        assert(values[2] == *(values+2));
        assert(values[3] == *(values+3));

	// Why does this work?
	// In higher-level languages, an array or list is a thing that contains
	// multiple values.  It probably knows its own length.  How it's stored is hidden
	// from you and you can't access the lower level details of it.
	//
	// In C, though, an array is just values stored one after the other in memory.
	// This is a core feature of the language.  There isn't a higher level concept
	// for it.
	//
	// Memory location	value		name
	// 0			0xff		values
	// 1			0xff
	// 2			0xff
	// 3			0x7f

        // These are also true:
        assert(ptr[0] == *ptr);
        assert(ptr[1] == *(ptr+1));
	// ... and so on ...
	//
	// So an array and a pointer in C have some things in common.  They
	// are both names for a location in memory.  And array syntax ([]) is equivalent
	// to pointer dereferencing syntax (*).
	//
	// In general, x[n] is the same as writing *(x+n).

	// For extra points, let's view the array above as a single 32-bit integer.
	// This is the reverse of what we did in the last section.
	// We make a 32-bit integer pointer point to values, and then we dereference it
	// and assert that it contains the largest possible integer value.  (Which it does!)
        int32_t *int_value = (int32_t *) values;
        assert(*int_value == INT32_MAX);

        return 0;
}

Hopefully this clarifies a little bit the difference between higher level languages' ideas of arrays and C's.

Different ways memory can be allocated

So, for something a bit more directly related to your question, let's return to the issue of memory allocation. I talked before about stack and heap allocation but let's go into more detail.

#include <inttypes.h>
#include <assert.h>

void fn(uint8_t *a) { return; }

uint8_t *example_code(void) {
        uint8_t values[4] = { 0xff, 0xff, 0xff, 0x7f };

	// `values` is allocated on the stack.  That means that the memory
	// that stores that data is allocated by the compiler when the function
	// is entered.

	// What does allocated mean, though?
	// Really it means 'reserved'.  The compiler knows we need 4 bytes of
	// storage for `values` so it finds us 4 bytes and makes sure it doesn't
	// assign that area of memory to anything else.

	// When the function returns, it removes that reservation, so that memory
	// can be re-used by something else.

	// So when we do:
	fn(values);

	// fn() can read all the data in `values` just fine.

	// But if we return it...
	return values;

	// The compiler warns me:
	// test.c: In function ‘example_code’:
	// test.c:27:9: warning: function returns address of local variable [-Wreturn-local-addr]
	//   return values;
	//          ^~~~~~
}

int main(void) {
        // There is no guarantee this will work.
        uint8_t *vals = example_code();
        assert(*vals == 0xff);

	// On my computer this results in the text "Segmentation fault", which is
	// C's way of saying "you just tried to access memory in a bad way".

        return 0;
}

And here's another version that doesn't segfault:

#include <inttypes.h>
#include <assert.h>
#include <stdlib.h>

uint8_t *example_code(void) {
	// This time we'll allocate `values` on the heap.  This means we explicitly
	// ask the compiler/OS for some memory to keep our stuff in.

	uint8_t *values = malloc(4);
        values[0] = 0xff;
	values[1] = 0xff;
	values[2] = 0xff;
	values[3] = 0x7f;

	return values;
}

int main(void) {
	// This code will work fine.
        uint8_t *vals = example_code();
        assert(*vals == 0xff);

	// I don't get a segmentation fault here.  Why?
	// Because when you allocate on the heap you are taking control of allocation
	// and deallocation.
	// Calling malloc(4) says "please assign me 4 bytes of memory and I'll manage
	// it myself".  Which means, again, we get to reserve some space for our
	// code, but instead of the compiler unreserving it when we leave the function,
	// it stays reserved for us until we free it up:
	free(vals);

        return 0;
}

An attempt at answering your question

OK, so going back to your question, hopefully there's enough background now (lol) to answer it in a reasonably short way:

just to make sure i got it right: variables are still accessible after leaving a function even when they aren't explicitly a return value or specified as a global variable?

The contents of a variable allocated in a function are only accessible after that function has returned if the memory for the variable was allocated on the heap (using malloc()).

Actually, I'll complicate that a bit (yay!). If you're returning a value from a function:

int zweiundzwanzig(void) { int v = 22; return v; }

Then yes, that's accessible from outside the function. But it's different from returning a memory location:

int *array(void) { int v[4] = { 0, 0, 0, 0 }; return v; }
// array() returns a pointer to a local variable that's not accessible
// outside the function

This wouldn't work. So let's rephrase the above:

The contents of a variable allocated in a function are only accessible after that function has returned if:

  • the memory for the variable was allocated on the heap (using malloc()), or
  • its contents are returned directly as a value.

The difference between returning a value and returning a pointer is really important in C and doesn't exist in higher level languages where you don't do manual memory allocation.

So why is that? Well, what is a pointer exactly? This gets quite Inception-y...

What is a pointer?

#include <stdio.h>
#include <inttypes.h>
#include <assert.h>

int main() {
        // `v` is allocated enough space for an int value by the compiler.
        int v = 255;

        // `v` is a name for a memory location.
        // That memory location contains the value '255'.
        // The address of that location is accessible by writing `&v`.

        int *p = &v;
        // OK, so `p` here is ALSO a name for a memory location.
        // That memory location contains the value '&v'.
        // The address of that location is accessible by writing `&p`...

        int **pp = &p;
        // `pp` is the name of a memory location.
        // That memory location contains the value '&p'.
        // The address of that location is accessible by writing `&pp`.

        int ***ppp = &pp;
        // `ppp` is the name of a memory location.
        // That memory location contains the value '&pp'.
        // The address of that location is accessible by writing `&ppp`.

        // We could be here all night, you get the idea.
        // But let's have some fun...
        assert(v == 255);
        assert(*p == 255);
        assert(**pp == 255);
        assert(***ppp == 255);

        // Let's print out all these variables so we can see what the compiler
        // is doing.
        printf("name =    v, address = %#" PRIxPTR ", value = %#x\n", (uintptr_t) &v, v);
        printf("name =    p, address = %#" PRIxPTR ", value = %#" PRIxPTR "\n", (uintptr_t) &p, (uintptr_t) p);
        printf("name =   pp, address = %#" PRIxPTR ", value = %#" PRIxPTR "\n", (uintptr_t) &pp, (uintptr_t) pp);
        printf("name =  ppp, address = %#" PRIxPTR ", value = %#" PRIxPTR "\n", (uintptr_t) &ppp, (uintptr_t) ppp);

        return 0;
}

Output:

name =    v, address = 0x7eb7714c, value = 0xff
name =    p, address = 0x7eb77148, value = 0x7eb7714c
name =   pp, address = 0x7eb77144, value = 0x7eb77148
name =  ppp, address = 0x7eb77140, value = 0x7eb77144

What is this saying then? Pointers actually are values. The value of a pointer is the address of the memory location it points to.

You can also see that the address of each variable is 4 bytes higher than the address of the one declared after it. So the actual layout of this program's memory looks like:

Location      Value         Name
0x7eb77140    0x7eb77144    ppp
0x7eb77144    0x7eb77148    pp
0x7eb77148    0x7eb7714c    p
0x7eb7714c    0xff          v

This makes it easy to define what the pointer dereferencing operator * does. It means "Take the value of the thing on the right of me, treat it as a memory location, and return what is in that memory location".

So *p means "take the value of p (0x7eb7714c), treat it as a memory location, and return what's in that memory location (0xff)".

If you're wondering how to read that line ***ppp above, maybe it's easier with parentheses? *(*(*ppp)):

  • Take the value of ppp (0x7eb77144), treat it as a memory location, and return what's there (0x7eb77148)
  • Take the value of *ppp (0x7eb77148), treat it as a memory location, and return what's there (0x7eb7714c)
  • Take the value of *(*ppp) (0x7eb7714c), treat it as a memory location, and return what's there (0xff).

I've used **p before but I don't think I've ever used ***p in a real world program. I'm showing it to help you understand the concept, not because you'll likely ever use it!

Memory allocation in C (again)

This has turned into a run through the trickiest bits of C, so I hope I've managed to make everything clear enough so far. If not, it's not your fault, it's genuinely difficult and it took me three and a half hours to get to this point in the explanation.

But I want to come back to what I said just before when trying to answer your question:

The contents of a variable allocated in a function are only accessible after that function has returned if:

  • the memory for the variable was allocated on the heap (using malloc()), or
  • its contents are returned directly as a value.

I want to refine this again now we've defined a pointer as a kind of value. We can be more precise.

All C functions that return something return values. A pointer is a value - its value is a memory location.

So the problem we encounter when we do something like this:

int *fehler(void) {
	int v[4] = { 0, 0, 0, 0 };

	// This function returns a pointer to the beginning of the array
	return v;
}

is that the pointer it returns points to a piece of memory which is no longer allocated once the function has returned.

When we take control of allocation, though, using malloc(), this is no longer a problem. We're taking control of the life-cycle of our data by allocating and freeing the memory used for it ourselves.

Back to the original problem

Here is the code from solution #4 above again:

int *ziffern_extrahieren(int candidate, int laenge) {
	int *ziffern = malloc(laenge * sizeof(int));
	int power = 1;
	for (int i = laenge - 1; i >= 0; i--) {
		ziffern[i] = (candidate % (10 * power)) / power;
		power *= 10;
	}
	return ziffern;
}

bool is_armstrong_number(int candidate) {
	int laenge = laenge_berechnen(candidate);
	int *ergebnis = ziffern_extrahieren(candidate, laenge);
	...
	free(ergebnis);
	return true;
}

ziffern_extrahieren returns a pointer - a memory location. Because this memory was allocated on the heap using malloc(), it is still accessible after the function returns. And it'll remain accessible until free() is called.

Your question, again (for the last time, I promise!)

OK, so maybe I can answer your question again:

are variables still accessible after leaving a function even when they aren't explicitly a return value or specified as a global variable? can i still access the variables declared inside a void function without a specific return value outside of it unless the memory is explicitly freed?

When the memory for a variable is allocated on the stack,

  • its memory location is accessible within that function and the functions it calls;
  • its memory is deallocated by the compiler when the function returns;
  • accesses to that memory location after the function has returned have unpredictable results (you might crash the program or the memory location might not contain what you expect because it has been reassigned)

When you allocate memory on the heap in a function,

  • you can return a pointer to the memory;
  • that memory is available until it is manually freed.

There is one other kind of allocation - global allocation. Globals are not on the stack or the heap and they are never freed; they're accessible everywhere.
