@jy2wong
Last active September 28, 2017 20:12
C's memory model notes (WIP)

C's memory model

  • a variable is an identifier (name) paired with a particular block of memory (the size of the block is determined by the type of the variable)
  • a variable's memory block may store values, or it may be uninitialized aka unbound
  • the value is interpreted in a way determined by the type of the variable (e.g. unsigned vs signed ints)
  • when we store a value in a variable['s memory block], we say that we are setting/assigning the variable or binding the variable to a value
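  • if we just state its existence without giving it a value, like int foo;, we are declaring it; strictly speaking, inside a function this also reserves the memory (so C calls it a definition too), and the value is simply left uninitialized
  • if we give a variable a value in its declaration, like in int foo = 3;, that's called initializing it; a pure declaration that reserves no memory of its own looks like extern int foo;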
  • variables have "scope", or area of reach
  • when a variable falls out of scope, that means its identifier is (possibly temporarily) not associated with that block of memory
  • variables have lifetimes, too
  • when a variable reaches end of life, the block of memory associated with it is up for recycling and there's no guaranteeing what it might hold
  • EOL certainly means falling out of scope, but falling out of scope doesn't mean EOL
  • there are different areas of memory in which a C variable's block of memory can exist.
    • literal pool
    • static/global variable area
    • the stack
    • the heap
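
A rough sketch of the four areas in one place. The names counter and regions_demo() are made up for illustration, and where each object really ends up is the compiler's and platform's business, so treat the comments as the usual picture rather than a guarantee.

```c
#include <stdlib.h>
#include <string.h>

int counter = 10;                      // static/global variable area

int regions_demo(void) {
  const char *lit = "abc";             // `lit` is on the stack; the characters live in the literal pool
  int local = 5;                       // the stack
  int *heaped = malloc(sizeof *heaped);  // the pointer is on the stack, the int it points at is on the heap
  if (heaped == NULL) {
    return -1;                         // allocation failed
  }
  *heaped = 20;

  int sum = counter + local + *heaped + (int)strlen(lit);
  free(heaped);                        // heap memory has to be released by hand
  return sum;
}
```

Calling regions_demo() just adds up one value from each area; the interesting part is where each operand lives.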

Stack frames

  • stacks are a LIFO (last in, first out) structure.
  • think about a pile of pancakes.
  • the last one you threw on is the first one you'll pick up.
  • a stack frame is a "universe" of sorts.
  • the "default" location of a variable is inside a stack frame on the stack
void new(void) {
  // create a variable called `foo` on the stack in new()'s stack frame
  int foo = 3;
  (void)foo;  // silence the unused-variable warning
  // cannot access variable `bar` by name from within this function
}

int old(void) {
  // create a variable called `bar` on the stack in old()'s stack frame
  int bar = 7;
  new();
  // `new()`'s stack frame is cleaned up after the call
  return bar;
}
  • for every function call you make, a new stack frame is pushed onto the stack (see footnote 1).
  • a stack frame is destroyed when the function exits/finishes/returns
  • each stack frame starts a new scope with its own variables
  • cannot access variables from previous stack frames by name (because they are out of universe/"out of scope"); they are still alive, though, so a pointer handed over by the caller still works
  • a variable declared in a stack frame falls out of scope and reaches end of life when its stack frame is destroyed
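
A small sketch of "out of scope but not end of life". The function names bump() and caller() are invented for this example: inside bump() the name bar means nothing, but the memory behind it is still alive because caller()'s stack frame hasn't been destroyed yet.

```c
// bump() cannot say `bar`, but it can reach bar's memory through the pointer
void bump(int *p) {
  *p += 1;
}

int caller(void) {
  int bar = 7;   // lives in caller()'s stack frame
  bump(&bar);    // bar is out of scope inside bump(), yet still alive
  return bar;    // bar's lifetime ends once caller() returns
}
```

The reverse direction is the dangerous one: a pointer to a variable whose stack frame is already gone points at recycled memory.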

Static and global variables

  • local variables are variables declared inside a function

  • global variables are variables declared outside of any functions

  • global variables are "in scope" (accessible) from the point where they are declared to the end of the file, unless you "shadow" them

  • global variable lifetime is from the time the program starts running to the time it stops (same as the program's lifetime)

  • going to take a quick detour into scope and what it means in C

    #include <stdio.h>
    int i = 13;
    int main() {
      printf("global i: %d\n", i);
      int i = 42;  // now this `i` is shadowing the global `i`
      // we are not allowed to do, say, `int i = 41;` right here because `i` is already defined at this level
      {
        int i = 7;  // now the inner scope's `i` is shadowing the outside `i`
        printf("inner scope's i: %d\n", i);
      }
      printf("main()'s i: %d\n", i);
      return 0;
    }
    • a new "level" starts inside {}s: inside if, else, else if, while, for

    • or even just {}s by themselves like in the above example

    • a common annoyance about switch statements is that (before C23) you can't put a declaration directly after a case label, because a label has to be followed by a statement. Whatever, just toss in some {}s and it's okay.

      switch(i) {
        case 42: {
          int j = 78;
        }
      }
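
A slightly fuller sketch of the {}s-inside-a-case trick, using a made-up function describe(). Each case gets its own block, so both can have their own j without clashing.

```c
int describe(int i) {
  int result = 0;
  switch (i) {
    case 42: {
      int j = 78;   // this j exists only inside these braces
      result = j;
      break;
    }
    default: {
      int j = -1;   // a different j; no clash with the one above
      result = j;
      break;
    }
  }
  return result;
}
```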

What does static mean?

  • static variables are declared with the keyword static, like static int foo = 5;

  • static as a keyword means "initialized exactly once" and "lifetime is lifetime of the program"

  • don't even need to assign the variable at declaration; static int foo; is also fine

  • declared but unassigned static variables are initialized with 0

  • the keyword has different additional effects depending on where the variable was declared

#include <stdio.h>

static int A = 1;
int C;

void thing() {
  static int B = 2;
  B += 2;
  printf("A: %d, B: %d\n", A, B);
}
int main() {
  printf("%d\n", A);
  A = 7;
  for (int i=0; i<3; i++) {
    thing();
  }

  return 0;
}
  • A is a global static variable

    • A is in scope "everywhere" in this file until/unless it is shadowed
    • A's lifetime is the lifetime of the program
    • okay fine "file" is not pedantically correct (the precise term is "translation unit": the file plus everything it #includes) but is close enough for now
  • B is a local static variable

    • B is only in scope inside the function thing()
    • B's lifetime is the lifetime of the program
  • C is a normal global variable

    • C is in scope "everywhere", including other files that declare it with extern int C;, until/if it is shadowed
    • C's lifetime is the lifetime of the program
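
A minimal sketch of the "initialized exactly once" part, with a hypothetical next_id() function. The static local is zero-initialized before the program starts and keeps its value between calls, even though its name is only usable inside the function.

```c
int next_id(void) {
  static int id;  // no initializer, so it starts at 0, exactly once
  id += 1;        // the change survives until the next call
  return id;
}
```

Without the static keyword, id would be a fresh, uninitialized stack variable on every call.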

Heap

struct Thing {
  int value;
};

struct Thing* make_new_thing(int v) {
  struct Thing thing;
  thing.value = v;
  return &thing;
}
  • this won't work since thing is "dead" after make_new_thing() exits: the returned pointer is dangling, and using it is undefined behavior
  • can't make thing a local static variable because it's totally reasonable to want to call make_new_thing() more than once
  • let's put our new thing on the heap instead
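struct Thing* make_new_thing(int v) {
  // malloc() is declared in <stdlib.h>
  struct Thing *thingptr = malloc(sizeof(struct Thing));
  if (thingptr == NULL) {
    return NULL;  // allocation failed; the caller must check for this
  }
  thingptr->value = v;
  // equivalent to (*thingptr).value = v;
  return thingptr;
}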
  • the lifetime of a block on the heap begins with a malloc() or calloc() and ends with a free()
  • thingptr, the variable containing the address of a struct Thing-shaped block of memory on the heap, is on the stack
  • thingptr will fall out of scope and reach end of life at the end of make_new_thing() (its value is passed with the return), but the thing on the heap is still alive, even if you lose track of where it is
  • imagine putting a shoebox in a self-storage unit, then forgetting where the unit is
  • the shoebox is still there, though you've lost the address
  • you must manually manage the lifetime of heap items and free() things when you're done with them
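
A sketch of the full create/use/free round trip. thing_create(), thing_destroy() and use_thing() are names invented here, and pairing every allocation with a matching destroy function is just one common convention, not the only way to do it.

```c
#include <stdlib.h>

struct Thing {
  int value;
};

// lifetime of the heap block starts here
struct Thing *thing_create(int v) {
  struct Thing *t = malloc(sizeof *t);
  if (t != NULL) {
    t->value = v;
  }
  return t;  // NULL if the allocation failed
}

// ... and ends here; free(NULL) is allowed and does nothing
void thing_destroy(struct Thing *t) {
  free(t);
}

int use_thing(int v) {
  struct Thing *t = thing_create(v);
  if (t == NULL) {
    return -1;
  }
  int seen = t->value;  // read what we need before giving the block back
  thing_destroy(t);     // after this, t must not be dereferenced
  return seen;
}
```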

Literal pool

  • "a string literal goes between doublequotes"
  • a string literal is how you write a string value in C source code
char x[] = "hi";
char *y = "hi";
char *z = "hi";
char *fish = "mahi mahi";
  • x is a 3-char wide block of memory on the stack (assuming the declaration is inside a function); in most expressions the name x turns into the address of its first char

    • this is the syntax for an array
    • the difference between TODO
    • this is syntactic sugar for char x[3] = {'h', 'i', '\0'};
    • the characters of the string literal are copied from the literal pool to the stack when x is initialized
  • y and z are pointers on the stack

  • the value of y is an address in the literal pool where there's a string that says "hi"

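  • so is the value of z, and it may be the same as the value of y. That is, the compiler is allowed (but not required) to store identical literals only once, in which case the address stored in y is the same as the address stored in z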

  • the literal pool is the home of read-only strings

  • unhappiness may occur if you try to modify them in any way (it is undefined behavior; on many systems the program crashes)

  • wouldn't want y to interfere with z

  • y and z may also (but not necessarily) point at the second-to-last character of "mahi mahi", since "hi" plus its terminating '\0' is exactly how that longer string ends

  • so like. double weirdness.

  • for the above reasons you should either

    1. specify that the characters of the string are read-only (with const char *y = "hi";, see footnote 2) so that you will be warned if bad things might happen, or
    2. put the characters of the string on the heap with char *y = strdup("hi");, if you do want to be able to modify the characters of the string (strdup() comes from POSIX and C23, is declared in <string.h>, and its result must be free()d)
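
A sketch of both options side by side. dup_string() is a hand-rolled stand-in for strdup(), written out only so the example doesn't depend on POSIX or C23 being available; literal_demo() is likewise a made-up name.

```c
#include <stdlib.h>
#include <string.h>

// hypothetical helper: copy a string onto the heap (what strdup() does)
char *dup_string(const char *s) {
  size_t n = strlen(s) + 1;  // +1 for the terminating '\0'
  char *copy = malloc(n);
  if (copy != NULL) {
    memcpy(copy, s, n);
  }
  return copy;
}

// returns 1 if everything behaved as described, 0 if not, -1 on allocation failure
int literal_demo(void) {
  const char *y = "hi";  // read-only view of the literal; the compiler rejects y[0] = 'H'
  char x[] = "hi";       // private 3-char copy on the stack, fine to modify
  x[0] = 'H';

  char *h = dup_string(y);  // private copy on the heap, also fine to modify
  if (h == NULL) {
    return -1;
  }
  h[0] = 'H';

  int ok = sizeof x == 3           // 'h', 'i', '\0'
        && strcmp(x, "Hi") == 0
        && strcmp(h, "Hi") == 0
        && strcmp(y, "hi") == 0;   // the literal itself was never touched
  free(h);
  return ok;
}
```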

More reading

Footnotes

  1. Not every function call warrants a stack frame; sometimes the code generated by the compiler won't bother setting up a stack frame for optimization reasons

  2. The distinction between const char X[] = "hi"; and const char *Z = "hi"; is less clear because of possible compiler optimizations.
