# Proteas/Quirks of C.md

Quirks of C

Here's a list of mildly interesting things about the C language that I learned mostly by consuming Clang's ASTs. Although surprises are getting sparser, I might continue to update this document over time.

There are many more mildly interesting features of C++, but the language is literally known for being weird, whereas C is usually considered smaller and simpler, so this is (almost) only about C.

## 1. Combined type and variable/field declaration, inside a struct scope [https://godbolt.org/g/Rh94Go]

```c
struct foo {
    struct bar {
        int x;
    } baz;
};

void frob() {
    struct bar b; // <-- defined in body of `struct foo`
}
```

## 2. Compound literals are lvalues [https://godbolt.org/g/Zup5ZB]

```c
struct foo {
    int bar;
};

void baz() {
    // compound literal:
    // https://en.cppreference.com/w/c/language/compound_literal
    (struct foo){};

    // these are actually lvalues
    ((struct foo){}).bar = 4;
    &(struct foo){};
}
```

## 3. Switch cases anywhere [https://godbolt.org/g/fSeL18]

```c
void foo(int p, char* complicated) {
    switch (p) {
    case 0:
        if (complicated[0] == 'a') {
            if (complicated[1] == 'b') {
            case 1:
                complicated[2] = 'c';
            }
        }
        break;
    }
}
```

(also see: Duff's Device)
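
For reference, here is a minimal sketch of Duff's Device, which relies on the same rule that `case` labels may sit anywhere inside the `switch` body, including in the middle of a loop. The function name `duff_copy` and the byte-copy framing are assumptions for illustration; the classic version wrote to a single output register rather than advancing the destination pointer.

```c
#include <stddef.h>

/* Copies `count` bytes from `from` to `to`, eight per loop iteration.
 * The switch jumps into the middle of the do-while body so that the
 * first pass handles the remainder (count % 8). */
void duff_copy(char *to, const char *from, size_t count) {
    if (count == 0) {
        return; /* the do-while below would otherwise copy 8 bytes */
    }
    size_t passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```

The `case 0: do {` line is the whole trick: the `do` statement starts under one label and the remaining labels land inside its body.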

## 4. Flexible array members [https://godbolt.org/g/HCjfzX]

```c
struct flex {
    int count;
    int elems[]; // <-- flexible array member
};

// this lays out the object exactly as expected
struct flex f = {
    .count = 3,
    .elems = {32, 31, 30}
};

_Static_assert(sizeof(struct flex) == sizeof(int), "");
// sizeof(f) does not include the size of statically-declared elements
_Static_assert(sizeof(f) == sizeof(struct flex), "");

// this only builds because .elems is not initialized:
struct flex g[2];
```
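
Statically initializing a flexible array member, as above, is a GNU extension. In standard C the usual way to get storage for the trailing elements is heap allocation. The sketch below shows that pattern; `flex_new` is a hypothetical helper name, not something from the original snippet.

```c
#include <stdlib.h>

struct flex {
    int count;
    int elems[]; /* flexible array member */
};

/* Hypothetical helper: allocates a `struct flex` with room for `count`
 * zero-initialized elements. Returns NULL on failure or negative count. */
struct flex *flex_new(int count) {
    if (count < 0) {
        return NULL;
    }
    /* sizeof *f covers only the fixed part; the elements are added on top. */
    struct flex *f = calloc(1, sizeof *f + (size_t)count * sizeof f->elems[0]);
    if (f != NULL) {
        f->count = count;
    }
    return f;
}
```

The caller releases the whole object, elements included, with a single `free`.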

## 5. {0} as a universal initializer [https://godbolt.org/g/MPKkXv]

```c
typedef int empty_array_t[0];
typedef struct {} empty_struct_t;
typedef int array_t[10];
typedef struct { int f; } struct_t;
typedef float vector_t __attribute__((ext_vector_type(4)));

// {} can initialize structs and arrays and vectors, but not scalars:
empty_array_t ea = {};
empty_struct_t es = {};
array_t a = {};
struct_t s = {};
vector_t v = {};
void* p = {}; // <-- error
int i = {}; // <-- error

// {0} can initialize any data type, including empty arrays/structs.
empty_array_t eaa = {0};
empty_struct_t ess = {0};
array_t aa = {0};
struct_t bb = {0};
vector_t cc = {0};
void* dd = {0}; // <-- happy!
int ee = {0}; // <-- happy!
```

## 6. Function typedefs [https://godbolt.org/g/5ctrLv]

```c
typedef void (*function_pointer_t)(int); // <-- this creates a function pointer type
typedef void function_t(int); // <-- this creates a function type
// function_pointer_t == function_t*

function_t my_func; // <-- this declares "void my_func(int)"

void bar() {
    my_func(42);
}
```

## 7. Array pointers [https://godbolt.org/g/N85dvv]

```c
typedef int array_t[10]; // array typedef
typedef array_t* array_ptr_t; // array pointer typedef
// same as:
// typedef int (*array_ptr_t)[10];

void foo(array_ptr_t array_ptr) {
    int x = (*array_ptr)[1];
}

void bar() {
    int arr_10[10];
    foo(&arr_10); // <-- yep

    int arr_11[11];
    foo(&arr_11); // <-- nope
}
```

## 8. Modifiers to array sizes in parameter definitions [https://godbolt.org/z/FnwYUs]

```c
void foo(int arr[static const restrict volatile 10]) {
    // static: the array contains at least 10 elements
    // const, volatile and restrict qualify the pointer that the
    // parameter is adjusted to: `int* const restrict volatile arr`.
}
```

(corrected by Reddit user /u/romv1)

## 9. Flat initializer lists [https://godbolt.org/g/RmwnoG]

```c
struct foo {
    int x, y;
};

struct lots_of_inits {
    struct foo z[2];
    int w[3];
};

// this is probably more typical
struct lots_of_inits init = {
    {{1, 2}, {3, 4}}, {5, 6, 7}
};

// but braces for inner elements are optional
struct lots_of_inits flat_init = {
    1, 2, 3, 4, 5, 6, 7
};
```

## 10. What’s an lvalue, anyway [https://godbolt.org/g/5echfM]

```c
struct bitfield {
    unsigned x: 3;
};

void foo() {
    int a[10];
    int i;
    const int j;
    struct bitfield bf;

    // these are all lvalues
    a; // DeclRefExpr <col:5> 'int [10]' lvalue Var 0x556800650150 'a' 'int [10]'
    i; // DeclRefExpr <col:5> 'int' lvalue Var 0x56289851bf20 'i' 'int'
    j; // DeclRefExpr <col:5> 'const int' lvalue Var 0x555fc6694ff0 'j' 'const int'
    bf.x; // MemberExpr <col:5, col:8> 'unsigned int' lvalue bitfield .x 0x55dab002de28

    // this is not an lvalue
    foo; // DeclRefExpr <col:6> 'void ()' Function 0x563cb79da098 'foo' 'void ()'

    // ... but you can't assign to all of them
    // a = (int [10]){1, 2};
    i = 4;
    // j = 4;
    bf.x = 4;

    // ... and you can't take all of their addresses
    &a;
    &i;
    &j;
    // &bf.x;
    &foo; // but you can take the address of a function, which is not an lvalue

    // so, an lvalue is a value that:
    // - can have its address taken...
    //  - unless it is a bitfield (still an lvalue)
    //  - unless it is a function (not an lvalue)
    // - can be assigned to...
    //  - unless it is an array (still an lvalue)
    //  - unless it is a constant (still an lvalue)
}
```

## 11. Void globals [https://godbolt.org/z/C52Wn2]

```c
// You can declare extern globals to incomplete types,
// including `void`.
extern void foo;
```

## 12. Alignment implications of bitfields [https://godbolt.org/z/KmB4CB]

```c
struct foo {
    char a;
    long b: 16;
    char c;
};

// `struct foo` has the alignment of its most-aligned member:
// `long b` has an alignment of 8...
int alignof_foo = _Alignof(struct foo);

// ...but `long b: 16` is a bitfield, and is aligned on a char
// boundary.
int offsetof_c = __builtin_offsetof(struct foo, c);
```

## 13. `static` variables are scope-local [https://godbolt.org/z/hdcLYW]

```c
int foo() {
    int* a;
    int* b;
    {
        static int foo;
        a = &foo;
    }
    {
        static int foo;
        b = &foo;
    }
    // this always returns false: two static variables with the same name
    // but declared in different scopes refer to different storage.
    return a == b;
}
```

## 14. Typedef goes anywhere [https://godbolt.org/z/vZmgha]

```c
short typedef signed s16;
unsigned int typedef u32;
struct foo { int bar; } const typedef baz;

s16 a;
u32 b;
baz c;
```

## 15. Indexing into an integer [https://godbolt.org/z/IBA5Gr]

```c
int foo(int* ptr, int index) {
    // When indexing, the pointer and integer parts
    // of the subscript expression are interchangeable.
    return ptr[index] + index[ptr];
    // It works this way, according to the standard (§6.5.2.1:2),
    // because A[B] is the same as *(A + B), and addition
    // is commutative.
}
```
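
A smaller illustration of the same rule, using a string literal as the pointer operand. The function name `third_letter` is made up for this sketch.

```c
/* Returns the character at index 2 of a string literal, written with
 * the integer on the left-hand side of the subscript. */
char third_letter(void) {
    return 2["quirk"]; /* same as "quirk"[2] and *("quirk" + 2) */
}
```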

# Special mentions

## 1. The power of UB [https://godbolt.org/g/H6mBFT]

```c
extern void this_is_not_directly_called_by_main();

static void (*side_effects)() = 0;

void bar() {
    side_effects = this_is_not_directly_called_by_main;
}

int main() {
    side_effects();
}
```

compiles to:

```
bar:                                    # @bar
        ret
main:                                   # @main
        push    rax
        xor     eax, eax
        call    this_is_not_directly_called_by_main
        xor     eax, eax
        pop     rcx
        ret
```

In this implementation, `main` directly calls `this_is_not_directly_called_by_main`. This happens because:

1. LLVM sees that `side_effects` has only two possible values: NULL (the initial value) or `this_is_not_directly_called_by_main` (if `bar` is called)
2. LLVM sees that `side_effects` is called, and it is UB to call a null pointer
3. UB is impossible, so LLVM assumes that `bar` will have executed by the time `main` runs rather than face the consequences
4. Under this assumption, `side_effects` is always `this_is_not_directly_called_by_main`.
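
For contrast, here is a sketch of a variant where the call is guarded by a null check. With the guard, calling through a null pointer is never reached, so there is no undefined behavior for the optimizer to reason from, and both outcomes have to be preserved. The names `install_callback`, `run_side_effects` and `callback_calls` are assumptions for this example, not part of the original snippet.

```c
#include <stddef.h>

static void (*side_effects)(void) = NULL;

/* Counts how many times the callback ran, so the effect is observable. */
int callback_calls = 0;

static void callback(void) {
    callback_calls++;
}

/* Plays the role of `bar` above: the only place that sets the pointer. */
void install_callback(void) {
    side_effects = callback;
}

/* Returns 1 if a callback was installed and invoked, 0 otherwise. */
int run_side_effects(void) {
    if (side_effects == NULL) {
        return 0; /* well-defined: nothing is called */
    }
    side_effects();
    return 1;
}
```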

## 2. A constant-expression macro that tells you if an expression is an integer constant [https://godbolt.org/g/a41gmx]

```c
#define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) : (int*)1)))

int is_a_constant = ICE_P(4);
int is_not_a_constant = ICE_P(is_a_constant);
```

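From Martin Uecker, on the Linux kernel ML. `__builtin_constant_p` serves a similar purpose on Clang and GCC, although it answers a slightly different question: whether the compiler knows the value at compile time, not whether the expression is an integer constant expression.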
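
The macro leans on the typing rules of the conditional operator. If `(x) * 0l` is an integer constant expression, then `(void*)((x) * 0l)` is a null pointer constant, and the conditional takes the type of the other branch, `int*`. Otherwise it is an ordinary `void*`, and the conditional has type `void*`. Dereferencing and taking `sizeof` then yields either `sizeof(int)` or `sizeof(void)`, and the latter is 1 as a GNU extension. The sketch below checks both cases at compile time; `FOUR` and `runtime_value` are names invented for the example, and it assumes a compiler that accepts `sizeof(void)`, such as GCC or Clang.

```c
#define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) : (int*)1)))

enum { FOUR = 4 };     /* an enumerator is an integer constant expression */
int runtime_value = 4; /* a variable is not, even with a known initializer */

/* The result of ICE_P is itself an integer constant expression, so it
 * can be used where the language requires one. */
_Static_assert(ICE_P(4), "a literal is an integer constant expression");
_Static_assert(ICE_P(FOUR + 1), "so is arithmetic on enumerators");
_Static_assert(!ICE_P(runtime_value), "a variable is not");
```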

## 3. Labels inside expression statements in really weird places [https://godbolt.org/g/k9wDRf]

You can make some pretty weird stuff in C, but for a real disaster, you need C++.

```cpp
class foo {
    int x;

public:
    foo();
};

foo::foo() : x(({ a: 4; })) {
    goto a;
}
```

Needless to say, statement expressions are not standard C++ (or standard C), but if your compiler has them, chances are that you can use them in really interesting ways.

