@lisovy
Last active August 26, 2022 19:44
Undefined behavior
==================
http://blog.regehr.org/archives/213
...If any step in a program's execution has undefined behavior,
then the entire execution is without meaning. This is important:
it's not that evaluating (1<<32) has an unpredictable result,
but rather that the entire execution of a program that evaluates
this expression is meaningless. Also, it's not that the execution
is meaningful up to the point where undefined behavior happens:
the bad effects can actually precede the undefined operation...
Integer overflow
''''''''''''''''
Checking for it:
Since signed integer overflow is undefined behavior, the check
has to be performed before the arithmetic operation. If it were
performed after the operation, the compiler might optimize the check
out, since it may assume that the undefined program path
(= overflow) never occurs.
Some quotes from Stackoverflow:
The undefined behavior of signed arithmetic overflow is used to
enable optimizations; for example, the compiler can assume that
if a > b then a + 1 > b also; this doesn't hold in unsigned
arithmetic where the second check would need to be carried out
because of the possibility that a + 1 might wrap around to 0.
...
Detection of the integer overflow should be done BEFORE the
actual addition/subtraction because of possible undefined behavior.
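A minimal sketch of such a check done before the addition. The helper name add_checked is hypothetical and not taken from the quoted sources; it only compares against INT_MAX and INT_MIN, so no step in it can overflow:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not from the quoted sources): stores a + b in
 * *sum and returns true, or returns false without performing the
 * addition when the mathematical result would not fit in an int. */
bool add_checked(int a, int b, int *sum)
{
    if (b > 0 && a > INT_MAX - b)
        return false;           /* a + b would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;           /* a + b would fall below INT_MIN */
    *sum = a + b;               /* cannot overflow here */
    return true;
}
```

A caller would test the return value first, e.g. "if (!add_checked(x, y, &r)) { /* handle overflow */ }", and only then use r.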
Another example:
| int stupid (int a)
| {
| return (a+1) > a;
| }
The precondition for avoiding undefined behavior is:
(a != INT_MAX)
Here the case analysis done by an optimizing C or C++ compiler is:
Case 1: a != INT_MAX
Behavior of + is defined -> Computer is obligated to return 1
Case 2: a == INT_MAX
Behavior of + is undefined -> Compiler has no particular
obligations
Again, Case 2 is degenerate and disappears from the compiler’s
reasoning. Case 1 is all that matters. Thus, a good x86-64
compiler will emit:
| stupid:
| movl $1, %eax
| ret
General
'''''''
http://stackoverflow.com/questions/367633/what-are-all-the-common-undefined-behaviours-that-a-c-programmer-should-know-a
* Dereferencing a NULL pointer (This is not about memory location 0;
NULL may be defined as 0 or ((void *)0), and the null pointer's
representation need not be all zero bits. It is a 'special pointer'
which must not be dereferenced)
* Converting pointers to objects of incompatible types (if a pointer has to
be stored as an integer, convert it to 'uintptr_t'; conversion to int or long
is implementation-defined and may truncate the pointer value, so try to
avoid it)
* Signed integer overflow (i.e. no special requirement to use twos complement;
unsigned overflow is defined -- btw. what about some
architecture that supports only saturated arithmetic?)
* Shifting values by a negative amount (undefined for both left and right
shifts; what is implementation-defined is right-shifting a negative value)
* Shifting values by an amount greater than or equal to the number of bits in
the number (e.g. int64_t i = 1; i << 72 is undefined)
* Attempting to modify a string literal or any other const object during its lifetime
* Not returning a value from a value-returning function (in C this is
undefined only if the caller uses the result)
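The shift items in the list above can be guarded with a range check on the shift count. This is only a sketch: the helper name shl_checked is hypothetical, and the width computation assumes unsigned int has no padding bits:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: shifts v left by n bits if that is well defined
 * for unsigned int, otherwise reports failure.
 * Assumption: unsigned int has no padding bits. */
bool shl_checked(unsigned int v, int n, unsigned int *out)
{
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);

    if (n < 0 || n >= width)
        return false;   /* negative or too-large shift count: undefined */
    *out = v << n;      /* unsigned: high bits are discarded, no UB */
    return true;
}
```

Using an unsigned operand sidesteps the separate problem of shifting a bit into the sign bit of a signed value.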
Side effecting operations
'''''''''''''''''''''''''
http://blog.regehr.org/archives/232
...The optimizer performs transformations like this
(i.e. reordering) when they are thought to increase
performance and when they do not change the program’s
observable behavior...
...reordering is legal since stores to global variables
are not defined as side-effecting...
...Actually, just to make things confusing, stores to
globals are side effecting according to the standard,
but no real compiler treats them as such...
Sequence point
''''''''''''''
http://en.wikipedia.org/wiki/Sequence_point
A sequence point is a point in the program's
execution sequence where all previous side-
effects shall have taken place and where all
subsequent side-effects shall not have taken place.
Between the previous and next sequence point an
object shall have its stored value modified at most
once by the evaluation of an expression.
a = a++;
is undefined, since the sequencing rules say that
you can only update a variable once between sequence
points.
Not only is it undefined; in practice, depending on the
order of expression evaluation, the increment may occur
before, after, or interleaved with the assignment.
Unspecified behavior
=====================
* Order of evaluation
http://en.cppreference.com/w/c/language/eval_order
-- The order in which function arguments are evaluated
f(foo(), bar());
-- The order in which the operands of most C operators are evaluated
f = f1() + f2() + f3();
Note that
a[i] = i++;
goes one step further: i is modified and also read without an
intervening sequence point, so the behavior is undefined, not merely
unspecified.
Exceptions: the evaluation order of the operands of '&&', '||',
'?:' and ',' (comma operator) is defined (i.e. those operators
add a sequence point)
http://stackoverflow.com/questions/2456086/if-with-multiple-conditions-order-of-execution
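When the call order matters, one common way to pin it down is to put each call in its own statement, since the end of a full expression is a sequence point. A sketch with hypothetical stand-ins for f1(), f2() and f3() that record the order in which they run:

```c
/* Hypothetical stand-ins for f1(), f2(), f3(): each appends its digit
 * to call_log so the call order can be inspected afterwards. */
char call_log[4];
int call_count;

int f1(void) { call_log[call_count++] = '1'; return 1; }
int f2(void) { call_log[call_count++] = '2'; return 2; }
int f3(void) { call_log[call_count++] = '3'; return 3; }

/* Each call sits in its own statement, so the order f1, f2, f3 is
 * guaranteed, unlike in f1() + f2() + f3(). */
int sum_in_order(void)
{
    call_count = 0;
    int a = f1();
    int b = f2();
    int c = f3();
    return a + b + c;
}
```

After sum_in_order() returns, call_log holds '1', '2', '3' in that order on every conforming compiler.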
Implementation-defined behavior
===============================
Implementation-defined behaviour is an action by a program the
result of which is not defined by the standard, but which the
implementation is required to document. An example is
"Multibyte character literals" --
http://stackoverflow.com/questions/328215/is-there-a-c-compiler-that-fails-to-compile-this
Integer constant
================
http://hardtoc.com/2009/07/16/int-min.html
/* INT_MIN - 1, -(INT_MIN - 1), -INT_MIN and INT_MAX + 1 overflow,
   which is undefined behavior; the output below is only typical */
printf("INT_MAX: %d\n", INT_MAX);
printf("INT_MIN - 1: %d\n", INT_MIN - 1);
printf("-(INT_MIN + 1): %d\n\n", -(INT_MIN + 1));
printf("INT_MIN + 1: %d\n", INT_MIN + 1);
printf("-(INT_MIN - 1): %d\n\n", -(INT_MIN - 1));
printf("INT_MIN: %d\n", INT_MIN);
printf("-INT_MIN: %d\n", -INT_MIN);
printf("INT_MAX + 1: %d\n", INT_MAX + 1);
Result:
INT_MAX: 2147483647
INT_MIN - 1: 2147483647
-(INT_MIN + 1): 2147483647
INT_MIN + 1: -2147483647
-(INT_MIN - 1): -2147483647
INT_MIN: -2147483648
-INT_MIN: -2147483648
INT_MAX + 1: -2147483648
Logic or arithmetic bit shift?
''''''''''''''''''''''''''''''
#define FIELD1_shift 4
#define FIELD1_OPT1_val 0x5
#define FIELD1_OPT2_val 0x7
#define FIELD2_shift 30
#define FIELD2_OPT1_val 0x1
#define FIELD2_OPT2_val 0x2
uint64_t reg1;
/* Initialize the register with some default values */
reg1 = (FIELD1_OPT2_val << FIELD1_shift) |
(FIELD2_OPT2_val << FIELD2_shift);
printf("reg1: 0x%" PRIx64 "\n", reg1);
Result:
reg1: 0xffffffff80000070
Fix:
#define FIELD1_OPT2_val 0x7U
#define FIELD2_OPT2_val 0x2U
Result:
reg1: 0x80000070
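Why the first result is sign-extended: 0x2 has type int, so 0x2 << 30 is computed as a 32-bit int (assuming a 32-bit int), does not fit, and in practice becomes a negative value that is sign-extended when converted to uint64_t. An alternative sketch of the fix is to widen the operand before shifting, which also stays correct for shift amounts of 32 and more. The function name reg1_default is hypothetical:

```c
#include <stdint.h>

#define FIELD1_shift 4
#define FIELD1_OPT2_val 0x7
#define FIELD2_shift 30
#define FIELD2_OPT2_val 0x2

/* Hypothetical helper: builds the default register value with every
 * shift carried out in uint64_t instead of int. */
uint64_t reg1_default(void)
{
    return ((uint64_t)FIELD1_OPT2_val << FIELD1_shift) |
           ((uint64_t)FIELD2_OPT2_val << FIELD2_shift);
}
```

With the casts in place, the macros can stay unsuffixed and the result is 0x80000070, as with the U-suffix fix.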
String literal
==============
Enum
====
http://codingrelic.geekhold.com/2008/10/ode-to-enum.html
Bit-fields
==========
https://d3s.mff.cuni.cz/pipermail/osy/2012-November/002059.html
LP64 vs. LLP64
==============
http://www.unix.org/version2/whatsnew/lp64_wp.html
Structures
==========
Definition, field order
'''''''''''''''''''''''
From 6.2.5/20:
A structure type describes a sequentially allocated nonempty set of
member objects (and, in certain circumstances, an incomplete array),
each of which has an optionally specified name and possibly distinct
type.
6.7.2.1/15:
15 Within a structure object, the non-bit-field members and the units
in which bit-fields reside have addresses that increase in the order
in which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at
its beginning.
The C standard requires that the elements of a structure are laid
out in the order that they are defined; the first element is at the
lowest address, and the next at a higher address, and so on for each
element. The compiler is not allowed to change the order. There can
be no padding before the first element of the structure. There can
be padding after any element of the structure as the compiler sees
fit to ensure what it considers appropriate alignment.
The general rules about field layout in C are:
The address of the first member is the same as the address of the
struct itself. That is, the offsetof of the first member is 0.
The addresses of the members always increase in declaration order.
That is, the offsetof of the n-th member is lower than that of the
(n+1)-th member.
There may, however, be padding bytes at the end of the structure as
well as between members.
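These layout rules can be checked with offsetof from <stddef.h>. A minimal sketch; struct sample and layout_is_ordered are hypothetical names chosen for illustration:

```c
#include <stddef.h>

/* Hypothetical struct, chosen so that padding is likely on common ABIs. */
struct sample {
    char tag;
    int  value;
    char flag;
};

/* Returns 1 if the guarantees described above hold: first member at
 * offset 0, offsets increasing in declaration order, and a total size
 * of at least the sum of the member sizes (padding may add more). */
int layout_is_ordered(void)
{
    return offsetof(struct sample, tag) == 0 &&
           offsetof(struct sample, tag) < offsetof(struct sample, value) &&
           offsetof(struct sample, value) < offsetof(struct sample, flag) &&
           sizeof(struct sample) >= sizeof(char) + sizeof(int) + sizeof(char);
}
```

The exact offsets of value and flag, and the total size, depend on the ABI; only the ordering is guaranteed.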
Assigning one struct to another
'''''''''''''''''''''''''''''''
https://stackoverflow.com/questions/2302351/assign-one-struct-to-another-in-c
Implicit function prototype
===========================
http://stackoverflow.com/questions/2199076/printf-and-scanf-work-without-stdio-h-why
http://stackoverflow.com/questions/9182763/implicit-function-declarations-in-c
http://stackoverflow.com/questions/11150883/using-printf-function-without-actually-importing-stdio-h-and-it-worked-why-is
...When C doesn't find a declaration, it assumes this implicit
declaration: int f();, which means the function can receive whatever
you give it, and returns an integer...
Pointer aliasing
================
http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule
Trigraphs
=========
int main(int argc, char* argv??(??))
??<
printf("Olol??/n");
return 0;
??>
Const qualifier
===============
int tmp;
int a = 111;
const int b = 222;
int *c;
const int *d; /* pointer to a "const int" */
a = 1;
b = 2; /* error: assignment of read-only variable ‘b’ */
c = &tmp;
d = &tmp; /* OK */
*c = 3;
*d = 4; /* error: assignment of read-only location ‘*d’ */
d = (void*)0; /* OK */
c = &b; /* warning: assignment discards ‘const’ qualifier */
*c = 333; /* compiles, but undefined behavior: b is a const object */
Operator precedence
===================
http://en.cppreference.com/w/c/language/operator_precedence
"2 + 1 << 2" == 12 (parsed as "(2 + 1) << 2")
"2 + (1 << 2)" == 6
Misconceptions
==============
* sizeof() is a function
It is an operator. It can be used like "a = sizeof foo;"
Parentheses are only needed when the argument is a type name.
* char is always 8 bits in size
http://stackoverflow.com/a/1864999
(sizeof(char) is 1 by definition, but a C byte has CHAR_BIT bits,
which may be more than 8. sizeof returns, and malloc takes,
sizes in units of char.)
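* pointer set to '\0' is NULL pointer (in C it technically is, because '\0'
is an integer constant with value 0, but it is misleading; use NULL)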
* Object pointers and function pointers are the same
http://stackoverflow.com/q/3860593
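A small sketch of the sizeof and char points from the list above. The function names are hypothetical, and int_bits assumes that int has no padding bits:

```c
#include <limits.h>
#include <stddef.h>

/* sizeof is an operator: parentheses are needed only around a type name. */
size_t int_array_bytes(void)
{
    int foo[4];
    return sizeof foo;      /* same as sizeof(int[4]), i.e. 4 * sizeof(int) */
}

/* Number of bits in an int, assuming no padding bits. sizeof counts in
 * units of char, and each char has CHAR_BIT bits (at least 8). */
size_t int_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

On a platform where CHAR_BIT is 16, sizeof(int) could be 1 or 2 while int_bits() still reports the real width.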
################################################################################
# Libc #
################################################################################
Malloc
======
Doug Lea implementation:
http://g.oswego.edu/dl/html/malloc.html
Casting result of malloc()?
http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc
Malloc tutorial:
http://www.inf.udec.cl/~leo/Malloc_tutorial.pdf
################################################################################
# Linkers & Loaders #
################################################################################
Dynamic library symbol versioning:
http://www.trevorpounds.com/blog/?p=33
################################################################################
# Useful information sources #
################################################################################
[The Descent to C]
http://www.chiark.greenend.org.uk/~sgtatham/cdescent/
[C FAQ]
http://c-faq.com/
[Deep C]
http://www.pvv.org/~oma/DeepC_slides_oct2011.pdf
[Embedded Programming with the GNU Toolchain]
http://www.bravegnu.org/gnu-eprog/
[Feature Test Macro]
http://lwn.net/Articles/590381/
[Musl libc]
http://wiki.musl-libc.org/wiki/Functional_differences_from_glibc
http://wiki.musl-libc.org/wiki/Design_Concepts
[Bionic libc]
http://codingrelic.geekhold.com/2008/11/six-million-dollar-libc.html
https://gitorious.org/0xdroid/bionic/raw/9f65adf2ba3bb15feb8b7a7b3eef788df3fd270e:libc/docs/OVERVIEW.TXT
http://drj11.wordpress.com/2013/09/01/on-compiling-34-year-old-c-code/
http://stackoverflow.com/questions/tagged/c?sort=frequent&pageSize=50