Create a gist now

Instantly share code, notes, and snippets.

What would you like to do?
The Platonic Solids of Software Construction and Their Realization in C
No matter the business domain or programming language, programmers always end up needing support for ALL of these things
in some form -
* Resource Semantics (such as RAII in C++, or finalizers in C#)
* Error Semantics (such as exceptions in most languages, or conditions in Common Lisp)
* Algebraic Types (such as structs and unions in C, or records and DUs in F#)
* Data Abstraction (such as that which is often badly entangled within object systems in OO languages)
* Generic Programming / Type Safety (such as function and operator overloading in C++, and generic collections in C#)
* Categories (such as facilities for encoding abstract concepts like Iterables / Enumerables, Functors, Monoids,
and so on)
Because the need for all of these arise independent of the programming domain and language, and due to their
orthogonality when distilled from the language features that enable them, one might think of these as the
'Six Platonic Solids of Software Construction' (perhaps we could more appropriately make it 5 if combine elements
1 and 2...?)
In addition to these 6 elements, there are two non-essential things that we often want just to keep a reasonably
expressive code base -
I) Lambdas (like in Lisps and C++)
II) Async Semantics (like in C# and Haskell)
Unfortunately, for all of these programming elements, C provides very, very little support, and of what it
provide, it provides it weakly.
In light of all this, what follows is a list of C's biggest short-comings, and the ways in which I deal with them
(code pulled from my example C project, https://github.com/bryanedds/ScopeStyle) -
*******************************************************************************
* Problem 1: No Resource Semantics
*******************************************************************************
Solution: Create exclusive lexical scopes for variables that hold resources -
struct error err;
initialize_error(&err, assert_on_error);
{
struct renderer renderer;
initialize_renderer(&renderer, &err);
if (err.active_r) log_and_handle_error(&err), finalize_renderer(&renderer);
else
{
struct physics_engine physics_engine;
initialize_physics_engine(&physics_engine, &err);
if (err.active_r) log_and_handle_error(&err), finalize_physics_engine(&physics_engine);
else
{
struct world world;
initialize_world(&world, &renderer, &physics_engine, &err);
if (err.active_r) log_and_handle_error(&err), finalize_world(&world);
else
{
// actual implementation code that uses the created resources...
}
finalize_world(&world);
}
finalize_physics_engine(&physics_engine);
}
finalize_renderer(&renderer);
}
finalize_error(&err);
Notice that I use commas instead of semicolons so that I can keep the code in a symmetric shape. This lexical
symmetry combined helps to avoid most resource handling errors.
And to avoid double-free disasters, I use these instead of malloc and free -
/// Allocate the given size of memory, asserting successful allocation, exiting upon failure.
void* malloc_safe(size_t size)
{
void* memory = malloc(size);
assert(memory != NULL && "Out of memory.");
if (memory == NULL) exit(EXIT_FAILURE);
return memory;
}
/// Free memory at a pointer, and set the pointer to NULL.
#define free_safe(ptr) do { free(ptr); ptr = NULL; } while (false)
Finally, ownership semantics are otherwise non-existent for value types stored in data structures that expect
pointers to elements. For these data structures, we must pass as freer_opt to its constructor that allows its
to optionally finalize and free elements as they are removed -
void initialize_stack(struct stack* stack, void (*freer)(void*));
If no ownership is to be implied for the containing data structure, then a no-op function is given for (*freer).
*******************************************************************************
* Problem 2: No Error Semantics
*******************************************************************************
C has basically no error semantics. Well, yes, there is the exit handler et al., but that's usually a far cry
from what we need. Many C programmers use functions that return error codes which are then ignored by nearly
everyone (including the programmers who wrote the functions that return them). Some programmers use a global
error struct that each function in turn may stomps over, and are also generally ignored. These are worse than
return codes because, outside of documentation people rarely puruse, they give no indication any error might
occur.
Solution: Pass an error abstraction with specialized semantics to functions that might error -
/// A simple error-propagation abstraction.
struct error
{
/// The current error message.
/// The 'r' suffix denotes the field is read-only outside of error.c.
char msg_r[ERROR_MSG_MAX];
/// The current error code.
int code_r;
/// Is there an active error?
bool active_r;
/// Should an incident of an error assert as well? (useful in _DEBUG mode)
bool assert_as_well_r;
};
// ...in error.c...
static void clear_error(struct error* err)
{
err->msg_r[0] = '\0';
err->code_r = 0;
err->active_r = false;
}
static void populate_error(struct error* err, char* msg, int code)
{
strncpy(err->msg_r, msg, ERROR_MSG_MAX);
err->code_r = code;
err->active_r = true;
}
/// Declare that the current is handled.
void handle_error(struct error* err)
{
if (!err->active_r) assert(false && "No error to handle!");
clear_error(err);
}
/// Propagate a new error in the context of an existing error.
void propagate_error(struct error* err, char* msg, int code)
{
if (!err->active_r) assert(false && "No error to propagate!");
if (err->assert_as_well_r) assert(false && msg);
populate_error(err, msg, code);
}
/// Declare the incident of a new error.
void raise_error(struct error* err, char* msg, int code)
{
if (err->active_r) assert(false && "Error already raised!");
if (err->assert_as_well_r) assert(false && msg);
populate_error(err, msg, code);
}
/// Initialize the error struct.
void initialize_error(struct error* err, bool assert_as_well)
{
clear_error(err);
err->assert_as_well_r = assert_as_well;
}
/// Finalize the error struct.
void finalize_error(struct error* err)
{
// nothing to do
}
This gives us better error-handling than typical approaches, but still not as good as C++. At leasst if an error
is being stomp over by a subsequent unrelated error, the abstraction is smart enough to assert to tell that
something should be handled that isn't. Because of the short-comings of this approach, hoever, this facility
should be used somewhat sparingly, EG - an out-of-memory error is always handled as a terminal condition in
malloc_safe rather than bubbling up to this facility.
*******************************************************************************
* Problem 3: No Type-Safety in Terms of Pervasively Nullable Pointers
*******************************************************************************
This is surprisingly big problem in languages with unsophisticated type system, but also has a surprisingly
effective solution - if one can get over its Hungerian nature :)
Solution: Use the 'opt' suffix to denote the presence of possible null pointer variables so that you can assume
non-null for other variables.
void interact(struct entity* player, struct entity* enemy_opt)
{
// must check for enemy_opt, but not player
if (enemy_opt)
{
// now do something with player and entity...
}
}
Also, use the 'try' prefix or 'opt' suffix to denote functions that may return a valid pointer or NULL -
enemy* try_find_enemy(const char* enemy_name);
- or -
enemy* find_enemy_opt(const char* enemy_name);
Instead of using the type system to denote nullability, use lexical conventions! Code is text, so when all else
fails, use that!
*******************************************************************************
* Problem 4 - Incomplete Data Abstraction
*******************************************************************************
You can use typedef and translation unit tricks to abstract some data, but that's an incomplete approach since
doing so keeps any such abstraction from being instantiated on the stack. So we use a simpler approach that
just fakes abstraction with convention -
Solution: Use lexical conventions to denote readonly and private field semantics -
struct s
{
// fields suffixed with 'r' are readonly except within the scope of the declaring header file and the implementation file
int field_r;
// fields suffixed with 'p' are accessible only within the scope of the declaring header file and the implementation file
int field_p;
// fields without suffixes are writable and public everywhere
int field;
};
Yes, yes, I know, lexical conventions are not ideal. But in this and the preceeding context, they are surprisingly
effective. For all programming languages, there is at least one rule - 'Do what works best with the language
you've got!'
*******************************************************************************
* Problem 5 - No Generic Programming
*******************************************************************************
Solution: Build void*-based data structures, then provide a wrapper struct and macro to generate specialized
interfaces functions as needed -
struct stack
{
// impl...
}
void push_stack(struct stack* stack, void* elem);
void* try_pop_stack(struct stack* stack);
void initialize_stack(struct stack* stack, void (*freer_opt)(void*));
#define SPECIALIZE_STACK(stack_type, elem_type) \
struct stack_type { struct stack stack; }; \
inline void push_##stack_type(struct stack_type* stack_t, elem_type* elem) { push_stack(&stack_t.stack, elem); } \
inline elem_type* try_pop_##stack_type(struct stack_type* stack_t) { return pop_stack(&stack_t.stack); } \
inline void initialize_##stack_type(struct stack_type* stack_t, void (*freer_opt)(elem_type*)) { initialize_stack(&stack_t.stack, freer_opt); }
This one's a doozy with all the macro magickry, but it's not too bad in practice.
*******************************************************************************
* Problem 6 - No Function Overloading
*******************************************************************************
Boy, do we ever need this! It's a testament to C's arthiritic state that we have nothing for it in the general
case! But we do have a partial solution when the possible set of operation targets is closed and known at compile-
time -
Partial Solution: In some cases we can use the _Generic macro to define a closed set of operations over
things like Vector2's, Vector3's, and so on...
The following is an example for cube root in C -
#define cbrt(x) _Generic((x), long double: cbrtl, \
default: cbrt, \
float: cbrtf)(x)
*******************************************************************************
* Problem 7 - No Operator Overloading
*******************************************************************************
We're mostly screwed here.
Non-Solution: Use normal functions with short names and possibly the _Generic macro as above.
Like I said, mostly screwed.
*******************************************************************************
* Problem 8 - No Support for Categories
*******************************************************************************
Here's where our solution get more interesting, what with pointer casts and function fields. While I hold in contempt
most object systems for the way they entangle otherwise orthogonal elements like data abstraction and categories, I
think I'd rather use them in a constrained way than to have to construct my own category constructions from scratch
with C's weak language semantics. But since we're in C, we don't have a choice, and for the overhead, this is the
best I can do so far -
Solution: Use this implementation pattern to encode categories like this -
///////////////////////////////////////////////////////////////////////////////
/// The category of abstractions with sub-typing.
///
/// Additional categories include -
///
/// reflectable & propertied (see ax C++ project),
/// iterable,
/// functor,
/// applicative,
/// alternative,
/// monoid,
/// foldable,
/// traversable,
/// and so on.
///////////////////////////////////////////////////////////////////////////////
struct castable;
const char* get_type_name(struct castable*);
void* try_cast(struct castable*, const char*);
void* cast(struct castable*, const char*);
bool try_copy_to(struct castable*, struct castable*);
bool equal_to(struct castable*, struct castable*);
int hash(struct castable*);
struct castable
{
int castable_tag_p;
const char* (*get_type_name_c)(struct castable*);
void* (*try_cast_c)(struct castable*, const char*);
bool (*try_copy_to_c)(struct castable*, struct castable*);
bool (*equal_to_c)(struct castable*, struct castable*);
int (*hash_c)(struct castable*);
};
void initialize_castable_6(
struct castable* castable,
const char* (*get_type_name)(struct castable*),
void* (*try_cast)(struct castable*, const char*),
bool (*try_copy_to)(struct castable*, struct castable*),
bool (*equal_to)(struct castable*, struct castable*),
int (*hash)(struct castable*));
void initialize_castable(
struct castable* castable,
const char* (*get_type_name)(struct castable*),
void* (*try_cast)(struct castable*, const char*));
#define CASTABLE_TAG 0xAB3A3854 // CRC32 hash of 'castable'
///////////////////////////////////////////////////////////////////////////////
/// Casting utility functions.
///////////////////////////////////////////////////////////////////////////////
/// A weak check for dynamically casting to castable*.
bool is_castable(void* ptr);
/// Cast any pointer to the expected pointer type. Checks for CASTABLE_TAG value at runtime via
/// ((castable*)ptr)->castable_tag_p to at least have some sort of checking. Should be
/// statically-verifiable when using a custom static code analysis rule.
void* up_cast(void* ptr);
/// Hashes a pointer.
int hash_ptr(void* ptr);
You'll notice two important things about this implementation pattern -
1) Even with the function fields, there will still be a need to use the up_cast function a lot, and
2) The up_cast function is _seemingly_ type unsafe.
Let me comment on each point -
1) Yes, having to call up_cast everywhere you need to operate at the level of categories is verbose. Unfortunately,
there is no facility that allows certain pointer to be implicitly up-cast to their first field (even though this is
both semantically valid and useful in C!) I know of no way around this.
2) With some additional tooling, we could definitely augment a static C code analyzer to tell us when a use of
up_cast is valid or not. So the solution here is partially in the language and partially in the tooling. I have
yet to investigate how to extend the existing analysis tooling to achieve this, however.
*******************************************************************************
* Problem 9 - No Nested Functions / Lambdas
*******************************************************************************
Compiler-Dependent Solution: With ANSI C, you are just totally screwed here. However, with GCC, "gcc C has
supported lambda functions since at least version 3.0.4." By using a compiler extension that allows inner
functions and a lambda macro, we can use lambdas in C like this -
qsort (items, items_size, item_size, lambda (int, (const void *a, const void *b),
{
return *(const int*)a - *(const int*)b;
}));
More details on this approach are here - http://walfield.org/blog/2010/08/25/lambdas-in-c.html
*******************************************************************************
* Problem 10 - No Async Semantics
*******************************************************************************
Exploration: I'm not currently sure what to do here. Perhaps with the above lambda and working with the new
concurrency primitives, something can be devised. After all, someone managed to put together a good parser
combinator library in C here -
https://github.com/orangeduck/mpc
- so perhaps this too could well be achieved :)
*******************************************************************************
* Final Analysis
*******************************************************************************
So you might be asking, 'what motivated this strange document to talk about software abstraction at the highest
level in the context of a low-level language like C?
Well, 2 reasons, 1 practical, and 1 pedagogical.
The first reason is straight-forward - I am a functional programmer by choice, but am soon tasked with implementing
a sophisticated system using just C. So I started putting together this document to get my head around the solution
space around the problems of going from a lofty language like F# to C.
The second reason is, perhaps surprisingly, pedagogical. There is a problem that arises when discussing these elements
of sotware construction in higher-level languages. The problem is that the semantics of higher-level languages usually
entangle these elements in a way that the mind cannot easily tease apart. Take for example C++'s object system - it
entangles data abstraction, categories, generic programming, and resource semantics. To attempt to discuss each of
these elements in a clear way is made very difficult in the context of such complex object systems.
But when we use C to encode these elements, its weak language semantics actually enable us to understand each in a
fairly isolated way. This turns out to be somewhat important.
I suppose there is a third reason as well. I'm a hobbyist language designer (having written a couple of scripting
languages for my own game engines (preview at the bottom here -
https://medium.com/@bryanedds/a-powerful-f-library-shows-how-s-expressions-can-beat-xml-and-json-20c34ba1db4b),
and I'd like to see somewhat write an actual successful successor to C other than C++ and ObjC. If all things
were equal, and if I had the choice, I'd just use C++ rather than C for my upcoming project. Yet I still think
there is room for an evolution of C that is based on this more Platonic rationale than the confused, object-based
rationale that brought us C++ and ObjC.
Basically, if I had my druthers, I'd start working on such a successor to C right away.
But, in life, like in C, we do not have our druthers, so we have to do the best with what we've got. And to do that,
I have constructed a hopefully coherent and comprehensives 'Platonic' rationale that can guide me and be build upon
when I find at the mercy of weak tooling and confounding implementation details. So long as I can keep this
'Platonic' rationale in mind, I hope to proceed on my next software project without being swallowed by the dark pit
of the inevitable complexities of software construction.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment