In this gist, we consider ways of returning strings to the caller of a C function. I am writing this from the perspective of someone who is creating a C library and looking for a flexible, easy, efficient, and safe way of returning strings to the callers of the library. This is tricky because strings generally have a variable length, so you can't just return them by value. Decisions need to be made about who is responsible for allocating and freeing the string.
error_type get_string1(char * buffer, size_t buffer_size, size_t * required_size);
If the buffer pointer is NULL, you might treat the call purely as a request for the required size and not return an error code. If a buffer is supplied but it is too small, return an error code.
Suppose that the caller does not know a good upper bound on the possible sizes of the string. Then the caller needs two separate calls to this function in order to guarantee getting a string: one to learn the size and one to fetch the string. If the string is expensive to fetch and it is not cached, having two calls could cause headaches. If the string has the potential to change size between those two calls, that could cause headaches too.
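The two-call pattern described above might look something like the following. This is only a sketch: the `error_type` values, the fixed `example_string`, and the caller-side helper `fetch_string1` are assumptions made for illustration, not part of any real library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes; error_type is not defined in the text above. */
typedef enum {
    ERROR_NONE = 0,
    ERROR_BUFFER_TOO_SMALL = 1
} error_type;

/* Stand-in for whatever string the library wants to return. */
static const char example_string[] = "hello";

error_type get_string1(char * buffer, size_t buffer_size, size_t * required_size)
{
    size_t needed = sizeof(example_string);  /* includes the terminating NUL */

    if (required_size != NULL) { *required_size = needed; }
    if (buffer == NULL) { return ERROR_NONE; }  /* size query only */
    if (buffer_size < needed) { return ERROR_BUFFER_TOO_SMALL; }

    memcpy(buffer, example_string, needed);
    return ERROR_NONE;
}

/* Caller-side helper (hypothetical): ask for the size, allocate, then fetch.
 * Returns a malloc'd string that the caller must free(), or NULL on failure. */
char * fetch_string1(void)
{
    size_t size = 0;
    if (get_string1(NULL, 0, &size) != ERROR_NONE) { return NULL; }

    char * buffer = malloc(size);
    if (buffer == NULL) { return NULL; }

    /* Second call: if the string grew in the meantime, this would fail. */
    if (get_string1(buffer, size, NULL) != ERROR_NONE) {
        free(buffer);
        return NULL;
    }

    /* Sanity check that holds here only because example_string never changes. */
    assert(strlen(buffer) + 1 == size);
    return buffer;
}
```

Note that the caller ends up writing an allocation helper anyway, and a robust version would need to retry if the second call reports that the buffer is too small.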
Exponential explosion: Because the caller needs to call the function twice, this could lead to an exponential explosion of function calls when such functions are composed. For example, suppose you have a series of functions f0, f1, ..., fn, each returning a string using option 1. Suppose f0 does an expensive calculation, and each of the other functions calls the previous function in the list to get the string, performs some arbitrary operation on it, and then returns the result using option 1. Each function must call the previous function twice because it does not know an upper bound on the size of the string. Therefore, when the end caller fetches the string from fn, fn is called twice, f(n-1) is called four times, f(n-2) is called eight times, and f0 is called 2^(n+1) times! This is an exponential explosion.
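To make the doubling concrete, here is a small sketch that counts calls. Everything in it is hypothetical: a single function `f` takes the level as a parameter instead of having separate functions f0 through fn, it returns a plain `int` status, the "expensive calculation" is just copying `"base"`, and the "arbitrary operation" is appending `'!'`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEVEL 8

/* call_count[k] records how many times level k has been invoked. */
unsigned long call_count[MAX_LEVEL + 1];

/* Option 1 style interface. Returns 0 on success, nonzero on failure. */
int f(int level, char * buffer, size_t buffer_size, size_t * required_size)
{
    assert(level >= 0 && level <= MAX_LEVEL);
    call_count[level]++;

    char * result;
    if (level == 0) {
        /* Stand-in for the expensive calculation. */
        result = malloc(5);
        if (result == NULL) { return 1; }
        memcpy(result, "base", 5);
    } else {
        /* First call to the previous level: learn the size. */
        size_t prev_size = 0;
        if (f(level - 1, NULL, 0, &prev_size) != 0) { return 1; }

        result = malloc(prev_size + 1);  /* room for one extra character */
        if (result == NULL) { return 1; }

        /* Second call to the previous level: fetch the string. */
        if (f(level - 1, result, prev_size, NULL) != 0) {
            free(result);
            return 1;
        }

        /* Arbitrary operation: append '!'. */
        size_t len = strlen(result);
        result[len] = '!';
        result[len + 1] = '\0';
    }

    /* Return the result using the option 1 convention. */
    size_t needed = strlen(result) + 1;
    int status = 0;
    if (required_size != NULL) { *required_size = needed; }
    if (buffer != NULL) {
        if (buffer_size < needed) { status = 1; }
        else { memcpy(buffer, result, needed); }
    }
    free(result);
    return status;
}
```

Even a size query has to build the whole string, because the size of the result is not known until the operation has been performed. Fetching the string from level 3 with the usual two calls therefore invokes level 3 twice, level 2 four times, level 1 eight times, and level 0 sixteen times.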
- Leave off the
error_type get_string2(const char ** string);
Give the caller a pointer which they can look at, but they don't own it and it might become invalid at some point. Only one function call will ever be required, but it imposes limitations on the implementation details of that function, because the library itself needs to hang on to the string and free it later. This increases the risk that the user will do something bad like modifying the string or using it beyond its lifetime.
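One possible way for the library to hang on to the string is a static buffer that is overwritten on each call. The sketch below assumes that design; the `error_type` values, the buffer size, and the `"value N"` contents are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes; error_type is not defined in the text above. */
typedef enum {
    ERROR_NONE = 0,
    ERROR_FORMAT = 1
} error_type;

/* Library-owned storage: the caller only ever sees a pointer into it. */
static char last_string[64];
static int call_counter;

error_type get_string2(const char ** string)
{
    assert(string != NULL);

    call_counter++;
    int n = snprintf(last_string, sizeof(last_string), "value %d", call_counter);
    if (n < 0 || (size_t)n >= sizeof(last_string)) {
        *string = NULL;
        return ERROR_FORMAT;
    }

    /* The pointer stays valid, but its contents change on the next call. */
    *string = last_string;
    return ERROR_NONE;
}
```

With this design, a pointer obtained from an earlier call silently starts showing the newer string after the next call, which is exactly the kind of lifetime hazard mentioned above. A static buffer is also not thread-safe.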
error_type get_string3(char ** string);
void free_string3(char *);
Allocate a string of the appropriate size for the user and tell them to free it later by calling a separate function. This will increase the number of functions in your API and the compiler cannot check that the right type of string is actually passed to the freeing function. Users will forget to free the string or call free with invalid arguments.
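A minimal sketch of this pair of functions, assuming the library allocates with `malloc` and that `error_type` is a simple enum (both are assumptions, as is the fixed `"hello"` string):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes; error_type is not defined in the text above. */
typedef enum {
    ERROR_NONE = 0,
    ERROR_NO_MEMORY = 1
} error_type;

error_type get_string3(char ** string)
{
    assert(string != NULL);

    /* Stand-in for whatever string the library wants to return. */
    static const char source[] = "hello";

    char * copy = malloc(sizeof(source));
    if (copy == NULL) {
        *string = NULL;
        return ERROR_NO_MEMORY;
    }
    memcpy(copy, source, sizeof(source));

    *string = copy;  /* ownership passes to the caller */
    return ERROR_NONE;
}

void free_string3(char * string)
{
    /* Uses the library's own allocator; free(NULL) is a no-op. */
    free(string);
}
```

Providing `free_string3` instead of telling users to call `free` directly keeps allocation and deallocation inside the library, which matters when the library and the application might be linked against different C runtimes. Since the parameter is a plain `char *`, nothing stops a user from passing a string that did not come from `get_string3`.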
- Instead of having a different free function for each type of string returned by your library, just have one, called something like
error_type get_string4(string_wrapper ** string);
const char * get_wrapped_string(const string_wrapper *);
void free_string_wrapper(string_wrapper *);
Similar to option 3, but it fixes the concerns about adding tons of extra functions to the library and about the compiler not being able to check the types of the strings we are freeing. The string_wrapper will probably be a typedef for a struct that simply contains a single char * member. The user has to call a total of three functions whenever they want to get a string. Dealing with wrapped strings requires two extra functions, but those two are the only ones you need in your entire library. It also requires you to define your own string type, which might provoke people to say "Why is this library re-inventing string types instead of using something that already exists?".
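The three functions could be sketched as follows, assuming the single-member struct described above. The member name `data`, the `error_type` values, and the fixed `"hello"` string are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes; error_type is not defined in the text above. */
typedef enum {
    ERROR_NONE = 0,
    ERROR_NO_MEMORY = 1
} error_type;

/* In a real library the struct body could live in the .c file only,
 * leaving string_wrapper opaque to users of the public header. */
typedef struct string_wrapper {
    char * data;
} string_wrapper;

error_type get_string4(string_wrapper ** string)
{
    assert(string != NULL);
    *string = NULL;

    /* Stand-in for whatever string the library wants to return. */
    static const char source[] = "hello";

    string_wrapper * wrapper = malloc(sizeof(*wrapper));
    if (wrapper == NULL) { return ERROR_NO_MEMORY; }

    wrapper->data = malloc(sizeof(source));
    if (wrapper->data == NULL) {
        free(wrapper);
        return ERROR_NO_MEMORY;
    }
    memcpy(wrapper->data, source, sizeof(source));

    *string = wrapper;  /* ownership passes to the caller */
    return ERROR_NONE;
}

const char * get_wrapped_string(const string_wrapper * wrapper)
{
    return wrapper != NULL ? wrapper->data : NULL;
}

void free_string_wrapper(string_wrapper * wrapper)
{
    if (wrapper == NULL) { return; }
    free(wrapper->data);
    free(wrapper);
}
```

Because `free_string_wrapper` takes a `string_wrapper *`, passing an ordinary `char *` to it draws a diagnostic from the compiler, which is the type check that option 3 lacks.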