Recently I've been working on a cross-platform program. This program depends on LLVM/Clang, and currently targets Windows and Linux. Furthermore, the program runs both as a console application and as a library, for example to support a VS addin. I wanted to share some of my experiences of depending on these libraries, and present some don'ts of creating a library. Each item below opens with the thing you should not do. Some of them are more subtle than others.
Mutate global state of any form. This includes direct interaction with stdin/stdout/stderr, environment variables, filesystem, the works. It's one thing to provide helpers to make interaction with them simple, and another to simply interact with them regardless. LLVM keeps global mutable lists of things like loaded DLLs. Both libraries have a nasty habit of dumping things to stdout. Running tests on Linux produces 2MB of gunk that makes the real output unreadable. And if you're running as a VS addin, nobody is listening to stdout, so it's all wasted even if you wanted to read it.
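As a rough illustration of the alternative, a library can accept an output sink from the caller rather than writing to stdout itself. This is only a sketch: `Analyzer`, `DiagnosticSink`, and `count_statements` are hypothetical names invented for this example, not anything from LLVM or Clang.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; nothing here is an LLVM or Clang API.
using DiagnosticSink = std::function<void(const std::string&)>;

class Analyzer {
public:
    // The caller decides where messages go. With no sink, messages are dropped.
    explicit Analyzer(DiagnosticSink sink = {}) : sink_(std::move(sink)) {}

    // Counts non-empty lines and reports each empty one to the sink.
    int count_statements(const std::vector<std::string>& lines) const {
        int count = 0;
        for (const auto& line : lines) {
            if (line.empty()) {
                report("skipping empty line");
            } else {
                ++count;
            }
        }
        return count;
    }

private:
    void report(const std::string& message) const {
        if (sink_) sink_(message);  // never touches stdout or stderr directly
    }

    DiagnosticSink sink_;
};
```

A console application could pass a sink that prints to `std::cerr`, while an IDE addin could route the same messages to its own output window or discard them.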
Keep private state. Seriously? Yes. Here's the thing. Private state exists to decrease coupling. But sometimes there are things worse than coupling, for example being totally unable to implement a feature. Keeping private state means that your users can't make this decision for themselves and, if coupling is the lesser of two evils, they have to employ nasty techniques to get at what they need. This of course does not imply that everything should be directly public. Currently, Boost generally makes a fairly good tradeoff, which is to say that everything private lives in a detail folder and/or namespace, and you use it at your own risk.
This also directly implies that you should not use TU-static functions generally, or have implementation-only headers.
Just as a note, this is how things are done in Python, where privacy is a naming convention rather than something enforced. I believe their exact naming scheme (leading underscores) is largely reserved for the implementation in C++, so that would be bad, but you could create another or opt for various other strategies that allow limited access. It may not be worth the time to go around ensuring backdoors for absolutely everything, but nor is it worth unnecessarily restricting the user.
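A minimal sketch of the detail-namespace convention described above, assuming a made-up library called `mylib`; the function names are hypothetical too. The helper stays reachable for a user who really needs it, while the namespace name signals that it carries no stability promise.

```cpp
#include <string>

namespace mylib {
namespace detail {
// Internal helper: reachable by users, but used at their own risk.
inline std::string trim_left(const std::string& s) {
    const auto pos = s.find_first_not_of(" \t");
    return pos == std::string::npos ? std::string() : s.substr(pos);
}
}  // namespace detail

// Supported public API, built on top of the helper.
inline bool is_comment(const std::string& line) {
    const std::string t = detail::trim_left(line);
    return t.size() >= 2 && t[0] == '/' && t[1] == '/';
}
}  // namespace mylib
```

Had `trim_left` been a TU-static function in a source file, a user needing the exact same trimming rule would have had to copy it or patch the library.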
Assert or otherwise terminate the process. There's nothing more frustrating than a library taking down VS or the whole addin. There's no reason for the user to lose their parse trees, for example, just because my semantic analyzer has a bug that produces bad LLVM IR. Asserts are not error handling. Only the final application developer knows how to handle errors.
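One way to sketch the difference: a check that might have been written as an assert can return a description of the problem instead, and the application decides what to do with it. `Instruction`, `VerifyResult`, and `verify` are hypothetical names for this example and do not correspond to LLVM's verifier API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct Instruction {
    std::string opcode;
    int operand_count;
};

struct VerifyResult {
    bool ok;
    std::string error;  // empty when ok is true
};

// Rather than assert(ins.operand_count == 2), describe the problem
// and hand the decision back to the caller.
inline VerifyResult verify(const std::vector<Instruction>& body) {
    for (std::size_t i = 0; i < body.size(); ++i) {
        const Instruction& ins = body[i];
        if (ins.opcode == "add" && ins.operand_count != 2) {
            return {false, "instruction " + std::to_string(i) +
                               ": add expects 2 operands"};
        }
    }
    return {true, ""};
}
```

A console tool might print the error and exit, whereas an addin could show it in a diagnostics pane and keep the user's parse trees alive.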
Tie yourself to a specific mode of operation. One really simple example of this is that Clang cannot share information between translation units. This makes sense if you're only thinking about how it's invoked as a process, one TU at a time, by something like make. It does not make any sense at all when you're invoking it from a VS addin and the user might switch to another TU at any time or request operations that need data from multiple TUs. Direct ties to stdin/stdout/stderr are another example of how you can tie yourself to a specific application form factor. Yet another example is that some functionality in Clang, like finding GCC header includes, is only performed by their console driver and can't be requested externally.
Build in a restricted form by default. A really simple example of this is LLVM building without RTTI. This means that, by default, a program built with RTTI cannot link against LLVM if it uses certain APIs that require inheriting from LLVM's base classes. This is because those base classes don't have a typeinfo, as LLVM compiles without RTTI by default. The better default is to build with RTTI and leave the binary size reductions for those who want them.
Require dependencies that you only need for some subset of functionality by default. LLVM depends on ncurses, which is required for their coloured console output. This is, of course, totally useless if you're building LLVM as a library to use in an IDE addin, or you perform all I/O yourself. This is not to say that you should not have dependencies, but if you require something by default, it should be something that every major use case needs.
Require source code changes to extend your library, if you can possibly avoid it. If the user has to start digging through your source, the library is not extensible. An extensible library is one where the user does not have to understand your implementation to make extensions. A simple example of how to violate this is to offer an interface, but then in your implementation, dynamic_cast through every implementation you provide and fail if the derived class is not in the list. Again, this is a tradeoff between extensibility and other concerns when you decide if or how a particular piece can be extended. But ultimately, when it comes to implementing a new feature, you and the user will have the same interests: the harder it is to extend for a user, the harder it is to extend for you.
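The dynamic_cast problem above can be sketched as follows, using hypothetical `Pass`, `DoublePass`, and `AddOnePass` classes that are not taken from LLVM. The library only ever calls through the interface, so a class it has never heard of works just as well as one it ships.

```cpp
#include <memory>
#include <vector>

// Hypothetical interface offered by the library.
class Pass {
public:
    virtual ~Pass() = default;
    virtual int run(int input) const = 0;
};

// An implementation the library ships.
class DoublePass : public Pass {
public:
    int run(int input) const override { return input * 2; }
};

// An implementation written by a user; the library knows nothing about it.
class AddOnePass : public Pass {
public:
    int run(int input) const override { return input + 1; }
};

// The anti-pattern would be something like:
//   if (auto* d = dynamic_cast<const DoublePass*>(p.get())) { ... }
//   else { /* fail */ }
// which rejects every class the library did not ship.
// Dispatching through the interface accepts any implementation.
inline int run_pipeline(const std::vector<std::unique_ptr<Pass>>& passes,
                        int input) {
    for (const auto& p : passes) {
        input = p->run(input);
    }
    return input;
}
```

If a piece of behaviour is missing from the interface, adding a virtual function is usually a smaller cost than a cast chain that every new implementation, yours or the user's, has to be added to.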
Ultimately, what this all really boils down to is that when you're writing a library, you have to remember that the developer using the API you provide is in the driving seat. They make most of the decisions and you should limit yourself to your domain as strictly as possible. It turns out that there are lots of ways to make life hard for a dependent developer without really thinking about it.
Just as a footnote, I don't mean to piss on Clang and LLVM. Without their free hard work, my program would be impossible. That doesn't mean I can't wish they were better or that I had the time to do some work on them.