@robinp
Last active April 2, 2018 10:42
Calling GHC API multiple times: choose your ExitFailure

The GHC API lets you process Haskell sources and, among other things, choose the code generation and linking level. For example:

  • (1) no codegen & no link
  • (2) bytecode generation & link in memory
  • (3) machine code generation & linking output binaries
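In GHC API terms, these levels roughly correspond to two DynFlags fields. The fragment below is a minimal sketch, assuming GHC 8.x (where the DynFlags module exports hscTarget and ghcLink; later GHC versions renamed these modules and fields). The Mode type and applyMode are hypothetical names introduced for illustration, and HscAsm stands in for any machine-code backend.

```haskell
-- Sketch only: assumes GHC 8.x and the `ghc` package (build with -package ghc).
module Modes (Mode(..), applyMode) where

import DynFlags (DynFlags(..), GhcLink(..), HscTarget(..))

-- Hypothetical names for the three levels listed above.
data Mode
  = NoCodegen          -- (1) no codegen & no link
  | BytecodeInMemory   -- (2) bytecode generation & link in memory
  | MachineCodeBinary  -- (3) machine code generation & linking output binaries

-- Adjust the session's DynFlags for the chosen level.
applyMode :: Mode -> DynFlags -> DynFlags
applyMode NoCodegen         df = df { hscTarget = HscNothing,     ghcLink = NoLink }
applyMode BytecodeInMemory  df = df { hscTarget = HscInterpreted, ghcLink = LinkInMemory }
applyMode MachineCodeBinary df = df { hscTarget = HscAsm,         ghcLink = LinkBinary }
```

The result would typically be passed to setSessionDynFlags before loading any targets.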

For code analysis that only needs the typechecked AST, option (1) suffices most of the time. There are some situations in which it doesn't:

  • A Template Haskell (TH) splice needs to execute code (at compile time) from an imported module: the imported module must be available in compiled form, so either (2) or (3) is needed. Example: in $([|$(foo)|]), foo will be evaluated at compile time.
  • The code uses FFI imports. One would expect this to require (2) (see checkCOrAsmOrLlvmOrInterp in TcForeign.hs), but unless the import is actually executed (say by TH, see below), (1) works too (see the wrapper checkCg).
  • The code uses FFI exports. This needs (1) or (3) (see checkCOrAsmOrLlvm). Trying to use (2) results in an error.
  • A TH splice needs to execute FFI (foreign) code. This needs at least (2), and the objects must actually be linked.

To complicate matters further, (2) is not compatible with optimization (say -O2), so in that case either (1) or (3) is the way to go, depending on the conditions above. Strictly speaking, GHC only emits a warning that optimization is turned off, but any deviation from the real build's behavior bites eventually.
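The conditions above can be condensed into a small decision function. This is a sketch of the rules as stated in this note, not GHC code: the Unit record and its field names are hypothetical, "at least (2)" is read as "(2) or (3)", and FFI imports on their own add no constraint.

```haskell
import Data.List (intersect)

-- The three levels from the list at the top of this note.
data Mode = NoCodegen | Bytecode | MachineCode
  deriving (Eq, Show, Enum, Bounded)

-- Hypothetical summary of what a compilation unit needs.
data Unit = Unit
  { thRunsImportedCode :: Bool  -- a TH splice executes code from an imported module
  , hasFfiExports      :: Bool  -- the code contains foreign exports
  , thRunsForeignCode  :: Bool  -- a TH splice executes FFI code
  , wantsOptimization  :: Bool  -- e.g. -O2
  }

-- Every condition that holds narrows the set of usable levels.
viableModes :: Unit -> [Mode]
viableModes u = foldr intersect [minBound .. maxBound] constraints
  where
    constraints =
      [ [Bytecode, MachineCode]  | thRunsImportedCode u || thRunsForeignCode u ] ++
      [ [NoCodegen, MachineCode] | hasFfiExports u ] ++
      [ [NoCodegen, MachineCode] | wantsOptimization u ]

main :: IO ()
main = do
  let plain = Unit False False False False
  -- Nothing special: all three levels remain usable.
  print (viableModes plain)
  -- TH running imported code under -O2 leaves only machine code generation.
  print (viableModes (plain { thRunsImportedCode = True, wantsOptimization = True }))
```

An empty result would mean that, under these rules, no single level satisfies the unit.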

When objects come into play (for example through FFI), unavoidable complications arise when we want to perform multiple GHC API invocations, for example to load and analyse many independent pieces of code during the run of a single analyser process. In practice the runs are not independent, since the state of GHC's Runtime Linker (RTL) is global (see Linker.hs).

Linker exports some functions for interacting with the RTL. But first let's see what error we face if we don't use any of them. The following assumes (3), that is, machine code generation plus binary linking.

First, we analyse a compilation unit (a set of files plus the corresponding flags) that triggers initDynLinker for some reason. Then we analyse another compilation unit, which actually needs to link a plain object dep.o containing an FFI function implementation. So initDynLinker is invoked again, but since the linker was already initialized, no real work is done and the object doesn't get added to the linker state. GHC bails out with a missing symbol error for myFfiImplementedFunction.

Let's try to fix this by replicating initDynLinker's behavior ourselves and calling loadCmdLineLibs. Profit! Our second compilation now succeeds. But let's say we attempt a third (independent) compilation, which also happens to pull in an object that defines myFfiImplementedFunction. Now we get GHC runtime linker: fatal error: I found a duplicate definition for symbol ....

As a last hope we reach for unload. It does unload various things, as inspected with showLinkerState, but apparently not the (non-Haskell) objects.
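The sequence of failures can be replayed with a toy model of the global linker state. This is only an illustration of the behavior described above, not GHC's actual implementation: the Linker type and the functions initLinker, loadObjects, unloadLinker and resolve are hypothetical stand-ins for the linker state, initDynLinker, loadCmdLineLibs, unload and symbol lookup, and other/dep.o is an invented name for the third unit's object.

```haskell
import Control.Monad (foldM)

-- An object file together with the symbols it defines.
type Object = (FilePath, [String])

-- Toy stand-in for the process-wide linker state.
data Linker = Linker { initialised :: Bool, loaded :: [Object] }

emptyLinker :: Linker
emptyLinker = Linker False []

-- Models loadCmdLineLibs: add objects, failing on a duplicate symbol.
loadObjects :: [Object] -> Linker -> Either String Linker
loadObjects objs l = foldM add l objs
  where
    add st o@(_, syms) =
      case filter (`elem` concatMap snd (loaded st)) syms of
        []      -> Right st { loaded = o : loaded st }
        (s : _) -> Left ("duplicate definition for symbol " ++ s)

-- Models initDynLinker: only the first call loads anything.
initLinker :: [Object] -> Linker -> Either String Linker
initLinker objs l
  | initialised l = Right l
  | otherwise     = loadObjects objs (l { initialised = True })

-- Models unload as observed in this note: foreign objects stay loaded.
unloadLinker :: Linker -> Linker
unloadLinker = id

-- Symbol lookup at link time.
resolve :: String -> Linker -> Either String ()
resolve s l
  | any ((s `elem`) . snd) (loaded l) = Right ()
  | otherwise                         = Left ("missing symbol " ++ s)

main :: IO ()
main = do
  let sym  = "myFfiImplementedFunction"
      dep  = ("dep.o", [sym])
      dep' = ("other/dep.o", [sym])
      ok   = either error id
      l1   = ok (initLinker [] emptyLinker)  -- unit 1 triggers initialisation
      l2   = ok (initLinker [dep] l1)        -- unit 2: already initialised, no-op
  print (resolve sym l2)                     -- fails: symbol is missing
  let l3 = ok (loadObjects [dep] l2)         -- workaround: load the object ourselves
  print (resolve sym l3)                     -- succeeds
  -- Unit 3 brings its own definition: duplicate symbol, even after unload.
  print (fmap (map fst . loaded) (loadObjects [dep'] l3))
  print (fmap (map fst . loaded) (loadObjects [dep'] (unloadLinker l3)))
```

The point of the model is that each step is locally reasonable, yet the state left behind by earlier units decides whether a later unit links.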

The bottom line is that if we want to replicate a plain GHC compilation nearly 100%, we can't call the GHC API repeatedly from the same process, because of the global linker state. The Frontend Plugin route is then a good alternative, since it executes a new binary for each compilation and also correctly performs a lot of the GHC magic that otherwise only happens in GHC's Main.hs. But a Frontend Plugin doesn't prepare everything either: for example, invoking compileFile on non-Haskell sources still has to be done manually (inspect the code of GHC's Main to find out such details).
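For orientation, a Frontend Plugin skeleton looks roughly like the following. It is a minimal sketch, assuming GHC 8.x, where GhcPlugins exports FrontendPlugin, defaultFrontendPlugin and the Ghc monad; the module name AnalyseFrontend and the function analyse are hypothetical, and the body only prints its inputs instead of doing any analysis.

```haskell
-- Sketch only: assumes GHC 8.x and the `ghc` package.
module AnalyseFrontend (frontendPlugin) where

import GhcPlugins

-- GHC looks for this binding in the module named after --frontend.
frontendPlugin :: FrontendPlugin
frontendPlugin = defaultFrontendPlugin { frontend = analyse }

-- flags: options passed with -ffrontend-opt.
-- args:  the remaining command-line inputs (sources, objects, ...),
--        each with an optional explicit phase.
analyse :: [String] -> [(String, Maybe Phase)] -> Ghc ()
analyse flags args = do
  liftIO (putStrLn ("frontend options: " ++ show flags))
  -- Placeholder: a real analyser would set targets, load and typecheck here,
  -- and would have to handle non-Haskell inputs itself.
  liftIO (mapM_ (putStrLn . fst) args)
```

Such a module is selected with ghc's --frontend flag (plus -plugin-package for the package containing it), so every compilation unit runs in its own fresh process.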

Or we can hope for the best, disable machine code generation and linking when they are not needed, and live with occasional failures due to the global state.
