This post is an FAQ answering the most common questions people ask me related to inlining and specialization. I've also structured it so that you can read it from top to bottom like a blog post.
"Inlining" means a compiler substituting a function call or a variable with its definition when compiling code. A really simple example of inlining is if you write code like this:
module Example where
x :: Int
x = 5
y :: Int
y = x + 1
… then at compile time the Haskell compiler can (and will) substitute the last occurrence of x with its definition (i.e. 5):
y :: Int
y = 5 + 1
… which then allows the compiler to further simplify the code to:
y :: Int
y = 6
In fact, we can verify that for ourselves by having the compiler dump its intermediate "core" representation like this:
$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs
… which will produce this output:
==================== Tidy Core ====================
Result size of Tidy Core
= {terms: 20, types: 7, coercions: 0, joins: 0/0}
x = I# 5#
$trModule4 = "main"#
$trModule3 = TrNameS $trModule4
$trModule2 = "Example"#
$trModule1 = TrNameS $trModule2
$trModule = Module $trModule3 $trModule1
y = I# 6#
… which, if we squint a little bit, we can read as:
x = 5
y = 6
… and ignore the other stuff.
A slightly more interesting example of inlining is a function call, like this one:
f :: Int -> Int
f x = x + 1
y :: Int
y = f 5
The compiler will be smart enough to inline f by replacing f 5 with 5 + 1 (here x is 5):
y :: Int
y = 5 + 1
… and just like before the compiler will simplify that further to y = 6, which we can verify from the core output:
y = I# 6#
"Specialization" means replacing a "polymorphic" function with a "monomorphic" function. A "polymorphic" function is a function whose type has a type variable, like this one:
-- Here `f` is our type variable
example :: Functor f => f Int -> f Int
example = fmap (+ 1)
… and a "monomorphic" version of the same function replaces the type variable with a specific (concrete) type or type constructor:
example2 :: Maybe Int -> Maybe Int
example2 = fmap (+ 1)
Notice that example and example2 are defined in the same way, but they are not exactly the same function:

- example is more flexible and works on strictly more type constructors

  example works on any type constructor f that implements Functor, whereas example2 only works on the Maybe type constructor (which implements Functor).

- example and example2 compile to very different core representations
In fact, they don't even have the same "shape" as far as GHC's core representation is concerned. Under the hood, the example function takes two extra "hidden" function arguments compared to example2, which we can see if we dump the core output (I've tidied up the output a lot for clarity):
example @f $Functor = fmap $Functor (\v -> v + 1)
example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)
The two extra function arguments are:

- @f: This represents the type variable f

  Yes, the type variable that shows up in the type signature also shows up at the term level in the GHC core representation. If you want to learn more about this you might be interested in my Polymorphism for Dummies post.

- $Functor: This represents the Functor instance for f

  Yes, the Functor instance for a type like f is actually a first-class value passed around within the GHC core representation. If you want to learn more about this you might be interested in my Scrap your Typeclasses post.
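To make the hidden dictionary argument concrete, here is a hand-rolled version of the same idea in the spirit of Scrap your Typeclasses. The FunctorDict type and its field names are made up for this sketch; GHC's core does essentially the same thing, passing the instance around as an ordinary record of functions:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A hand-written "dictionary" standing in for a Functor instance
-- (the name FunctorDict is invented for this illustration):
newtype FunctorDict f = FunctorDict
  { dictFmap :: forall a b. (a -> b) -> f a -> f b }

-- The analogue of `example`, with the hidden dictionary argument
-- made explicit as a visible parameter:
exampleWith :: FunctorDict f -> f Int -> f Int
exampleWith dict = dictFmap dict (+ 1)

-- The analogue of `$FunctorMaybe`: the dictionary for Maybe
functorMaybe :: FunctorDict Maybe
functorMaybe = FunctorDict fmap

main :: IO ()
main = print (exampleWith functorMaybe (Just 1))  -- prints Just 2
```

Specializing exampleWith to Maybe is then just ordinary function application: exampleWith functorMaybe, mirroring example @Maybe $FunctorMaybe in the core output.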
Notice how the compiler cannot optimize example as well as it can optimize example2 because the compiler doesn't (yet) know which type constructor f we're going to call example on and also doesn't (yet) know which Functor f instance we're going to use. However, once the compiler does know which type constructor we're using it can optimize a lot more.
In fact, we can see this for ourselves by changing our code a little bit to simply define example2 in terms of example:
example :: Functor f => f Int -> f Int
example = fmap (+ 1)
example2 :: Maybe Int -> Maybe Int
example2 = example
This compiles to the exact same code as before (you can check for yourself if you don't believe me).
Here we would say that example2 is "example specialized to the Maybe type constructor". When we write something like this:
example2 :: Maybe Int -> Maybe Int
example2 = example
… what's actually happening under the hood is that the compiler is doing something like this:
example2 = example @Maybe $FunctorMaybe
In other words, the compiler is taking the more general example function (which works on any type constructor f and any Functor f instance) and then "applying" it to a specific type constructor (@Maybe) and the corresponding Functor instance ($FunctorMaybe).
In fact, we can see this for ourselves if we generate core output with optimization disabled (-O0 instead of -O2) and if we remove the -dsuppress-all flag:
$ ghc -O0 -fforce-recomp -ddump-simpl Example.hs
This outputs (among other things):
…
example2 = example @Maybe $FunctorMaybe
…
And when we enable optimizations (with -O2):
$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs
… then GHC inlines the definition of example and simplifies things further, which is how it generates this much more optimized core representation for example2:
example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)
In fact, specialization is essentially the same thing as inlining under the hood (I'm oversimplifying a bit, but they are morally the same thing). The main distinction between inlining and specialization is:
- specialization simplifies function calls with "type-level" arguments

  By "type-level" arguments I mean (hidden) function arguments that are types, type constructors, and type class instances.

- inlining simplifies function calls with "term-level" arguments

  By "term-level" arguments I mean the "ordinary" (visible) function arguments you know and love.
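You can even pass one of those "type-level" arguments explicitly in ordinary Haskell source code using the TypeApplications extension. This sketch reuses the example function from earlier and mirrors the example @Maybe call that shows up in the core output:

```haskell
{-# LANGUAGE TypeApplications #-}

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

main :: IO ()
main = do
  -- Explicitly supply the normally-hidden type argument,
  -- just like `example @Maybe` in the core representation:
  print (example @Maybe (Just 1))  -- prints Just 2
  print (example @[] [1, 2, 3])    -- prints [2,3,4]
```

The dictionary argument (the Functor instance) still cannot be passed explicitly this way; GHC selects it automatically once the type constructor is known.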
NO. GHC does not always inline or specialize code, for two main reasons:
- Inlining is not always an optimization

  Inlining can sometimes make code slower. In particular, it can often be better to not inline a function with a large implementation because then the corresponding CPU instructions can be cached.

- Inlining a function requires access to the function's source code

  In particular, if the function is defined in a different module from where the function is used (a.k.a. the "call site") then the call site does not necessarily have access to the function's source code.
To expand on the latter point, Haskell modules are compiled separately (in other words, each module is a separate "compilation unit"), and the compiler generates two outputs when compiling a module:
- a .o file containing object code (e.g. Example.o)

  This object code is what is linked into the final executable to generate a runnable program.

- a .hi file containing (among other things) source code

  The compiler can optionally store the source code for any compiled functions inside this .hi file so that it can inline those functions when compiling other modules.
However, the compiler does not always save the source code for all functions that it compiles because there are downsides to storing source code for functions:
- this slows down compilation

  This slows down compilation both for the "upstream" module (the module defining the function we might want to inline) and the "downstream" module (the module calling the function we might want to inline). The upstream module takes longer to compile because now the full body of the function needs to be saved in the .hi file and the downstream module takes longer to compile because inlining isn't free (all optimizations, including inlining, generate more work for the compiler).

- this makes the .hi file bigger

  The .hi file gets bigger because it's storing the source code of the function.

- this can make the object code larger, too

  Inlining a function multiple times can lead to duplicating the corresponding object code for that function.
This is why by default the compiler uses its own heuristic to decide which functions are worth storing in the .hi file. The compiler does not indiscriminately save the source code for all functions.
You can override the compiler's heuristic, though, using …
There are a few compiler directives (a.k.a. "pragmas") related to inlining and specialization that we'll cover here:
- INLINABLE
- INLINE
- NOINLINE
- SPECIALIZE
My general rule of thumb for these compiler directives is:
- don't use any compiler directive until you benchmark your code to show that it helps
- if you do use a compiler directive, INLINABLE is probably the one you should pick
I'll still explain what all the compiler directives mean, though.
INLINABLE is a compiler directive that you use like this:
f :: Int -> Int
f x = x + 1
{-# INLINABLE f #-}
The INLINABLE directive tells the compiler to save the function's source code in the .hi file in order to make that function available for inlining downstream. HOWEVER, INLINABLE does NOT force the compiler to inline that function. The compiler will still use its own judgment to decide whether or not the function should be inlined (and the compiler's judgment tends to be fairly good).
INLINE is a compiler directive that you use in a similar manner as INLINABLE:
f :: Int -> Int
f x = x + 1
{-# INLINE f #-}
INLINE behaves like INLINABLE except that it also heavily biases the compiler in favor of inlining the function. There are still some cases where the compiler will refuse to fully inline the function (for example, if the function is recursive), but generally speaking the INLINE directive overrides the compiler's own judgment for whether or not to inline the function.
I would argue that you should usually prefer the INLINABLE pragma over the INLINE pragma because the compiler's judgment for whether or not to inline things is usually good. If you override the compiler's judgment there's a good chance you're making things worse unless you have benchmarks showing otherwise.
If you mark a function as NOINLINE:
f :: Int -> Int
f x = x + 1
{-# NOINLINE f #-}
… then the compiler will refuse to inline that function. It's pretty rare to see people use a NOINLINE annotation for performance reasons (although there are circumstances where NOINLINE can be an optimization). It's far, far, far more common to see people use NOINLINE in conjunction with unsafePerformIO because that's what the unsafePerformIO documentation recommends:
Use {-# NOINLINE foo #-} as a pragma on any function foo that calls unsafePerformIO. If the call is inlined, the I/O may be performed more than once.
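A minimal sketch of that idiom looks like this (the counter name is just for illustration): a top-level mutable reference created with unsafePerformIO, where NOINLINE ensures the newIORef action runs only once, so every use site shares the same reference:

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A global mutable counter (a hypothetical example).
-- Without NOINLINE, GHC could inline `counter` at each use site,
-- potentially creating a *separate* IORef per call site.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}

main :: IO ()
main = do
  modifyIORef' counter (+ 1)
  modifyIORef' counter (+ 1)
  n <- readIORef counter
  print n  -- prints 2, because both increments hit the same IORef
```

If inlining duplicated the unsafePerformIO call, each use of counter could observe a freshly-initialized reference instead of the shared one, which is exactly the bug the documentation warns about.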
SPECIALIZE lets you hint to the compiler that it should compile a polymorphic function for a monomorphic type ahead of time. For example, if we define a polymorphic function like this:
example :: Functor f => f Int -> f Int
example = fmap (+ 1)
… we can tell the compiler to go ahead and specialize the example function for the special case where f is Maybe, like this:
example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}
This tells the compiler to go ahead and compile the more specialized version, too, because we expect some other module to use that more specialized version. This is nice if we want to get the benefits of specialization without exporting the function's source code (so we don't bloat the .hi file) or if we want more precise control over when specialization does and does not happen.
In practice, though, I find that most Haskell programmers don't want to go to the trouble of anticipating and declaring all possible specializations, which is why I endorse INLINABLE as the more ergonomic alternative to SPECIALIZE.