@monadplus
Last active October 31, 2023 23:28
Haskell: Profiling

Profiling in Haskell

Do not get bogged down in micro-optimizations before you have assessed the macro optimizations available to you. IO and the choice of algorithm dominate any low-level change you may make. In the end you have to think hard about your code!

Before starting to optimize:

  1. Is the -O2 flag on?
  2. Profile: find out which part of the code is slow.
  3. Use the best algorithm in that part.
  4. Optimize: implement it in the most efficient way.

Profiling

Manual cost centers are usually better, and they avoid having to profile library dependencies. Don't add cost centers to functions that should be inlined, because an SCC pragma prevents inlining.
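As a minimal sketch of manual cost centers, the program below annotates two sub-expressions and one call site with SCC pragmas. The function `mean` and the cost-center names are hypothetical and only serve as illustration; any expression can be annotated the same way.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical workload; the names are illustrative only.
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    -- Each SCC annotation creates a named cost center that gets
    -- its own row in the .prof report.
    total = {-# SCC "mean.total" #-} foldl' (+) 0 xs
    count = {-# SCC "mean.count" #-} length xs

main :: IO ()
main = print ({-# SCC "main.mean" #-} mean [1 .. 1000000])
```

Built with -prof but without -fprof-auto and run with +RTS -p, the report should contain only the annotated cost centers (plus the ones the RTS adds itself), which keeps the overhead and the noise low.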

Profiling with GHC

Manual here

# This adds SCCs everywhere.
# You will probably want to switch to manual cost centers: {-# SCC "name" #-} <expression>

ghc -O2 -prof -fprof-auto -rtsopts Example.hs
./Example +RTS -p -RTS
cat Example.prof

Profiling with Cabal

Don't forget -O2

Manual cost centers:

# Add {-# SCC <name> #-} manually to the functions you want to profile

cabal build --enable-profiling --ghc-options="-fno-prof-auto"
time cabal exec example -- +RTS -p -s -RTS # Produces example.prof and prints RTS statistics

Automatic cost centers (use with care):

cabal build --enable-profiling --ghc-options="-fprof-auto"
time cabal exec example -- +RTS -p -s -RTS

Recall that for multi-threading you will need:

cabal build --enable-profiling --ghc-options="-threaded -fprof-auto"
time cabal exec example -- +RTS -N -p -s -RTS

Profiling with Stack

Manual cost centers:

mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fno-prof-auto"
time .stack-bin/example +RTS -p

Automatic cost centers:

mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fprof-auto"
time .stack-bin/example +RTS -p

Profiling with Nix

See example here

Dumping Core and STG

  • Always dump to a file: -ddump-to-file
  • Dump Core after optimizations: -ddump-simpl
  • You can also dump STG: -ddump-stg

In *.cabal:

flag dump
  manual: True
  default: True

library
  build-depends:
  ghc-options: -O2

  if flag(dump)
    ghc-options: -ddump-simpl -ddump-stg -ddump-to-file
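For a quick experiment on a single module, the same dump flags can also be set with an OPTIONS_GHC pragma instead of a cabal flag. The sketch below is an assumption-laden example: `sumTo` is a hypothetical function, and -dsuppress-all is added only to make the Core output shorter.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 -ddump-simpl -ddump-to-file -dsuppress-all #-}
module Main (main) where

-- Hypothetical example: a strict accumulator loop.
-- In the dumped Core, check whether the inner loop works on
-- unboxed Int# values instead of boxed Ints.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 1000000)
```

Compiling this module writes a .dump-simpl file for it. If the loop still allocates boxed values in the Core, that is usually a hint that a strictness annotation or an INLINE pragma is missing.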

Spaceleak detection

Read:

For example, if I see that a particular pure function takes a long time relative to the rest of the code, that it works on Text, and that ARR_WORDS rises linearly in the heap profile, I probably have a thunk-based memory leak. This is knowledge you build up over time.
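To make "thunk-based memory leak" concrete, here is a small sketch that does not come from the original notes. It uses Doubles rather than Text, and the names `meanLeaky` and `meanStrict` are hypothetical. Whether the leaky version really leaks depends on the optimization level, since GHC's strictness analysis can sometimes repair it.

```haskell
module Main (main) where

import Data.List (foldl')

-- Potentially leaky: foldl' only forces the pair constructor, so both
-- fields can grow into long chains of unevaluated (+) thunks.
meanLeaky :: [Double] -> Double
meanLeaky xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (acc, len) x = (acc + x, len + 1)

-- Fixed: force both fields before building the next pair.
meanStrict :: [Double] -> Double
meanStrict xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (acc, len) x =
      let acc' = acc + x
          len' = len + 1
      in acc' `seq` len' `seq` (acc', len')

main :: IO ()
main = do
  print (meanLeaky [1 .. 1000000])
  print (meanStrict [1 .. 1000000])
```

In a heap profile, a leak of this kind typically shows up as a steadily growing band of thunks that collapses only when the result is finally demanded.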

Tools

When you need to profile cpu usage:

For thread profiling:

When you need to profile memory usage:

When you need to benchmark your application:

Getting the tools

To get an environment with all profiling tools:

$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: with pkgs; [ criterion deepseq parallel ])' haskellPackages.profiteur haskellPackages.threadscope haskellPackages.eventlog2html haskellPackages.ghc-prof-flamegraph

Using the tools

All examples are based on this program:

hellofib.hs

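import Control.Parallel.Strategies (rpar, rseq, runEval)
import System.Environment (getArgs)

-- Deliberately naive parallel recursion, used only to give the
-- profiling tools below something to show.
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = runEval $ do
  x <- rpar (fib (n-1))  -- spark the first branch for parallel evaluation
  y <- rseq (fib (n-2))  -- evaluate the second branch in this thread
  return (x + y + 1)

main :: IO ()
main = do
  args <- getArgs
  n <- case args of
    []  -> return 20
    [x] -> return (read x)
    _   -> fail "Usage: hellofib [n]"
  print (fib n)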

profiteur

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ profiteur hellofib.prof
$ firefox hellofib.prof.html

ghc-prof-flamegraph

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ ghc-prof-flamegraph hellofib.prof > output.svg
$ firefox output.svg

eventlog2html

Heap profiling RTS options

$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
# Use -hc to see which cost-center stack allocated the data (where the thunk is created).
# Use -hd or -hy to break the heap down by closure description (data constructor) or by type.
# Use -hr to see what retains your data, i.e. why it is not being garbage collected.
$ ./hellofib +RTS -N -hy -l # -l-agu to not include thread events
$ eventlog2html hellofib.eventlog
$ firefox hellofib.eventlog.html

With Cabal:

cabal build --enable-profiling --ghc-options="-fprof-auto"
cabal exec example -- +RTS -hc -l -RTS

For some reason, if you add the cost centers manually and build with -fno-prof-auto, the graph is empty.

Newer GHC versions (9.2 and later) have an RTS flag, -hi, which breaks the heap profile down by info table and shows in detail where thunks (unevaluated closures) are accumulating. It does not require a profiling build:

$ ghc -eventlog -rtsopts -O2 -finfo-table-map -fdistinct-constructor-tables LargeThunk
$ ./LargeThunk 100000 100000 30000000 +RTS -l -hi -i0.5 -RTS
$ eventlog2html LargeThunk.eventlog

More on the blog post: https://well-typed.com/blog/2021/01/first-look-at-hi-profiling-mode/
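The LargeThunk program itself is not reproduced here. As a stand-in, the following hypothetical program accumulates thunks in an IORef, which is the kind of pattern a thunk-oriented heap profile is meant to expose. The names `countLazy` and `countStrict` are made up for this sketch.

```haskell
module Main (main) where

import Control.Monad (forM_)
import Data.IORef (modifyIORef, modifyIORef', newIORef, readIORef)

-- Lazy update: every iteration wraps the old contents in a new (+ 1)
-- thunk, so the reference ends up holding a chain of n thunks.
countLazy :: Int -> IO Int
countLazy n = do
  ref <- newIORef 0
  forM_ [1 .. n] $ \_ -> modifyIORef ref (+ 1)
  readIORef ref

-- Strict update: modifyIORef' evaluates the new value before storing it.
countStrict :: Int -> IO Int
countStrict n = do
  ref <- newIORef 0
  forM_ [1 .. n] $ \_ -> modifyIORef' ref (+ 1)
  readIORef ref

main :: IO ()
main = do
  countLazy 1000000 >>= print
  countStrict 1000000 >>= print
```

Building it with the same flags as above and running it with +RTS -l -hi should show the thunks from `countLazy` as a growing band, while `countStrict` stays flat.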

Threadscope

Thread profiling and GC insight.

$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -N -l -s
$ threadscope hellofib.eventlog

ghc-events-analyze

Threadscope shows the activity of CPU cores, while ghc-events-analyze shows the activity of Haskell threads. ghc-events-analyze works for single concurrent programs. It also lets you instrument regions of your code with named events.
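A minimal sketch of such instrumentation is shown below. It assumes the convention, as I recall it from the ghc-events-analyze documentation, that a region is delimited by user events named "START <label>" and "STOP <label>"; please check this against the tool's README. The helper `timed` is a hypothetical name.

```haskell
module Main (main) where

import Control.Exception (bracket_)
import Debug.Trace (traceEventIO)

-- Wrap an action in START/STOP user events (assumed convention).
-- bracket_ makes sure the STOP event is emitted even if the action throws.
timed :: String -> IO a -> IO a
timed label =
  bracket_ (traceEventIO ("START " ++ label))
           (traceEventIO ("STOP " ++ label))

main :: IO ()
main = do
  total <- timed "sum" (return $! sum [1 .. 1000000 :: Int])
  timed "print" (print total)
```

The events only end up in the eventlog when the program is run with +RTS -l; without it, traceEventIO is effectively a no-op, so the instrumentation can stay in the code.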

Resources

Docs

Blogs

Case Study

Books

Videos

@monadplus
Author

Tip

If you ever hit an exception and need a stack trace to debug it, build with profiling and run the program with +RTS -xc.
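As a small illustration, the hypothetical program below raises an exception from a pure function. The names `lookupPrice` and `prices` are made up for this sketch.

```haskell
module Main (main) where

import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical lookup that calls error on unknown keys.
lookupPrice :: String -> Int
lookupPrice item =
  case lookup item prices of
    Just p  -> p
    Nothing -> error ("unknown item: " ++ item)
  where
    prices = [("tea", 3), ("coffee", 4)]

main :: IO ()
main = do
  -- The exception is caught here, but -xc still reports it when it is raised.
  r <- try (evaluate (lookupPrice "milk")) :: IO (Either ErrorCall Int)
  print r
```

When compiled with -prof -fprof-auto -rtsopts and run with +RTS -xc, the runtime prints the current cost-center stack to stderr each time an exception is raised, which here should point at `lookupPrice`.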
