
@jproyo jproyo/a.md forked from monadplus/a.md
Created Mar 8, 2020

Profiling in Haskell

You'll generally want to look at:

  • heap
  • stack
  • gc profiles

Do not get bogged down in micro-optimizations before you have assessed the macro optimizations that are available: IO and the choice of algorithm dominate any low-level changes you may make. In the end you have to think hard about your code!

Topos> For example, if I see that a particular pure function is taking a long time relative to the rest of the code, that it works on Text, and that ARR_WORDS is rising linearly in the heap, I probably have a thunk-based memory leak. This is knowledge you build up over time.
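
As an illustration of the thunk-based leak pattern (a minimal sketch of my own, not from the quote above), here is a lazily updated accumulator built with Data.IORef from base. Each lazy update stacks another unevaluated thunk on the reference; with Text payloads the retained buffers are what shows up as ARR_WORDS in the heap profile:

```haskell
import Data.IORef

-- Lazy update: each modifyIORef stores a new (+ i) thunk on top of the
-- previous one, so the reference retains a growing chain of thunks
-- until readIORef finally forces them all at once.
leaky :: IO Int
leaky = do
  ref <- newIORef 0
  mapM_ (\i -> modifyIORef ref (+ i)) [1 .. 100000 :: Int]
  readIORef ref

-- Strict update: modifyIORef' forces the new value at every step,
-- so the heap stays flat.
strict :: IO Int
strict = do
  ref <- newIORef 0
  mapM_ (\i -> modifyIORef' ref (+ i)) [1 .. 100000 :: Int]
  readIORef ref

main :: IO ()
main = strict >>= print
```

Both compute the same sum; only the heap profile distinguishes them, which is exactly why the profiling tools below matter.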

Tools

These are pretty useful when you need to figure out the cost centres (the hot path):

Then you need to figure out what is going on on the heap:

Finally, you need to benchmark your code and try to apply optimizations:

Having a look at Core may help.

Resources

Docs

Blogs

Youtube

Profiling with GHC

Manual here

$ ghc -O2 -prof -fprof-auto -rtsopts Example.hs # This will add an SCC everywhere
                                                # You will probably want to switch to manual annotations: {-# SCC "name" #-} <expression>
$ ./Example +RTS -p
$ vim Example.prof

Profiling with Stack

$ stack clean # not always needed
$ stack build --profile
# -p time
# -h memory
$ stack exec --profile -- <your program> +RTS -p -h

Profiling with Cabal

cabal v1-configure --enable-library-profiling --enable-executable-profiling --enable-tests --enable-benchmarks

*.cabal

executable my-project-profiling
  main-is:
    Profiling.hs
  ghc-options:
    -O2
    -threaded
    -fprof-auto
    "-with-rtsopts=-N -p -s"
  build-depends:
    base >= 4.5 && < 5

Install the dependencies with support for profiling:

$ cabal install --only-dependencies --enable-library-profiling

# Alternatively, add this to cabal.config
library-profiling: True

$ cabal install --only-dependencies

Running the program

# Will produce <project>.prof and output RTS statistics
cabal run <project>

Tools

To get an environment with all profiling tools:

$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: with pkgs; [ criterion deepseq parallel ])' haskellPackages.profiteur haskellPackages.threadscope haskellPackages.eventlog2html haskellPackages.prof-flamegraph flameGraph

All examples are based on this program:

hellofib.hs

import Control.Parallel.Strategies
import System.Environment

fib 0 = 1
fib 1 = 1
fib n = runEval $ do
 x <- rpar (fib (n-1))
 y <- rseq (fib (n-2))
 return (x + y + 1)

main = do
 args <- getArgs
 n <- case args of
       []    -> return 20
       [x]   -> return (read x)
       _     -> fail ("Usage: hellofib [n]")
 print (fib n)

threadscope

$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -N -l -s
$ threadscope hellofib.eventlog

eventlog2html

# Doesn't work.
$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -hy -N -l
$ eventlog2html hellofib.eventlog
$ firefox hellofib.eventlog.html

profiteur

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ profiteur hellofib.prof
$ firefox hellofib.prof.html

prof-flamegraph + flameGraph

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ cat hellofib.prof | prof-flamegraph | flamegraph.pl > output.svg
$ firefox output.svg

Dump Core Code

$ ghc -O2 -ddump-simpl -dsuppress-all -fforce-recomp Example.hs > Example.core
$ vim Example.core

Introduction to Low Level Haskell Optimization

How to Optimize

Source: https://www.youtube.com/watch?v=McFNkLPTOSY

Before starting to optimize:

  1. Is the -O2 flag on ?
  2. Profile: which part of the code is the slow one.
  3. Use the best algorithm in that part.
  4. Optimize: implement it in the most efficient way.

Optimizing

Always add the flag -ddump-to-file.

  • Dump Core after optimizations: -ddump-simpl
  • You can also dump STG: -ddump-stg

In *.cabal:

flag dump
  manual: True
  default: True

library
  build-depends:
  ghc-options: -O2

  if flag(dump)
    ghc-options: -ddump-simpl -ddump-stg -ddump-to-file

Spaceleak stack-limiting technique

All credit to Neil Mitchell and his repo.

The stack-limiting technique is useful for detecting a common kind of spaceleak, a situation where an excessive number of interdependent thunks are created and eventually evaluated. The technique works because thunks are evaluated on the stack, so by limiting the stack size we effectively limit the number of interdependent thunks that we can evaluate.

To find space leaks, given a program and a representative run (e.g. the test suite, a suitable input file):

  1. Compile the program for profiling, e.g. ghc --make Main.hs -rtsopts -prof -fprof-auto (-auto-all on older GHCs).
  2. Run the program with a specific stack size, e.g. ./Main +RTS -K100K to run with a 100KB stack.
  3. Increase/decrease the stack size until you have determined the minimum stack for which the program succeeds, e.g. -K33K.
  4. Reduce the stack by a small amount and rerun with -xc, e.g. ./Main +RTS -K32K -xc.
  5. The -xc run will print out the stack trace on every exception, look for the one which says stack overflow (likely the last one) and look at the stack trace to determine roughly where the leak is.
  6. Attempt to fix the space leak, confirm by rerunning with -K32K.
  7. Repeat until the test works with a small stack, typically -K1K.
  8. Add something to your test suite to ensure that if a space leak is ever introduced then it fails, e.g. ghc-options: -with-rtsopts=-K1K in Cabal.

This technique does not detect when an excessive number of thunks are created but never evaluated, or when a small number of thunks hold on to a large amount of live data.
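
A minimal program exhibiting the kind of leak this technique catches (my own sketch, not from Mitchell's repo): foldl builds a chain of a million interdependent (+) thunks whose evaluation needs a stack frame each, so it overflows under a small -K limit, while the foldl' version survives even -K1K:

```haskell
import Data.List (foldl')

-- foldl builds nested unevaluated (+) thunks; forcing the result needs
-- O(n) stack, so a run with e.g. +RTS -K32K -xc overflows and the
-- exception trace points here. (With -O2, GHC's demand analysis may
-- rescue this particular foldl; real leaks are less cooperative.)
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- foldl' forces the accumulator at each step: constant stack, and the
-- program now passes with a tiny stack limit.
fixedSum :: [Int] -> Int
fixedSum = foldl' (+) 0

main :: IO ()
main = print (fixedSum [1 .. 1000000])
```

Both functions return the same value; only the stack behaviour differs, which is what steps 2-5 above detect.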

Time and Allocation Profiling

Cost centres and cost-centre stacks

Automatically inserting cost centers:

$ ghc -prof -fprof-auto -rtsopts Main.hs
$ ./Main +RTS -p

Compiler options for profiling

# All modules must be compiled and linked with the -prof option;
# otherwise SCC annotations are ignored.
-prof 

-fprof-auto
-fprof-auto-top

# Less used
-fprof-auto-exported
-fprof-auto-calls
-fprof-cafs
-fno-prof-auto
-fno-prof-cafs
-fno-prof-count-entries

+RTS options

  • The -p option produces a standard time profile report. It is written into the file ⟨stem⟩.prof; the stem is taken to be the program name by default, but can be overridden by the -po ⟨stem⟩ flag.

  • The -P option produces a more detailed report containing the actual time and allocation data as well. (Not used much.)

  • The -pa option produces the most detailed report containing all cost centres in addition to the actual time and allocation data.

More at https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#time-and-allocation-profiling

Inserting cost centres by hand

{-# SCC "name" #-}

f x y = ...
  where
    g z = ...
    {-# SCC g #-}

{-# SCC f #-}

Using a different name:

f x y = ...
{-# SCC f "cost_centre_name" #-}

Example of program:

main :: IO ()
main = do let xs = [1..1000000]
          let ys = [1..2000000]
          print $ {-# SCC last_xs #-} last xs
          print $ {-# SCC last_init_xs #-} last $ init xs
          print $ {-# SCC last_ys #-} last ys
          print $ {-# SCC last_init_ys #-} last $ init ys

Profiling memory usage

Detecting the causes of space leaks.

Space leaks lead to slower execution due to heavy garbage collector activity, and may even cause the program to run out of memory altogether.

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded -eventlog <program>

$ ./<program> +RTS -N -h 

# Use hp2ps to analyze your program.hp
# Use threadscope or eventlog2html to analyze your program.eventlog

hp2ps – Rendering heap profiles to PostScript

$ hp2ps <program-name>.hp
$ gv <program-name>.ps

It is possible to view the heap profile of a running program: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#viewing-the-heap-profile-of-a-running-program

RTS options

Take a look at the manual: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#rts-options-for-heap-profiling

Profiling Parallel and Concurrent Programs

Use: -prof -threaded

Use: +RTS -N

Profiling doesn't work well on parallel programs:

  • The overhead is around 30%.
    • -fno-prof-count-entries helps a bit.
  • There are race conditions in the cost-centre statistics.

Use ThreadScope

Retainer Profiling

Retainer profiling is designed to help answer questions like “why is this data being retained?”

Biographical Profiling

A typical heap object may be in one of the following four states at each point in its lifetime:

  • The lag stage: the time between creation and the first use of the object.
  • The use stage: from the first use until the last use of the object.
  • The drag stage: from the final use until the last reference to the object is dropped.
  • An object which is never used is said to be in the void state for its whole lifetime.

A biographical heap profile displays the portion of the live heap in each of the four states listed above.
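
Neither mode has example commands above; as a sketch (module name Main.hs is hypothetical), retainer and biographical profiles are requested with the -hr and -hb RTS flags respectively, and both require a profiling build:

```shell
$ ghc -O2 -prof -fprof-auto -rtsopts Main.hs
$ ./Main +RTS -hr     # retainer profile
$ ./Main +RTS -hb     # biographical profile (lag/use/drag/void)
$ hp2ps -c Main.hp    # render the resulting heap profile in colour
$ gv Main.ps
```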
