monadplus/profiling_haskell.md

Profiling in Haskell

Do not get bogged down in micro-optimizations before you've assessed the macro-optimizations available. IO and the choice of algorithm dominate any low-level changes you may make. In the end, you have to think hard about your code!
Before starting to optimize:

Is the -O2 flag on?
Profile: find out which part of the code is slow.
Use the best algorithm in that part.
Optimize: implement it in the most efficient way.

Profiling

Manual cost centers are usually better and avoid the need for profiling builds of library dependencies.
Don't add cost centers to functions that should be inlined, because the SCC pragma prevents inlining.
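
As a minimal sketch of a manual cost center, the module below wraps a single expression in an SCC pragma. The function `mean` and the label `mean_fold` are hypothetical names chosen for illustration; they do not come from any particular project.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.List (foldl')

-- Hypothetical example: mean of a list. Only the accumulation loop is
-- wrapped in a manually named cost center ("mean_fold" is an assumption).
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    (total, count) = {-# SCC "mean_fold" #-} foldl' step (0, 0 :: Int) xs
    -- Bang patterns keep both accumulator components evaluated.
    step (!s, !n) x = (s + x, n + 1)

main :: IO ()
main = print (mean [1 .. 1000000])
```

Built with -prof and -fno-prof-auto and run with +RTS -p, the report should attribute the time and allocation of the fold to mean_fold, while the rest of the program is left unannotated.
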
Profiling with GHC

Manual here
# This will add SCCs everywhere.
# You will probably want to switch to manual cost centers and use {-# SCC "name" #-} <expression>

ghc -O2 -prof -fprof-auto -rtsopts Example.hs 
./Example +RTS -p -RTS
cat Example.prof
Profiling with Cabal


Don't forget -O2

Manual cost centers:
# Add {-# SCC <name> #-} manually to the functions you want to profile

cabal build --enable-profiling --ghc-options="-fno-prof-auto"
time cabal exec example -- +RTS -p -s -RTS # Produces example.prof and prints RTS statistics
Automatic cost centers (use with care):
cabal build --enable-profiling --ghc-options="-fprof-auto"
time cabal exec example -- +RTS -p -s -RTS
Recall that for multi-threading you will need:
cabal build --enable-profiling --ghc-options="-threaded -fprof-auto"
time cabal exec example -- +RTS -N -p -s -RTS
Profiling with Stack

Manual cost centers:
mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fno-prof-auto"
time .stack-bin/example +RTS -p
Automatic cost centers:
mkdir -p .stack-bin
stack clean
stack install --local-bin-path .stack-bin --profile --ghc-options="-fprof-auto"
time .stack-bin/example +RTS -p
Profiling with Nix

See example here
Dumping Core and STG


Always dump to a file: -ddump-to-file
Dump Core after optimizations: -ddump-simpl
You can also dump STG: -ddump-stg

In *.cabal:
flag dump
  manual: True
  default: True

library
  build-depends:
  ghc-options: -O2

  if flag(dump)
    ghc-options: -ddump-simpl -ddump-stg -ddump-to-file
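
For a quick look at a single module, the same dump flags can also be set with a per-module pragma instead of a cabal flag. The sketch below is an assumption-laden illustration: the module `SumSquares` is hypothetical, and -dsuppress-all is an extra GHC flag (not used above) that hides most type and coercion noise in the dumped Core.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -ddump-to-file -dsuppress-all #-}
-- Hypothetical module used only to have something small to inspect.
module SumSquares (sumSquares) where

-- Sum of the squares 1^2 + 2^2 + ... + n^2, written with a list
-- comprehension so the optimized Core can be compared with the source.
sumSquares :: Int -> Int
sumSquares n = sum [x * x | x <- [1 .. n]]
```

Compiling this module should write the optimized Core to a file ending in .dump-simpl next to the other build artifacts. A useful thing to check there is whether the list was fused away into a loop over unboxed integers.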

Space leak detection

Read:

Spaceleak Stack-limiting Technique
https://kodimensional.dev/space-leak


For example, if I see that a particular pure function takes a long time relative to the rest of the code, that it works on Text, and that ARR_WORDS rises linearly in the heap profile, I probably have a thunk-based memory leak. This is knowledge you build up over time.
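
A classic thunk-based leak can be sketched as follows: a lazy left fold may build a long chain of unevaluated additions, while the strict variant keeps the accumulator evaluated. The names `leakySum` and `strictSum` are hypothetical, and whether the lazy version actually leaks depends on the optimization level, since GHC's strictness analysis often repairs this simple case at -O2.

```haskell
module Main (main) where

import Data.List (foldl')

-- Lazy accumulator: without optimization this builds a chain of (+)
-- thunks as long as the list before anything is evaluated.
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- Strict accumulator: each step is forced, so no thunk chain builds up.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = do
  print (strictSum [1 .. 1000000])
  print (leakySum [1 .. 1000000])
```

Compiled without optimization and with -rtsopts, running this under a small stack limit (for example +RTS -K1K, in the spirit of the stack-limiting technique; the exact limit is an assumption) should let the strict sum finish and make the lazy one fail with a stack overflow.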

Tools

When you need to profile CPU usage:

profiteur
profiterole
ghc-prof-flamegraph

For thread profiling:

threadscope
ghc-events-analyze

When you need to profile memory usage:

eventlog2html

When you need to benchmark your application:

criterion
gauge
inspection-testing
tasty-bench
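
A minimal benchmark sketch using tasty-bench, assuming the tasty-bench package is available. The function `fibo` is a hypothetical stand-in for the code you want to measure. criterion exposes the same `defaultMain`, `bgroup`, `bench` and `nf` names from `Criterion.Main`, so switching should only require changing the import.

```haskell
module Main (main) where

import Test.Tasty.Bench (bench, bgroup, defaultMain, nf)

-- Hypothetical function under test: naive Fibonacci.
fibo :: Int -> Integer
fibo n
  | n < 2     = toInteger n
  | otherwise = fibo (n - 1) + fibo (n - 2)

main :: IO ()
main = defaultMain
  [ bgroup "fibo"
      -- nf applies the function to the argument and forces the result
      -- to normal form on every iteration.
      [ bench "10" (nf fibo 10)
      , bench "20" (nf fibo 20)
      ]
  ]
```

Passing the function and its argument separately to `nf` matters: it stops GHC from computing the result once and sharing it across iterations.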

Getting the tools

To get an environment with all profiling tools:
$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: with pkgs; [ criterion deepseq parallel ])' haskellPackages.profiteur haskellPackages.threadscope haskellPackages.eventlog2html haskellPackages.ghc-prof-flamegraph
Using the tools

All examples are based on this program:
hellofib.hs
import Control.Parallel.Strategies
import System.Environment

fib 0 = 1
fib 1 = 1
fib n = runEval $ do
 x <- rpar (fib (n-1))
 y <- rseq (fib (n-2))
 return (x + y + 1)

main = do
 args <- getArgs
 n <- case args of
       []    -> return 20
       [x]   -> return (read x)
       _     -> fail ("Usage: hellofib [n]")
 print (fib n)
profiteur

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ profiteur hellofib.prof
$ firefox hellofib.prof.html
ghc-prof-flamegraph

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ ghc-prof-flamegraph hellofib.prof > output.svg
$ firefox output.svg
eventlog2html

Heap profiling RTS options
$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
# Use -hc to break the heap down by cost-center stack, i.e. to see where the thunks are being created.
# Use -hd or -hy to break it down by closure description (e.g. data constructor) or by type.
# Use -hr to see what is retaining your data, i.e. why it is not being garbage collected.
$ ./hellofib +RTS -N -hy -l # -l-agu to not include thread events
$ eventlog2html hellofib.eventlog
$ firefox hellofib.eventlog.html
cabal build --enable-profiling --ghc-options="-fprof-auto"
cabal exec example -- +RTS -hc -l -RTS

For some reason, if you add the cost centers manually and use -fno-prof-auto, the graph is empty.

There is a newer profiling flag, -hi, which gives you detailed information about where thunks (unevaluated closures) are accumulating:
$ ghc -eventlog -rtsopts -O2 -finfo-table-map -fdistinct-constructor-tables LargeThunk
$ ./LargeThunk 100000 100000 30000000 +RTS -l -hi -i0.5 -RTS
$ eventlog2html LargeThunk.eventlog
More on the blog post: https://well-typed.com/blog/2021/01/first-look-at-hi-profiling-mode/
Threadscope

Thread profiling and GC insight.
$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -N -l -s
$ threadscope hellofib.eventlog
ghc-events-analyze

Threadscope shows the activity of CPU cores, while ghc-events-analyze shows the activity of Haskell threads. ghc-events-analyze also works for concurrent programs that run on a single core, and it lets you instrument regions of your code with named events.
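
Instrumenting a region with named events can be sketched as below. The helper `withEvent` and the labels are hypothetical. The sketch assumes the convention that ghc-events-analyze pairs user events of the form "START label" and "STOP label".

```haskell
module Main (main) where

import Control.Exception (bracket_, evaluate)
import Debug.Trace (traceEventIO)

-- Emit a START user event before the action and a STOP user event after
-- it (also on exceptions), both tagged with the same label.
withEvent :: String -> IO a -> IO a
withEvent label =
  bracket_ (traceEventIO ("START " ++ label))
           (traceEventIO ("STOP " ++ label))

main :: IO ()
main = do
  -- evaluate forces the sum inside the instrumented region.
  total <- withEvent "sum" (evaluate (sum [1 .. 1000000 :: Int]))
  withEvent "print" (print total)
```

Compiled with -eventlog -rtsopts and run with +RTS -l, the resulting .eventlog file can then be passed to ghc-events-analyze, which should report the "sum" and "print" regions alongside the thread activity.
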
Resources

Docs


Official docs
GHC(STG,Cmm,asm) illustrated
@_gilmi resources

Blogs


A First Look at Info Table Profiling
Detecting Space Leaks
Flame graphs for GHC time profiles with ghc-prof-flamegraph
FPComplete: Profiling and Performance
Haskell wiki: performance
Locating Performance Bottlenecks
Memory Fragmentation
Micro-optimizations
Performance profiling with ghc-events-analyze
Profiteur: a visualiser for Haskell GHC .prof files
Spaceleak Stack-limiting Technique: lots of interesting links about spaceleaks inside.
Top tips and tools for optimising Haskell
Stackoverflow: GHC's RTS options for garbage collection - Simon Marlow

Case Study


Beating C with 80 lines of Haskell: wc
Fast Haskell: Competing with C at parsing XML
Migrating text metrics to pure Haskell
On Competing with C Using Haskell
Sharing, Space Leaks, and Conduit and friends
A methodology to diagnose and solve performance problems in your Haskell programs

Books


Haskell High Performance Programming
Parallel and Concurrent Programming in Haskell

Videos


Introduction to Low Level Haskell Optimization
Low-level Haskell: An Interactive Tour Through the STG