
@jproyo jproyo/a.md forked from monadplus/a.md
Created Mar 8, 2020

Profiling in Haskell

You'll generally want to look at:

  • heap
  • stack
  • gc profiles

Do not get bogged down in micro-optimizations before you have assessed the macro optimizations that are available: IO and the choice of algorithm dominate any low-level changes you may make. In the end you have to think hard about your code!

Topos> For example, if I see that a particular pure function is taking a long time relative to the rest of the code, that it works on Text, and that ARR_WORDS is rising linearly in the heap, I probably have a thunk-based memory leak. This is knowledge you build up over time.
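
As an illustration of the thunk-based leak pattern (a minimal sketch of my own, not from the quote above), here is a lazily updated accumulator built with Data.IORef from base. Each lazy update stacks another unevaluated thunk on the reference; with Text payloads the retained buffers are what shows up as ARR_WORDS in the heap profile:

```haskell
import Data.IORef

-- Lazy update: each modifyIORef stores a new (+ i) thunk on top of the
-- previous one, so the reference retains a growing chain of thunks
-- until readIORef finally forces them all at once.
leaky :: IO Int
leaky = do
  ref <- newIORef 0
  mapM_ (\i -> modifyIORef ref (+ i)) [1 .. 100000 :: Int]
  readIORef ref

-- Strict update: modifyIORef' forces the new value at every step,
-- so the heap stays flat.
strict :: IO Int
strict = do
  ref <- newIORef 0
  mapM_ (\i -> modifyIORef' ref (+ i)) [1 .. 100000 :: Int]
  readIORef ref

main :: IO ()
main = strict >>= print
```

Both compute the same sum; only the heap profile distinguishes them, which is exactly why the profiling tools below matter.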

Tools

These are pretty useful when you need to figure out the cost centres (the hot path):

Then you need to figure out what is going on on the heap:

Finally, you need to benchmark your code and try to apply optimizations:

Having a look at Core may help.

Resources

Docs

Blogs

Youtube

Profiling with GHC

Manual here

$ ghc -O2 -prof -fprof-auto -rtsopts Example.hs # This will add an SCC everywhere
                                                # You will probably want to switch to manual annotations: {-# SCC "name" #-} <expression>
$ ./Example +RTS -p
$ vim Example.prof

Profiling with Stack

$ stack clean # not always needed
$ stack build --profile
# -p time
# -h memory
$ stack exec --profile -- <your program> +RTS -p -h

Profiling with Cabal

cabal v1-configure --enable-library-profiling --enable-executable-profiling --enable-tests --enable-benchmarks

*.cabal

executable my-project-profiling
  main-is:
    Profiling.hs
  ghc-options:
    -O2
    -threaded
    -fprof-auto
    "-with-rtsopts=-N -p -s"
  build-depends:
    base >= 4.5 && < 5

Install the dependencies with support for profiling:

$ cabal install --only-dependencies --enable-library-profiling

# Alternatively, add this to cabal.config
library-profiling: True

$ cabal install --only-dependencies

Running the program

# Will produce <project>.prof and output RTS statistics
cabal run <project>

Tools

To get an environment with all profiling tools:

$ nix-shell --packages 'haskellPackages.ghcWithHoogle (pkgs: with pkgs; [ criterion deepseq parallel ])' haskellPackages.profiteur haskellPackages.threadscope haskellPackages.eventlog2html haskellPackages.prof-flamegraph flameGraph

All examples are based on this program:

hellofib.hs

import Control.Parallel.Strategies
import System.Environment

fib 0 = 1
fib 1 = 1
fib n = runEval $ do
 x <- rpar (fib (n-1))
 y <- rseq (fib (n-2))
 return (x + y + 1)

main = do
 args <- getArgs
 n <- case args of
       []    -> return 20
       [x]   -> return (read x)
       _     -> fail ("Usage: hellofib [n]")
 print (fib n)

threadscope

$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -N -l -s
$ threadscope hellofib.eventlog

eventlog2html

# Doesn't work.
$ ghc -O2 -rtsopts -threaded -prof -fprof-auto -eventlog hellofib
$ ./hellofib +RTS -hy -N -l
$ eventlog2html hellofib.eventlog
$ firefox hellofib.eventlog.html

profiteur

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ profiteur hellofib.prof
$ firefox hellofib.prof.html

prof-flamegraph + flameGraph

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded hellofib
$ ./hellofib +RTS -N -pa
$ cat hellofib.prof | prof-flamegraph | flamegraph.pl > output.svg
$ firefox output.svg

Dump Core Code

$ ghc -O2 -ddump-simpl -dsuppress-all -fforce-recomp Example.hs > Example.core
$ vim Example.core

Introduction to Low Level Haskell Optimization

How to Optimize

Source: https://www.youtube.com/watch?v=McFNkLPTOSY

Before starting to optimize:

  1. Is the -O2 flag on ?
  2. Profile: which part of the code is the slow one.
  3. Use the best algorithm in that part.
  4. Optimize: implement it in the most efficient way.

Optimizing

Always add the flag -ddump-to-file.

  • Dump Core after optimizations: -ddump-simpl
  • You can also dump STG: -ddump-stg

In *.cabal:

flag dump
  manual: True
  default: True

library
  build-depends:
  ghc-options: -O2

  if flag(dump)
    ghc-options: -ddump-simpl -ddump-stg -ddump-to-file

Spaceleak stack-limiting technique

All credit to Neil Mitchell and his repo.

The stack-limiting technique is useful for detecting a common kind of spaceleak, a situation where an excessive number of interdependent thunks are created and eventually evaluated. The technique works because thunks are evaluated on the stack, so by limiting the stack size we effectively limit the number of interdependent thunks that we can evaluate.

To find space leaks, given a program and a representative run (e.g. the test suite, a suitable input file):

  1. Compile the program for profiling, e.g. ghc --make Main.hs -rtsopts -prof -fprof-auto (-auto-all on older GHCs).
  2. Run the program with a specific stack size, e.g. ./Main +RTS -K100K to run with a 100KB stack.
  3. Increase/decrease the stack size until you have determined the minimum stack for which the program succeeds, e.g. -K33K.
  4. Reduce the stack by a small amount and rerun with -xc, e.g. ./Main +RTS -K32K -xc.
  5. The -xc run will print out the stack trace on every exception, look for the one which says stack overflow (likely the last one) and look at the stack trace to determine roughly where the leak is.
  6. Attempt to fix the space leak, confirm by rerunning with -K32K.
  7. Repeat until the test works with a small stack, typically -K1K.
  8. Add something to your test suite to ensure that if a space leak is ever introduced then it fails, e.g. ghc-options: -with-rtsopts=-K1K in Cabal.

This technique does not detect when an excessive number of thunks are created but never evaluated, or when a small number of thunks hold on to a large amount of live data.
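
A minimal program exhibiting the kind of leak this technique catches (my own sketch, not from Mitchell's repo): foldl builds a chain of a million interdependent (+) thunks whose evaluation needs a stack frame each, so it overflows under a small -K limit, while the foldl' version survives even -K1K:

```haskell
import Data.List (foldl')

-- foldl builds nested unevaluated (+) thunks; forcing the result needs
-- O(n) stack, so a run with e.g. +RTS -K32K -xc overflows and the
-- exception trace points here. (With -O2, GHC's demand analysis may
-- rescue this particular foldl; real leaks are less cooperative.)
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- foldl' forces the accumulator at each step: constant stack, and the
-- program now passes with a tiny stack limit.
fixedSum :: [Int] -> Int
fixedSum = foldl' (+) 0

main :: IO ()
main = print (fixedSum [1 .. 1000000])
```

Both functions return the same value; only the stack behaviour differs, which is what steps 2-5 above detect.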

Time and Allocation Profiling

Cost centres and cost-centre stacks

Automatically inserting cost centers:

$ ghc -prof -fprof-auto -rtsopts Main.hs
$ ./Main +RTS -p

Compiler options for profiling

# All modules must be compiled and linked with the -prof option;
# otherwise SCC annotations are ignored.
-prof 

-fprof-auto
-fprof-auto-top

# Less used
-fprof-auto-exported
-fprof-auto-calls
-fprof-cafs
-fno-prof-auto
-fno-prof-cafs
-fno-prof-count-entries

+RTS options

  • The -p option produces a standard time profile report. It is written into the file ⟨stem⟩.prof; the stem is taken to be the program name by default, but can be overridden by the -po ⟨stem⟩ flag.

  • The -P option produces a more detailed report containing the actual time and allocation data as well. (Not used much.)

  • The -pa option produces the most detailed report containing all cost centres in addition to the actual time and allocation data.

More at https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#time-and-allocation-profiling

Inserting cost centres by hand

{-# SCC "name" #-}

f x y = ...
  where
    g z = ...
    {-# SCC g #-}

{-# SCC f #-}

Using a different name:

f x y = ...
{-# SCC f "cost_centre_name" #-}

Example of program:

main :: IO ()
main = do let xs = [1..1000000]
          let ys = [1..2000000]
          print $ {-# SCC last_xs #-} last xs
          print $ {-# SCC last_init_xs #-} last $ init xs
          print $ {-# SCC last_ys #-} last ys
          print $ {-# SCC last_init_ys #-} last $ init ys

Profiling memory usage

Detecting the causes of space leaks.

Space leaks lead to slower execution due to heavy garbage collector activity, and may even cause the program to run out of memory altogether.

$ ghc -O2 -prof -fprof-auto -rtsopts -threaded -eventlog <program>

$ ./<program> +RTS -N -h 

# Use hp2ps to analyze your program.hp
# Use threadscope or eventlog2html to analyze your program.eventlog

hp2ps – Rendering heap profiles to PostScript

$ hp2ps <program-name>.hp
$ gv <program-name>.ps

It is possible to view the heap profile of a running program: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#viewing-the-heap-profile-of-a-running-program

RTS options

Take a look at the manual: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#rts-options-for-heap-profiling

Profiling Parallel and Concurrent Programs

Use: -prof -threaded

Use: +RTS -N

Profiling doesn't work well on parallel programs:

  • The overhead is around 30%.
    • -fno-prof-count-entries helps a bit.
  • There are race conditions in the cost-centre statistics.

Use ThreadScope

Retainer Profiling

Retainer profiling is designed to help answer questions like “why is this data being retained?”

Biographical Profiling

A typical heap object may be in one of the following four states at each point in its lifetime:

  • The lag stage: the time between creation and the first use of the object.
  • The use stage: from the first use until the last use of the object.
  • The drag stage: from the final use until the last reference to the object is dropped.
  • An object which is never used is said to be in the void state for its whole lifetime.

A biographical heap profile displays the portion of the live heap in each of the four states listed above.
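
Neither mode has example commands above; as a sketch (module name Main.hs is hypothetical), retainer and biographical profiles are requested with the -hr and -hb RTS flags respectively, and both require a profiling build:

```shell
$ ghc -O2 -prof -fprof-auto -rtsopts Main.hs
$ ./Main +RTS -hr     # retainer profile
$ ./Main +RTS -hb     # biographical profile (lag/use/drag/void)
$ hp2ps -c Main.hp    # render the resulting heap profile in colour
$ gv Main.ps
```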
