@mccurdyc
Created August 28, 2018 23:51
# GopherCon 2018 - Performance Tuning Workshop (Dave Cheney, Francesc Campoy)

motivation

  • a lot of low-level discussion (hardware, etc.)
    • there have been benefits from Moore's law
      • but, we can't just wait for hardware to get faster anymore
    • static power consumption (while machine is idle)
      • small amount of current leaking
    • frequency and heat are correlated in processors
    • Amdahl's law
      • speedup of a program is limited by sequential parts of the program
    • for best performance you need a programming language that:

benchmarking

  • Reference Doc: https://github.com/davecheney/gophercon2018-performance-tuning-workshop/tree/master/2-benchmarking
  • ground rules
    • machine must be idle (don't do anything else while benchmarking)
      • if you can afford it buy isolated hardware and turn off power saving and thermal scaling
        • don't update software version on these machines
    • don't take benchmark timings with profiling enabled! (profiling adds its own overhead)
  • golang testing package for benchmarking
    • don't just calculate elapsed time on your own
    • go test -bench=. path/to/package
    • when you just run go test, benchmarks are excluded
    • the function is called b.N times
      • the testing package keeps increasing b.N until the run takes about 1 second
        • can request to use different than 1s with -benchtime=10s flag
        • useful to get to the 10,000 iteration minimum below
        • if your benchmark runs millions of iterations in 1s, also unreliable
      • should run for at least 10,000 iterations in order to be reliable
      • Note: GOMAXPROCS defaults to the number of cores
      • can specify number of CPUs to use with -cpu=1,2,4,8
      • GoDoc: https://golang.org/cmd/go/ (where these flags are documented)
  • comparing benchmarks
  • compiler optimizations screwing with benchmarks
    • the compiler inlines the function
      • bit shifting function
    • storing the result in an exported variable makes it harder for the compiler to prove that nothing else uses that value, so it can't optimize the call away
    • generate the assembly code
      • -gcflags "-S"
      • don't need to understand the assembly, but you will see if it makes a function call, replaces function with a constant, etc.
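The benchmarking notes above can be sketched as follows. This is a minimal illustration, not the workshop's own code: `popcnt`, `BenchmarkPopcnt`, and `Result` are names assumed here for the example. The benchmark loops `b.N` times and stores its result in an exported package-level variable so the compiler cannot prove the call is dead and remove it. `testing.Benchmark` is used only so the sketch runs as a standalone program.

```go
package main

import (
	"fmt"
	"testing"
)

// popcnt counts the set bits in x. It is a small bit-shifting function,
// so the compiler is likely to inline it.
func popcnt(x uint64) uint64 {
	const (
		m1  = 0x5555555555555555
		m2  = 0x3333333333333333
		m4  = 0x0f0f0f0f0f0f0f0f
		h01 = 0x0101010101010101
	)
	x -= (x >> 1) & m1
	x = (x & m2) + ((x >> 2) & m2)
	x = (x + (x >> 4)) & m4
	return (x * h01) >> 56
}

// Result is exported (a hypothetical name) so the compiler cannot easily
// prove that nothing else reads the benchmark's output.
var Result uint64

// BenchmarkPopcnt calls popcnt b.N times; the testing package picks b.N.
func BenchmarkPopcnt(b *testing.B) {
	var r uint64
	for i := 0; i < b.N; i++ {
		r = popcnt(uint64(i))
	}
	Result = r // keep the result alive outside the loop
}

func main() {
	// testing.Benchmark runs the function the same way go test -bench does.
	res := testing.Benchmark(BenchmarkPopcnt)
	fmt.Println(res)
}
```

In a real project the benchmark function would normally live in a `_test.go` file and be run with `go test -bench=. path/to/package`. Building with `-gcflags "-S"` shows whether the call was inlined or replaced with a constant.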

profiling

  • Reference Doc: https://github.com/davecheney/gophercon2018-performance-tuning-workshop/blob/master/4-profiling/1-profiling.md
  • pprof tool
    • use the Google drop-in replacement instead of the go tool
      • go get github.com/google/pprof
      • pprof -http :6060 cpu.pb.gz
        • flamegraph
          • if you want the deep "flames" to be more accurate, you need to run the benchmark longer (-count flag)
          • width is percentage of time executing function
          • colors don't mean anything, just made to look like flames
    • a profile shows what is happening at exactly that instant
    • run pprof more frequently, but ...
  • pprof adds a cost: your program has to do extra work to fulfil the pprof request
  • types of supported profiling
    • CPU
    • memory
    • blocking
    • mutex contention
  • why is this benchmark taking so long?
    • cpuprofile flag, -cpuprofile=cpu.pb.gz
  • side notes
    • don't forget to only run benchmarks or your pprof results will look weird
      • -run=XXX
      • if you don't do this, your tests will run as well and show up in the profile
    • one profile at a time!

memory profiling

  • stack or heap allocations
  • not great for determining how much memory you are using
  • Francesc references https://www.youtube.com/watch?v=N3PWzBeLX2M
  • pprof -http :6060 -alloc_objects mem.pb.gz
  • pprof -http :6060 -inuse_objects mem.pb.gz

blocking profiling

  • channel debugging

http router

  • if you are using net/http, just add the _ "net/http/pprof" import, that's it!
  • https://github.com/adjust/go-wrk
  • you don't slow down servers doing this until you actually run pprof to profile the endpoint
  • do not make these endpoints public

execution tracer

compiler optimizations
