- Reference to Docs: https://github.com/davecheney/gophercon2018-performance-tuning-workshop
- a lot of low-level discussion (hardware, etc.)
- there have been benefits from Moore's law
- but, we can't just wait for hardware to get faster anymore
- static power consumption (while machine is idle)
- small amount of current leaking
- frequency and heat are correlated in processors
- Amdahl's law
- speedup of a program is limited by sequential parts of the program
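A minimal sketch of Amdahl's law as code, to make the limit concrete. The function name `amdahlSpeedup` and the 95% parallel fraction are illustrative assumptions, not figures from the workshop.

```go
package main

import "fmt"

// amdahlSpeedup returns the theoretical speedup of a program in which a
// fraction p (0 <= p <= 1) of the work can run in parallel on n workers.
// The remaining 1-p is the sequential part that caps the overall speedup.
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	// Hypothetical workload: 95% of the work is parallelisable.
	for _, n := range []int{1, 2, 8, 64, 1024} {
		fmt.Printf("cores=%4d speedup=%.2fx\n", n, amdahlSpeedup(0.95, n))
	}
}
```

Under these assumptions the speedup can never reach 20x, no matter how many cores are added, because the sequential 5% dominates.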
- for best performance you need a programming language that:
- is compiled, not interpreted; interpreted languages interact poorly with the CPU
- permits efficient code to be written (bits and bytes, not just assuming all types are ideal floats)
- Reference Doc: https://github.com/davecheney/gophercon2018-performance-tuning-workshop/blob/master/1-welcome/introduction.md
- Reference Doc: https://github.com/davecheney/gophercon2018-performance-tuning-workshop/tree/master/2-benchmarking
- ground rules
- machine must be idle (don't do anything else while benchmarking)
- if you can afford it, buy isolated hardware and turn off power saving and thermal scaling
- don't update software version on these machines
- don't benchmark with profiling enabled!
- golang `testing` package for benchmarking
- don't just calculate elapsed time on your own
- `go test -bench=. path/to/package`
- when you just run `go test`, benchmarks are excluded
- the function is called `b.N` times
- the framework keeps increasing `b.N` until the run takes about 1 second
- can request a duration other than 1s with the `-benchtime=10s` flag
- useful to get to the 10,000 iteration minimum below
- if your benchmark runs millions of iterations in 1s, also unreliable
- should run for at least 10,000 iterations in order to be reliable
- Note: `GOMAXPROCS` defaults to the number of cores
- can specify the number of CPUs to use with `-cpu=1,2,4,8`
- GoDoc: https://golang.org/cmd/go/ - where these flags are documented
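A minimal sketch of what a benchmark written with the `testing` package looks like. `Fib` and `BenchmarkFib20` are hypothetical names chosen for illustration. In a real project the benchmark function would live in a `_test.go` file and be run with `go test -bench=.`; here `testing.Benchmark` drives it from `main` only so the example is self-contained.

```go
package main

import (
	"fmt"
	"testing"
)

// Fib is a deliberately slow function, used only as something to measure.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// BenchmarkFib20 has the signature `go test -bench=.` looks for.
// The framework chooses b.N; the loop body must run exactly b.N times.
func BenchmarkFib20(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Fib(20)
	}
}

func main() {
	// testing.Benchmark applies the same "grow b.N until the run is long
	// enough" logic that go test uses, without needing a _test.go file.
	res := testing.Benchmark(BenchmarkFib20)
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())
}
```

If the reported iteration count is far below 10,000, a longer `-benchtime` (when run through `go test`) is one way to raise it.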
- comparing benchmarks
- `benchstat`: https://godoc.org/golang.org/x/perf/cmd/benchstat
- keep the old code used in the previous benchmark around
- `-c` flag, `go test -c`, builds the test binary of the old code
- compiler optimizations screwing with benchmarks
- the compiler inlines the function
- bit shifting function
- storing the result in an exported variable makes it harder for the compiler to prove that nothing else is using that variable
- generate the assembly code with `-gcflags "-S"`
- don't need to understand the assembly, but you will see if it makes a function call, replaces function with a constant, etc.
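A sketch of the exported-variable technique described above, using a bit-shifting population count as the function under test. The names `popcnt`, `Result`, and `BenchmarkPopcnt` are illustrative; as before, `testing.Benchmark` is used only to keep the example runnable outside `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

const (
	m1  = 0x5555555555555555
	m2  = 0x3333333333333333
	m4  = 0x0f0f0f0f0f0f0f0f
	h01 = 0x0101010101010101
)

// popcnt counts the set bits in x using only shifts, masks, and one multiply.
// It is small enough that the compiler is likely to inline it.
func popcnt(x uint64) uint64 {
	x -= (x >> 1) & m1
	x = (x & m2) + ((x >> 2) & m2)
	x = (x + (x >> 4)) & m4
	return (x * h01) >> 56
}

// Result is exported so the compiler cannot easily prove the value is unused
// and drop the call to popcnt from the benchmark loop.
var Result uint64

func BenchmarkPopcnt(b *testing.B) {
	var r uint64
	for i := 0; i < b.N; i++ {
		r = popcnt(uint64(i))
	}
	Result = r // publish the last value so the loop has an observable effect
}

func main() {
	res := testing.Benchmark(BenchmarkPopcnt)
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())
}
```

Without the assignment to `Result`, the loop body may be optimized away and the benchmark would then measure an empty loop; the generated assembly is the way to check what actually happened.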
- Reference Doc: https://github.com/davecheney/gophercon2018-performance-tuning-workshop/blob/master/4-profiling/1-profiling.md
- `pprof` tool
- use the Google drop-in replacement instead of the go tool
- `go get github.com/google/pprof`
- `pprof -http :6060 cpu.pb.gz`
- flamegraph
- if you want the deep "flames" to be more accurate, you need to run the benchmark longer (`-count` flag)
- width is the percentage of time spent executing the function
- colors don't mean anything, just made to look like flames
- what happens exactly at that instant
- run `pprof` more frequently, but ...
- `pprof` adds a cost: your program is also trying to fulfil the `pprof` request
- types of supported profiling
- CPU
- memory
- blocking
- mutex contention
- why is this benchmark taking so long?
- `-cpuprofile` flag, `-cpuprofile=cpu.pb.gz`
- side notes
- don't forget to only run benchmarks or your pprof results will look weird: `-run=XXX`
- if you don't do this, the tests also run and end up in the profile
- one profile at a time!
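The `-cpuprofile` flag only applies when running through `go test`. As an alternative for an ordinary program, the sketch below collects a CPU profile with the standard `runtime/pprof` package. `busyWork` and `profileCPU` are hypothetical helpers, and the output filename simply mirrors the `cpu.pb.gz` name used in these notes.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// busyWork burns CPU so the profiler has something to sample.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// profileCPU runs work while CPU profiling is enabled and returns the
// gzip-compressed profile bytes.
func profileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := profileCPU(func() { busyWork(100000000) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	// Write the profile where the pprof command can pick it up.
	if err := os.WriteFile("cpu.pb.gz", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes to cpu.pb.gz\n", len(data))
}
```

Only one CPU profile can be active at a time, which matches the "one profile at a time" rule above.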
- stack or heap allocations
- not great for determining how much memory you are using
- francesc references: https://www.youtube.com/watch?v=N3PWzBeLX2M
- `pprof -http :6060 -alloc_objects mem.pb.gz`
- `pprof -http :6060 -inuse_objects mem.pb.gz`
- channel debugging
- if you are using `net/http`, just add the `_ "net/http/pprof"` import, that's it!
- https://github.com/adjust/go-wrk
- you don't slow down servers doing this until you actually run pprof to profile the endpoint
- do not make these endpoints public
- do not make these endpoints public
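A small sketch of what the blank import does. Importing `net/http/pprof` registers handlers under `/debug/pprof/` on `http.DefaultServeMux`; the example checks that with an in-memory request instead of opening a port. `pprofIndexStatus` is a hypothetical helper, and the `localhost:6060` address in the comment is an assumption taken from the commands in these notes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/ handlers on http.DefaultServeMux
)

// pprofIndexStatus sends an in-memory GET to the pprof index page through
// the default mux and returns the HTTP status code.
func pprofIndexStatus() int {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	rec := httptest.NewRecorder()
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("GET /debug/pprof/ ->", pprofIndexStatus())

	// In a real service the default mux would be served, for example:
	//   http.ListenAndServe("localhost:6060", nil)
	// Binding to localhost is one way to keep these endpoints non-public.
}
```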
- Reference Doc: https://github.com/davecheney/gophercon2018-performance-tuning-workshop/blob/master/5-execution-tracer/1-execution-tracer.md
- tells runtime to log everything that it does
- used when execution is too fast for pprof to catch it
- debugging runtime itself
- do not use in production, huge performance hit (with complex programs)
- good at seeing communication between goroutines
- tracing http endpoint
- https://golang.org/pkg/net/http/pprof/
- send traffic to endpoint (go-wrk, vegeta, apache benchmark, etc.)
- `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=1`
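A minimal sketch of collecting an execution trace from inside a program with the standard `runtime/trace` package, while two goroutines communicate over a channel. `traceWork` and the `trace.out` filename are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/trace"
	"sync"
)

// traceWork records an execution trace while a producer sends the numbers
// 1..n over a channel to a consumer goroutine. It returns the raw trace
// bytes and the sum computed by the consumer.
func traceWork(n int) ([]byte, int, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, 0, err
	}

	ch := make(chan int)
	var wg sync.WaitGroup
	sum := 0
	wg.Add(1)
	go func() {
		defer wg.Done()
		for v := range ch {
			sum += v
		}
	}()
	for i := 1; i <= n; i++ {
		ch <- i
	}
	close(ch)
	wg.Wait()

	trace.Stop() // flushes the remaining trace data into buf
	return buf.Bytes(), sum, nil
}

func main() {
	data, sum, err := traceWork(1000)
	if err != nil {
		fmt.Fprintln(os.Stderr, "tracing failed:", err)
		os.Exit(1)
	}
	if err := os.WriteFile("trace.out", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("sum=%d, wrote %d bytes to trace.out\n", sum, len(data))
}
```

The resulting file can be opened with `go tool trace trace.out`, where the goroutine hand-offs on the channel should be visible.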
- Reference Doc: https://github.com/davecheney/gophercon2018-performance-tuning-workshop/tree/master/3-compiler-optimisations
- ask compiler for heap allocations
- flag `-gcflags=-m`
- flag for more detail `-gcflags="-m -m"`
- flag `-gcflags="-l -N"` disables all compiler optimizations
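A small program to try the escape analysis flags on. The type `point` and both functions are hypothetical. Building it with `go build -gcflags=-m` is expected to report which values escape to the heap, though the exact messages vary between compiler versions, so none are quoted here.

```go
package main

import "fmt"

type point struct{ x, y int }

// sumLocal keeps p inside the function, so the compiler can place it on
// the stack.
func sumLocal() int {
	p := point{x: 1, y: 2}
	return p.x + p.y
}

// newPoint returns a pointer that outlives the call, so the point is a
// candidate for heap allocation.
func newPoint(x, y int) *point {
	return &point{x: x, y: y}
}

func main() {
	p := newPoint(3, 4)
	fmt.Println(sumLocal(), p.x+p.y)
}
```

Comparing the `-m` output with and without `-l` (inlining disabled) shows how inlining can change where a value ends up.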