timblair/20150820-go-workshop-notes.md

2015-08-20: Go Workshop Notes


Workshop run by Bill Kennedy
Course materials available at https://github.com/ardanstudios/gotraining
This was a one-day distillation of the full 3-day course

Variables and Types


Type gives the compiler two things: size + representation. The compiler guarantees these types


When initialising a variable using var, it is given its zero value


Strings, slices, maps, interfaces are all reference types


Strings are immutable two-word data structures

First word is a reference (pointer) to an array
Second word is the number of bytes in the underlying array


Use unsized types (e.g. int, float) unless you specifically need the size

It'll use the word size of the underlying architecture


:= is a short variable declaration operator: it initialises and infers the type from the value on the RHS

A short variable declaration must declare at least one new variable on the left-hand side (e.g. var r string; r, err := foo() compiles because err is new)


Go doesn't allow casting; it converts instead (e.g. a := int32(10))


struct allows us to create user-defined types


example{ ... } defines a struct literal (assuming the type example struct has been defined)


type example struct { ... }
e1 := example{} // <-- don't do this
var e1 example  // do this: use `var` when you can (zero-allocation)
                // (exceptions: making code easier to read / reason)

With named types, compiler will enforce type compatibility
With anonymous (unnamed) types, compiler treats as compatible if the type definition matches
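The points above on zero values, conversion and type compatibility can be sketched as follows. This is a minimal illustration rather than workshop material: the example type and the function names are hypothetical.

```go
package main

import "fmt"

// example is a hypothetical named struct type used only for illustration.
type example struct {
	flag    bool
	counter int16
}

// zeroValues declares variables with var and never assigns to them,
// so each one holds the zero value for its type.
func zeroValues() (int, string, bool, example) {
	var i int
	var s string
	var b bool
	var e example
	return i, s, b, e
}

// convert performs an explicit conversion; Go has no implicit numeric casts.
func convert(n int) int32 {
	return int32(n)
}

// fromAnonymous assigns a value of an unnamed struct type to the named type.
// This compiles because the unnamed type's definition matches example exactly.
func fromAnonymous() example {
	anon := struct {
		flag    bool
		counter int16
	}{flag: true, counter: 10}

	var e example = anon
	return e
}

func main() {
	i, s, b, e := zeroValues()
	fmt.Printf("%d %q %t %+v\n", i, s, b, e) // 0 "" false {flag:false counter:0}
	fmt.Println(convert(10))                 // 10
	fmt.Printf("%+v\n", fromAnonymous())     // {flag:true counter:10}
}
```

Two distinct named types with the same fields would not be assignable to each other without an explicit conversion, which is the enforcement the notes refer to.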

Pointers


Values can exist on the stack or the heap
A value only counts as being allocated when it's on the heap
Addresses in stack space go down

i.e. a value added to the stack will have a lower address than one added before it


Anything on the stack does not have to be garbage collected

Stack frames remain allocated on the way back up, and get written / initialised on the way down


Don't think about this when writing code: write for ease of maintenance, then benchmark and tweak
Goroutine stacks start small: 2k as of Go 1.4 (8k before that)
& operator retrieves the address of a value
"Sharing" == pointers
With pointers, we're still passing by value, but the value is an address
The value for a pointer variable (e.g. *int) is always an address

The address must always point to a value of the correct type (int, in this case)


Confusion: the * operator is used in both type definition and pointer dereferencing

E.g. func increment(inc *int) { ... } vs. println("Inc:", *inc)


Example of dangers of using addresses / relying on implementation which might not work in the next version:

Values "escape" from the stack to the heap (escape analysis)
Use go build -gcflags -m to see escape analysis and heap allocation
(-gcflags -S shows the Plan 9-style assembly generated for your code)
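A small sketch of value versus pointer semantics, and of a value that outlives its stack frame. The function names are hypothetical; building this with go build -gcflags -m should show what the compiler decides about newCounter, though the exact output depends on the compiler version.

```go
package main

import "fmt"

// incrementCopy works on its own copy; the caller's variable is unchanged.
func incrementCopy(n int) int {
	n++
	return n
}

// incrementShared receives an address (itself passed by value) and
// modifies the value that the address points to.
func incrementShared(n *int) {
	*n++ // * dereferences here; in the parameter list it is part of the type
}

// newCounter returns the address of a local variable. The value outlives the
// stack frame, so the compiler typically reports it as escaping to the heap.
func newCounter() *int {
	c := 0
	return &c
}

func main() {
	count := 10
	fmt.Println(incrementCopy(count), count) // 11 10
	incrementShared(&count)
	fmt.Println(count) // 11

	c := newCounter()
	incrementShared(c)
	fmt.Println(*c) // 1
}
```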


Constants


Constants are a compile-time construct (they only exist at compile time)
They also have a parallel type system
Numeric constants have a minimum precision of 256 bits, and are considered mathematically exact
Untyped constants with a given kind (e.g. const ui = 12345) can be implicitly converted by the compiler
Typed constants (e.g. const ti int = 12345) can't be implicitly converted
You get additional flexibility by using untyped constants
If you hear the phrase "ideal type," whoever said it is talking about constants of a kind
See package time for a really good use of constants + implicit type conversion

(it's also a good example of where constants can be a helpful part of a package's API)
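A short sketch of untyped versus typed constants, including the package time pattern mentioned above. The constant and function names are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const untyped = 5   // untyped constant of kind integer
const typed int = 5 // typed constant

// big does not fit in any sized integer type, but is fine as a constant.
const big = 1 << 100

// timeout multiplies a time.Duration by an untyped constant; the compiler
// implicitly converts the constant to time.Duration.
func timeout() time.Duration {
	return untyped * time.Second
	// return typed * time.Second would not compile:
	// mismatched types int and time.Duration
}

// scaled uses the same constant in a floating-point expression.
func scaled() float64 {
	return untyped * 1.5
}

// shifted evaluates the constant expression exactly at compile time.
func shifted() int {
	return big >> 98
}

func main() {
	fmt.Println(timeout()) // 5s
	fmt.Println(scaled())  // 7.5
	fmt.Println(shifted()) // 4
}
```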


Scoping


Three different scopes: package, function local, and block
"Block" scopes include anything delimited by {}, which includes if and for blocks

if _, err := foo(); err != nil { ... }


Be mindful of variable shadowing: easy to do without realising
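One way shadowing tends to bite is := inside a block. The sketch below uses a hypothetical find function; shadowed looks correct at a glance but never updates the outer variable.

```go
package main

import (
	"errors"
	"fmt"
)

// find is a hypothetical lookup that fails for an empty key.
func find(key string) (int, error) {
	if key == "" {
		return 0, errors.New("empty key")
	}
	return len(key), nil
}

// shadowed returns 0 even on success, because := inside the if block
// declares a new n that hides the outer one.
func shadowed(key string) int {
	n := 0
	if key != "" {
		n, err := find(key) // new n and err, scoped to this block
		if err != nil {
			return -1
		}
		_ = n // the inner n is discarded when the block ends
	}
	return n
}

// fixed declares err separately and uses = so the outer n is assigned.
func fixed(key string) int {
	n := 0
	if key != "" {
		var err error
		n, err = find(key)
		if err != nil {
			return -1
		}
	}
	return n
}

func main() {
	fmt.Println(shadowed("go")) // 0
	fmt.Println(fixed("go"))    // 2
}
```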

Functions


Every package can have an init() function, which will execute before main()

(you can technically have multiple init() functions in the same package...)
Importing a package using a blank identifier means the imported package's init() functions will be called

E.g. import _ "foo", often used with database packages
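A minimal sketch of init() ordering and a blank import. The standard library's image/png is used here only because it is available everywhere; it registers its decoder from its own init(), much like a database driver does. The order variable is hypothetical.

```go
package main

import (
	"fmt"
	_ "image/png" // imported only for its init(), which registers the PNG decoder
)

// order records which init functions have run.
var order []string

// Multiple init functions in one file run in the order they appear.
func init() { order = append(order, "first init") }

func init() { order = append(order, "second init") }

func main() {
	// Both init functions have already run by the time main starts.
	fmt.Println(order) // [first init second init]
}
```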


Panicking


Programs should not panic, and if they do you probably want the stack trace (so don't handle panics)
Three ways to terminate your program: panic() (to shut down and get the stack trace), log.Fatal or os.Exit
If you need to handle a panic, use defer + recover() (it's the only way to capture a panic)

defer isn't free: setup + a heap allocation (which gets cleaned up, so doesn't require GC)
See http://play.golang.org/p/eg14ClW4_y for an example of capturing a stack trace
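For the rare case where a panic does need handling, the defer + recover() pattern might look like the sketch below. safeDivide is a hypothetical function, and this is not the code behind the playground link above.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// safeDivide converts a runtime panic (integer divide by zero) into an error
// and captures the stack trace at the point of recovery.
func safeDivide(a, b int) (result int, trace []byte, err error) {
	defer func() {
		// recover only returns non-nil inside a deferred function
		// while the goroutine is panicking.
		if r := recover(); r != nil {
			trace = debug.Stack()
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil, nil
}

func main() {
	_, trace, err := safeDivide(1, 0)
	fmt.Println(err)            // recovered: runtime error: integer divide by zero
	fmt.Println(len(trace) > 0) // true
}
```

The named results are what let the deferred function change what the caller sees after the panic.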


Arrays


The array is the core data structure behind slices, strings, etc
Pointer arithmetic is disallowed, e.g. trying to add to the location of an array
Iteration: use for ... range because it's safe (you can't go outside the range of the array)

for i, fruit := range strings {  // fruit is a copy of each element
	fmt.Println(i, fruit)
}

The size of an array is part of its type: [4]int is of a different type to [5]int
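A brief sketch of arrays as values with the size as part of the type. The function names are illustrative only.

```go
package main

import "fmt"

// doubled receives a copy of the array, because arrays are values;
// the caller's array is not modified.
func doubled(a [4]int) [4]int {
	for i := range a {
		a[i] *= 2
	}
	return a
}

// sum iterates with for ... range, which cannot step outside the array.
func sum(a [4]int) int {
	total := 0
	for _, v := range a { // v is a copy of each element
		total += v
	}
	return total
}

func main() {
	nums := [4]int{1, 2, 3, 4}
	fmt.Println(doubled(nums), nums) // [2 4 6 8] [1 2 3 4]
	fmt.Println(sum(nums))           // 10

	// var five [5]int = nums would not compile:
	// [4]int and [5]int are different types.
}
```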
"An array is just a slice waiting to happen"

Slices


Slices are backed by arrays
Slices are the core data structure; you're unlikely to write any program without them
A slice is a reference type, like a string, but with 3 elements:

pointer: pointer to the memory location of the backing array
length: the number of elements in the slice
capacity: the total capacity of the underlying array


Use make() to create a slice, e.g. slice := make([]string, 5) for a slice of 5 zero-valued elements

The two-argument form of make() sets both length and capacity to the same value


Use slices of values, not pointers, because the data will be contiguous in memory
A nil slice (var s []T) has both a length and capacity of 0
Use append() to add elements to a slice

If the length and capacity are equal, a new underlying array is created, values are copied across, new element is appended, and the new slice header (with correct pointer, length and capacity) is returned
While the capacity is under roughly 1000 elements it is doubled; beyond that it grows by roughly 20-40% (an implementation detail that may change between versions)


You can create slices of slices, which is a new view of the underlying array

s2 := s1[2:4]: the starting index is 2, and the second index is an exclusive end index
Think of the second index as start_index + required_length
The length of the new slice will be the requested range
The capacity of the new slice will be cap(s1) - starting_index


Writing to multiple slices off the same backing array is unsafe (values can be overridden)
For safe appends, use a three-index slice to set the capacity to the same as the length: s2 := s1[2:4:4]

Appending to s2 will require a new backing array to be created, making the write safe
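The difference between a two-index and a three-index slice can be seen in a small sketch. The function names are hypothetical; the values in the comments follow from the length and capacity rules described above.

```go
package main

import "fmt"

// viewSizes reports the length and capacity of s1[2:4] for a 5-element slice.
func viewSizes() (int, int) {
	s1 := []int{0, 1, 2, 3, 4}
	s2 := s1[2:4]
	return len(s2), cap(s2) // length 4-2, capacity cap(s1)-2
}

// sharedAppend appends to a two-index slice. The append fits within the
// existing capacity, so it overwrites s1[4] in the shared backing array.
func sharedAppend() []int {
	s1 := []int{0, 1, 2, 3, 4}
	s2 := s1[2:4]
	s2 = append(s2, 99)
	return s1
}

// safeAppend limits the capacity with a three-index slice, so the append
// must allocate a new backing array and s1 is left untouched.
func safeAppend() []int {
	s1 := []int{0, 1, 2, 3, 4}
	s2 := s1[2:4:4]
	s2 = append(s2, 99)
	return s1
}

func main() {
	fmt.Println(viewSizes())    // 2 3
	fmt.Println(sharedAppend()) // [0 1 2 3 99]
	fmt.Println(safeAppend())   // [0 1 2 3 4]
}
```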


You can create a slice over all of an array via s := a[:]
You can omit the start index if you want a slice starting from the beginning: s2 := s1[:4]
Nil slice: var data []string; Empty slice: data := []string{}

Use an empty slice only if required, e.g. for JSON serialisation (an empty slice will be encoded as [], not null)


Parlour trick: a := []string{100:""} <-- creates a slice of length 101 with a[100] == ""
If you know the exact size of the slice, it's more efficient to declare that capacity up front

But that will involve an apparently magic number, so don't do it unless you really need to
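A quick sketch of how nil and empty slices differ under encoding/json, plus the indexed literal trick. The encode helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode marshals a slice of strings to JSON, returning "" on error.
func encode(s []string) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

// sparseLen builds a slice with an indexed literal and returns its length.
func sparseLen() int {
	a := []string{100: ""}
	return len(a)
}

func main() {
	var nilSlice []string      // nil slice
	emptySlice := []string{}   // empty, non-nil slice

	fmt.Println(encode(nilSlice))   // null
	fmt.Println(encode(emptySlice)) // []
	fmt.Println(sparseLen())        // 101
}
```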


Methods


A function becomes a method when a receiver is attached to it
Receivers can be a value or pointer receiver:

value: operates on its own copy of the receiver
pointer: operates on a shared value


The receiver will automatically be adjusted if necessary:

If you call a method with a value receiver on a pointer, Go will automatically dereference it
If you call a method with a pointer receiver on a value, Go will automatically take its address (the value must be addressable)


All this is really just syntactic sugar; the receiver is effectively the first parameter to the function
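Value and pointer receivers, the automatic adjustment, and the "receiver as first parameter" view can be sketched together. The counter type and its methods are invented for illustration.

```go
package main

import "fmt"

// counter is a hypothetical type used only for illustration.
type counter struct{ n int }

// incValue has a value receiver: it increments its own copy.
func (c counter) incValue() { c.n++ }

// incPointer has a pointer receiver: it increments the shared value.
func (c *counter) incPointer() { c.n++ }

// run returns the count after a value-receiver call and the final count.
func run() (int, int) {
	c := counter{}

	c.incValue() // operates on a copy; c.n stays 0
	afterValue := c.n

	c.incPointer() // Go takes &c automatically; c.n becomes 1

	p := &c
	p.incValue() // Go dereferences p automatically; still a copy

	// Method expression: the receiver is passed as the first argument.
	(*counter).incPointer(&c) // c.n becomes 2

	return afterValue, c.n
}

func main() {
	fmt.Println(run()) // 0 2
}
```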

Interfaces


Interfaces provide polymorphism, as in other OO languages
Interface type values are reference values: 1st word points to the type information for the stored value; 2nd word is a pointer to the value itself
The nil value for an interface has a nil type and a nil pointer
There is no implements keyword or similar; a type just needs to declare methods with the correct signatures
Receiver type (value or pointer) is very important
Values of the incorrect type will not be adjusted automatically
Only values that implement the interface can be used
Think of it from the receiver point of view

If you implement an interface using a pointer receiver, only pointers satisfy the interface


We can't always take the address of T, so we can't include the pointer receiver in the method set

E.g. a plain numeric value might be stuck straight in a register, so has no memory address
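The method-set rule for pointer receivers is easiest to see in code. In this sketch the notifier interface and user type are hypothetical names.

```go
package main

import "fmt"

// notifier is a hypothetical interface with a single method.
type notifier interface {
	notify() string
}

// user implements notifier with a pointer receiver, so only *user
// (not user) satisfies the interface.
type user struct{ name string }

func (u *user) notify() string { return "notify " + u.name }

// send accepts any value that satisfies notifier.
func send(n notifier) string { return n.notify() }

func main() {
	u := user{name: "bill"}

	// send(u) would not compile: user does not implement notifier
	// because notify has a pointer receiver.
	fmt.Println(send(&u)) // notify bill

	var n notifier        // nil type and nil pointer
	fmt.Println(n == nil) // true
}
```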


Concurrency: Goroutines


Concurrency is about managing lots of things at once
Parallelism is about doing lots of things at once
Any function or method can be launched as a goroutine
By default there is one logical processor (one per core in >=1.5), and each logical processor is given one OS thread

Keep the minimum number of OS threads as busy as possible
Don't launch more logical processors than you have cores


Scheduler uses a number of triggers to decide when to do the next thing (blocking calls, GC, function calls...)
A system call may result in the scheduler starting a new OS thread while waiting for the system call to return
Don't code to the above: code assuming every goroutine is currently running

Avoids race conditions (use go build -race or go test -race to detect race conditions)
Calls running / returning in different orders
Reading + writing data at the same time


Any function or method call (including anonymous) can be made into a goroutine by using the go keyword
More than one goroutine == chaos.
If main() ends, the program ends, so use sync.WaitGroup to ensure all goroutines are finished
runtime.GOMAXPROCS() can be used to set the number of logical processors, to allow goroutines to run in parallel
No concept of affinity between logical processors
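A minimal sketch of waiting for goroutines with sync.WaitGroup. The squares function is hypothetical; each goroutine writes only to its own element of the result slice, which is one way to keep the example free of data races.

```go
package main

import (
	"fmt"
	"sync"
)

// squares launches one goroutine per index and waits for all of them
// before returning the results.
func squares(n int) []int {
	results := make([]int, n)

	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer wg.Done()
			results[i] = i * i // each goroutine owns exactly one element
		}(i)
	}

	wg.Wait() // without this, the function could return before the work is done
	return results
}

func main() {
	fmt.Println(squares(5)) // [0 1 4 9 16]
}
```

The goroutines may run in any order, but the output is deterministic because each result is stored by index.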

Concurrency: Channels


A channel is not a queue (even if it can act like one, that's not its purpose)
It's a way of creating guarantees in your code, and switching responsibility between goroutines
Unbuffered channel: guaranteed that the recipient has received the message

This means that sender and/or recipient may be blocked


Buffered channels have a performance benefit, and may avoid the blocking, but there are risks associated with it

Data may not be processed immediately, and if something goes wrong then the data may be lost
There's no guarantee that data is going to come out the other end
What happens when you reach the limit of the buffer?  Then you do get blocking


Suggested rule: don't use buffered channels, especially when the buffer >1

If you can guarantee something will never block, then fine
Use an unbuffered channel to ensure you understand what happens if the channel does block, then add buffering


Unbuffered channel: the receive happens before the send completes; buffered channel: the send happens before the corresponding receive completes
Being able to measure things like back-pressure is important
Worker pools are a good, measurable pattern
Sending to or receiving from a nil channel blocks forever, and closing a nil channel panics (so create channels with make(chan T))
close(c) will set state on the channel

Closing a channel does not mark it for GC, which will happen if there are no references left to it
A channel cannot be closed more than once
A send on a closed channel will panic
Any goroutines receiving on a channel will be immediately notified (they receive the zero value for the type)
A second param on a channel receive (x, ok := <-ch) is only needed if a channel is being explicitly closed


A channel with a type struct{} can be used just for notifications
select + case allows us to receive on multiple channels at the same time
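The channel behaviours above can be sketched in a few small functions: an unbuffered hand-off, a receive on a closed channel, and a struct{} notification channel used with select. The function names and the millisecond timeout parameter are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// collect receives n values over an unbuffered channel. Each send blocks
// until the receiver takes the value; close ends the range loop.
func collect(n int) []int {
	ch := make(chan int)
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch)
	}()

	var out []int
	for v := range ch {
		out = append(out, v)
	}
	return out
}

// receiveAfterClose shows the two-value receive on a closed channel:
// the zero value and ok == false.
func receiveAfterClose() (int, bool) {
	ch := make(chan int)
	close(ch)
	v, ok := <-ch
	return v, ok
}

// waitOrTimeout uses a chan struct{} purely as a notification and select
// to wait on it and on a timer at the same time.
func waitOrTimeout(signal bool, ms int) bool {
	done := make(chan struct{})
	if signal {
		go func() { close(done) }()
	}
	select {
	case <-done:
		return true
	case <-time.After(time.Duration(ms) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println(collect(3))                // [0 1 2]
	fmt.Println(receiveAfterClose())       // 0 false
	fmt.Println(waitOrTimeout(true, 1000)) // true
	fmt.Println(waitOrTimeout(false, 10))  // false
}
```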

Type Assertions


Also called boxing + unboxing
A method of pulling a concrete type out of an interface
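A short sketch of pulling a concrete type out of an interface, using both the two-value assertion and a type switch. The describe function is hypothetical.

```go
package main

import "fmt"

// describe inspects the concrete type stored in an interface value.
func describe(v interface{}) string {
	// Two-value form: ok is false instead of panicking on a mismatch.
	if s, ok := v.(string); ok {
		return "string: " + s
	}

	// A type switch checks several concrete types in turn.
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int: %d", x)
	case error:
		return "error: " + x.Error()
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(describe("go")) // string: go
	fmt.Println(describe(42))   // int: 42
	fmt.Println(describe(1.5))  // unknown
}
```

The single-value form, s := v.(string), panics if v does not hold a string, so the two-value form is usually the safer choice.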

Notes, Tips and Suggestions


Lots of Go's idioms are based around mechanical sympathy (working with the hardware)
Go's syntax is "done" - going forward it's all about performance
Start by understanding type, then once you've got that, move on to behaviour and API design
Don't use the built-in println; use fmt.Println
Don't pass pointers of reference types

The pass-by-value semantics will copy the slice header, which is effectively a pointer


Empty structs are zero-allocation
Go has closures
Sending a SIGQUIT (Ctrl-\ in most terminals) to a Go program will instantly quit it and print a stack trace
Define variables as close to where they're used (locally scoped if possible)
The closer the definition, the shorter the variable name can be
Anonymous structs are useful for avoiding type pollution where a type is only used locally

Example of use: JSON deserialisation
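An anonymous struct for JSON deserialisation might look like the sketch below. The parseName function and the "name" field are assumptions chosen for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseName decodes only the field it needs, using an anonymous struct
// so that no package-level type is introduced for a one-off shape.
func parseName(data []byte) (string, error) {
	var payload struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal(data, &payload); err != nil {
		return "", err
	}
	return payload.Name, nil
}

func main() {
	name, err := parseName([]byte(`{"name":"gopher","age":5}`))
	fmt.Println(name, err) // gopher <nil>
}
```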