@hermanbanken
Created September 6, 2021 15:08
Raft cluster completely blocked

Raft issue

When developing my first hashicorp/raft implementation, I hit what seemed like a stone wall. I could bootstrap one of the nodes, it would win the election, and the Raft observation for it becoming Leader was logged, but then nothing happened. I couldn't figure out why all RPCs timed out after that.

I started running it locally and debugging with a VS Code launch configuration (which uses Delve), and found that before bootstrapping there was a "runCandidate" goroutine, but after bootstrapping there was no "runLeader" goroutine. It turned out the leader loop never started! How could this be? Then my mind wandered to the observer that I had added:

	ch := make(chan raft.Observation, 1)
	r.RegisterObserver(raft.NewObserver(ch, true, func(o *raft.Observation) bool {
		// o.Data is one of:
		//   *RequestVoteRequest
		//   RaftState
		//   PeerObservation
		//   LeaderObservation
		switch v := o.Data.(type) {
		case raft.RaftState:
			zap.L().Info("raft observation", zap.String("state", v.String()))
		default:
			// Only marshal when the JSON is actually logged.
			data, _ := json.Marshal(o.Data)
			zap.L().Info("raft observation", zap.ByteString("json", data))
		}
		setHealth() // update the health check after every observation
		return true // true: pass the observation on to ch
	}))

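This jewel was added to diagnose the Raft process and to update the server health check status: if the Raft state becomes Shutdown, the health check fails so that the pod is terminated. However, the argument true makes the Observer blocking: once the channel's one-slot buffer is full and nobody reads from it, the next send blocks and Raft simply halts. The filter function runs before the send, which would explain why the Leader observation was still logged before everything stopped.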
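
To make the difference between the two modes concrete, here is a simplified model using only the standard library. It is a sketch of the idea, not hashicorp/raft's actual code: `deliver` and `stalls` are hypothetical helpers, and plain strings stand in for observations.

```go
package main

import (
	"fmt"
	"time"
)

// deliver models how an observer might hand an event to its channel.
// Hypothetical helper for illustration; not part of hashicorp/raft.
// With blocking=true it waits until the send succeeds. With blocking=false
// it drops the event when the channel is full. It reports whether the
// event was delivered.
func deliver(ch chan<- string, event string, blocking bool) bool {
	if blocking {
		ch <- event // waits forever if the buffer is full and nobody reads
		return true
	}
	select {
	case ch <- event:
		return true
	default:
		return false // buffer full: drop instead of stalling the caller
	}
}

// stalls reports whether a blocking deliver is still stuck after timeout.
func stalls(ch chan<- string, event string, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		deliver(ch, event, true)
		close(done)
	}()
	select {
	case <-done:
		return false
	case <-time.After(timeout):
		return true
	}
}

func main() {
	ch := make(chan string, 1) // buffer of one, never read

	fmt.Println(deliver(ch, "Candidate", false))           // true: fits in the buffer
	fmt.Println(deliver(ch, "Leader", false))              // false: buffer full, dropped
	fmt.Println(stalls(ch, "Leader", 50*time.Millisecond)) // true: blocking send is stuck
}
```

In this model the first event fills the buffer, and any later blocking send waits for a reader that never comes.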

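So the fix was to do one of the following:

  1. read the channel,
  2. make the observer non-blocking (pass false instead of true),
  3. return false from the filter function, so the observation is never sent to the channel.

Option 1, draining the channel in a goroutine:

	// Discard observations so a blocking send never stalls Raft.
	go func() {
		for range ch {
		}
	}()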
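
The drain goroutine above runs for the lifetime of the process. If you want to stop it on shutdown, one possible variation is sketched below, using only the standard library. The `observation` type stands in for raft.Observation, and `drain` and the `done` channel are hypothetical names introduced for this example.

```go
package main

import "fmt"

// observation stands in for raft.Observation so the sketch needs no
// third-party imports. Hypothetical type for illustration.
type observation struct{ Data string }

// drain discards everything sent on ch until done is closed or ch itself
// is closed, and returns how many values it read.
func drain(done <-chan struct{}, ch <-chan observation) int {
	n := 0
	for {
		select {
		case <-done:
			return n
		case _, ok := <-ch:
			if !ok {
				return n // channel closed, nothing more to read
			}
			n++
		}
	}
}

func main() {
	ch := make(chan observation, 1)
	done := make(chan struct{})
	finished := make(chan int)

	go func() { finished <- drain(done, ch) }()

	// With a reader attached, these blocking sends cannot stall the sender.
	for _, s := range []string{"Follower", "Candidate", "Leader"} {
		ch <- observation{Data: s}
	}

	close(done) // signal shutdown
	<-finished  // wait for the drain goroutine to exit
	fmt.Println("drain goroutine stopped")
}
```

Note that an observation still sitting in the buffer when `done` is closed may be left unread, which is fine here because the values are discarded anyway.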