@wlonkly
wlonkly / debugging.md
Last active July 8, 2021 15:13
Steps I took to troubleshoot a full disk

I wrote this down after responding to a page today (a holiday), because it would've been a decent pairing opportunity for a couple of new people on my team. The second-best option is that people can read what I did afterwards and ask me questions. Then I realized that there's nothing PagerDuty-specific or confidential in here, so I may as well share it more widely. It's hardly an epic incident, but I think it's a good example of "doing the work". I borrowed the "write down what you learned" approach from Julia "b0rk" Evans. It's a fantastic practice.

The PagerDuty incident: "Disk will be full in 12 hours. device:/dev/nvme0n1p1, host:stg-nomadusw2-client-..."

(Note for non-PD readers: We run Nomad where others might run Kubernetes.)

Here's the process I went through.

  • Noticed that the usual `docker system prune -a -f` didn't resolve it
  • Tried `docker system prune -a -f` and it cleared up 0B