@okor
Created February 18, 2019 15:35
Script to clean up kubernetes pods in undesirable states, like after a kops rolling upgrade.
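PODINITIALIZING_TIMEOUT_SECONDS = 60 * 60 # 1 hour
INIT_TIMEOUT_SECONDS = 10 * 60 # 10 minutes

# Seconds per unit suffix used in the kubectl AGE column.
SECONDS_PER_UNIT = { 's' => 1, 'm' => 60, 'h' => 3600, 'd' => 86_400, 'y' => 365 * 86_400 }.freeze

# Convert a kubectl AGE string such as "45s", "12m", "3h" or "2d4h" to seconds.
# Raises instead of returning an error string, so a bad value can never reach
# the integer comparisons further down.
def normalize_time(time_string)
  parts = time_string.to_s.scan(/(\d+)([[:alpha:]]+)/)
  raise ArgumentError, "unparseable age (#{time_string.inspect})" if parts.empty?

  parts.sum do |amount, unit|
    amount.to_i * SECONDS_PER_UNIT.fetch(unit) { raise ArgumentError, "unknown time unit (#{unit})" }
  end
end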
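
# Delete a pod with kubectl, logging the reason and the exact command first.
def delete_pod(pod, reason = '')
  command = "kubectl delete pod #{pod[:name]} -n #{pod[:namespace]}"
  puts "Deleting pod because #{reason}"
  puts "> " + command
  `#{command}`
end

# Print a pod with its original AGE string and without the internal :meta key.
def puts_pod(pod, message = '')
  pod = pod.clone
  pod[:age] = pod[:meta][:original_age]
  pod.delete(:meta)
  puts "#{message} #{pod}"
end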

# List every pod that is not Running or Completed, one space-separated row per pod.
pods_command = "kubectl get pods -o wide --sort-by={.spec.nodeName} --all-namespaces | grep -v 'Running\\|Completed' | awk '$1 != \"NAMESPACE\" {print $1,$2,$3,$4,$5,$6,$7,$8,$9}'"

# Turn each row into a hash keyed by column name.
pods_data = `#{pods_command}`.split(/\n/).map do |line|
  fields = line.split(' ')
  {
    namespace: fields[0],
    name: fields[1],
    ready: fields[2],
    status: fields[3],
    restarts: fields[4],
    age: normalize_time(fields[5]),
    ip: fields[6],
    node: fields[7],
    nominated_node: fields[8],
    meta: { original_age: fields[5] }
  }
end

# Sort by status so pods in the same state are handled together.
pods_data = pods_data.sort_by { |pod| pod[:status] }

pods_data.each do |pod|
  case pod[:status]
  when 'Evicted'
    delete_pod(pod, "it was previously Evicted, just cleaning up.")
  when 'CrashLoopBackOff', 'RunContainerError', 'Error', 'Init:Error', 'Init:CrashLoopBackOff'
    delete_pod(pod, "it's in a #{pod[:status]} state and that's wack.")
  when 'PodInitializing'
    if pod[:age] > PODINITIALIZING_TIMEOUT_SECONDS
      delete_pod(pod, "it's in a #{pod[:status]} state and it's #{pod[:meta][:original_age]} old, so it's probably stuck.")
    end
  when /^Init/
    # Any other Init:* status, e.g. Init:0/1.
    if pod[:age] > INIT_TIMEOUT_SECONDS
      delete_pod(pod, "it's in a #{pod[:status]} state and it's #{pod[:meta][:original_age]} old, so it's probably stuck.")
    end
  when 'Terminating'
    # boo, could try --now arg with delete
    puts_pod(pod, "Doing nothing. Pod is in Terminating state:")
  else
    puts_pod(pod, "Doing nothing. Pod is in an unhandled state:")
  end
end

okor commented Feb 18, 2019

In my experience, rolling upgrades to a Kubernetes cluster commonly leave a bunch of pods in undesirable states, and they are often "stuck" there.

Running this script finds all pods across all namespaces that are not "Running" or "Completed", then decides whether each of them should be deleted so that it can be replaced. A fresh pod is often enough to get back to a healthy state. There are no prompts or safety mechanisms here, so run at your own risk.
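
Since the script deletes without asking, it can help to preview its decisions first. Below is a minimal sketch of the decision step as a pure function, assuming the same statuses and timeouts as the script. The `deletion_reason` helper, the `ALWAYS_DELETE` list and the sample pods are made up for illustration and are not part of the gist. Nothing here calls kubectl.

```ruby
# Thresholds copied from the script.
PODINITIALIZING_TIMEOUT_SECONDS = 60 * 60 # 1 hour
INIT_TIMEOUT_SECONDS = 10 * 60            # 10 minutes

# Statuses the script deletes regardless of age.
ALWAYS_DELETE = %w[
  Evicted CrashLoopBackOff RunContainerError Error Init:Error Init:CrashLoopBackOff
].freeze

# Hypothetical helper: returns a reason string when the pod would be deleted,
# or nil when it would be left alone.
def deletion_reason(status, age_seconds)
  if ALWAYS_DELETE.include?(status)
    "status is #{status}"
  elsif status == 'PodInitializing' && age_seconds > PODINITIALIZING_TIMEOUT_SECONDS
    "stuck in PodInitializing for more than #{PODINITIALIZING_TIMEOUT_SECONDS}s"
  elsif status.start_with?('Init') && age_seconds > INIT_TIMEOUT_SECONDS
    "stuck in #{status} for more than #{INIT_TIMEOUT_SECONDS}s"
  end
end

# Dry run over hand-written sample data (ages already in seconds).
sample = [
  { name: 'web-1', status: 'Evicted',         age: 30 },
  { name: 'job-7', status: 'Init:0/1',        age: 15 * 60 },
  { name: 'db-0',  status: 'PodInitializing', age: 5 * 60 },
  { name: 'api-3', status: 'Terminating',     age: 60 * 60 }
]

sample.each do |pod|
  reason = deletion_reason(pod[:status], pod[:age])
  puts reason ? "would delete #{pod[:name]}: #{reason}" : "would keep #{pod[:name]}"
end
```

On the sample data this reports two deletions (`web-1` and `job-7`) and leaves the other two pods alone. Keeping the decision separate from the `kubectl delete` call is one way to add a dry-run mode or a confirmation prompt without touching the rules themselves.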
