@itskingori
Last active July 26, 2019 15:37
Scripts to debug and fix weave CNI issues

Scripts

verify-weave.sh - Goes through all weave pods in the cluster, computes a checksum on each pod's 'status ipam' list, and reports any weave pods that disagree on their peer list. We used this to identify the different groups of pods in the cluster and to decide which ones to preserve and which ones to reset/restart.

bump-weave.sh - Another quick script, used to remove the db file and restart those weave pods we wished to reset.
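The grouping idea behind verify-weave.sh can be tried without a cluster. The following is a minimal sketch, not the script itself: the pod names and peer lists are made-up sample data, the function names are hypothetical, and the grouping is done with awk and `cksum` rather than the script's bash arrays and `sum`.

```shell
#!/bin/bash
# Hypothetical sample data: one "pod-name peer,peer,..." entry per line.
sample_peer_lists() {
  cat <<'EOF'
weave-net-aaaaa peer-b,peer-a,peer-c
weave-net-bbbbb peer-a,peer-b,peer-c
weave-net-ccccc peer-a,peer-b
EOF
}

# Print "checksum pod" for each pod. The peer list is sorted first so that
# ordering differences alone do not produce different checksums.
checksum_per_pod() {
  while read -r pod peers; do
    sum=$( printf '%s\n' "$peers" | tr ',' '\n' | sort | cksum | awk '{ print $1 "-" $2 }' )
    printf 'sum%s %s\n' "$sum" "$pod"
  done
}

# Collapse the "checksum pod" lines into one line per checksum group.
group_by_checksum() {
  awk '{ groups[$1] = groups[$1] " " $2 }
       END { for (s in groups) print s ":" groups[s] }'
}

GROUPS_FOUND=$( sample_peer_lists | checksum_per_pod | group_by_checksum )
GROUP_COUNT=$( printf '%s\n' "$GROUPS_FOUND" | wc -l | tr -d ' ' )
echo "Found $GROUP_COUNT different peer lists"
printf '%s\n' "$GROUPS_FOUND"
```

With this sample data the first two pods agree once their lists are sorted, while the third is missing a peer and lands in a group of its own.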

Source

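#!/bin/bash
# bump-weave.sh
# Takes a weave pod name and, after confirmation, removes the
# weave db and deletes the pod.
WP=$1
# Pod names look like weave-net-xxxxx; anchor the match so stray input is rejected.
if [[ ! $WP =~ ^weave-[a-z0-9-]+$ ]]
then
  echo "Usage: $0 weave-podname"
  echo "Will show details of the node hosting the weave pod and then,"
  echo "if confirmed, will remove the weave db and delete the weave pod"
  exit 1
fi
# Weave pods live in kube-system (the namespace verify-weave.sh queries).
NS=kube-system
# Ask for the node name directly rather than relying on column positions.
NODE=$( kubectl get pod "$WP" -n "$NS" -o jsonpath='{.spec.nodeName}' )
if [[ -z $NODE ]]
then
  echo "Could not find a node for pod $WP in namespace $NS"
  exit 1
fi
echo --------------------------------------------
echo "Weave pod $WP found on $NODE"
echo "Houses the following pods:"
# Print namespace and name for each row of the node's "Non-terminated Pods" table.
kubectl describe node "$NODE" | sed -n -e '/^ *---------/,/Allocated/p' | sed -e '1d' -e '$d' | awk '{ print $1 "\t" $2 }'
echo --------------------------------------------
read -r -p "Enter 'YES' to bump this weave pod: "
if [[ $REPLY != YES ]]
then
  echo "No action taken"
  exit
else
  echo "Deleting db file..."
  kubectl exec "$WP" -n "$NS" -c weave -- rm /weavedb/weave-netdata.db
  kubectl delete pod "$WP" -n "$NS"
fi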
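The pipeline that lists the pods on a node is easier to follow against canned input. This is a sketch under assumptions: the sample text is a hypothetical, abbreviated imitation of `kubectl describe node` output, the function names are made up, and the sed pattern here tolerates any amount of leading indentation before the dashed rule.

```shell
#!/bin/bash
# Hypothetical, abbreviated excerpt of `kubectl describe node` output.
sample_describe_node() {
  cat <<'EOF'
Non-terminated Pods:          (2 in total)
  Namespace                   Name                CPU Requests
  ---------                   ----                ------------
  kube-system                 weave-net-abcde     20m (1%)
  default                     my-app-12345        100m (5%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
EOF
}

# Keep the lines from the dashed rule through "Allocated", drop both
# boundary lines, then print namespace and pod name separated by a tab.
list_pods_on_node() {
  sed -n -e '/^ *---------/,/Allocated/p' | sed -e '1d' -e '$d' | awk '{ print $1 "\t" $2 }'
}

PODS=$( sample_describe_node | list_pods_on_node )
echo "$PODS"
```

Only the two pod rows survive: the first sed selects the table body plus its two boundary lines, and the second sed trims those boundaries off.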
#!/bin/bash
#
# verify-weave.sh
# This script will iterate over all weave pods and
# determine if they are all consistent or not
#
if [[ $* ]]
then
  echo "usage: $0"
  echo "This script will pull the 'status ipam' data from each node in the"
  echo "current cluster and verify if they are all consistent. If they are not"
  echo "consistent then the different groups of pods will be shown"
  exit
fi
# Both maps are keyed by checksum string, so they must be associative arrays
# (bash 4+). Without this declaration bash evaluates every key as index 0 and
# all pods end up in a single group.
declare -A CHECKSUMS UNREACHABLES
# Get a list of all weave pods
echo "Gathering list of weave pods..."
WEAVEPODS=$( kubectl get pod -n kube-system | grep weave-net | awk '{ print $1 }' )
if [[ -z $WEAVEPODS ]]
then
  echo "No weave pods found."
  exit 1
fi
WEAVECOUNT=$( echo $WEAVEPODS | wc -w )
echo "Found $WEAVECOUNT weave pods."
# Compute the checksum and unreachables for each weave pod and put the name in the appropriate list
echo "Computing status checksum for weave pods..."
for wp in $WEAVEPODS
do
  echo -n "."
  # Status is the raw 'status ipam' output, fetched once per pod
  # Peerdata contains the (sanitized) output from 'status ipam' such that it should be comparable between nodes
  # Unreach contains the list of unreachables from the status output
  # Peersum is the resulting checksum of the peerdata from above
  STATUS=$( kubectl exec "$wp" -n kube-system -c weave -- ./weave --local status ipam )
  PEERDATA=$( echo "$STATUS" | awk '{ print $1 "," $2 }' | sort )
  UNREACH=$( echo "$STATUS" | grep unreachable | sort )
  PEERSUM="sum$( echo "$PEERDATA" | sum | awk '{ print $1 }' )"
  if [[ ${CHECKSUMS[$PEERSUM]} ]]
  then
    CHECKSUMS[$PEERSUM]="${CHECKSUMS[$PEERSUM]} $wp"
  else
    #echo "Found new sum $PEERSUM..."
    CHECKSUMS[$PEERSUM]=$wp
  fi
  if [[ $UNREACH ]]
  then
    UNREACHABLES[$PEERSUM]="$UNREACH"
  fi
done
echo
# If there's more than one checksum for the cluster, then there is an inconsistency.
if [[ ${#CHECKSUMS[@]} -gt 1 ]]
then
  echo "Found ${#CHECKSUMS[@]} different peer lists..."
  for ps in "${!CHECKSUMS[@]}"
  do
    echo
    echo "Group $ps has $( echo ${CHECKSUMS[$ps]} | wc -w ) nodes:"
    echo ${CHECKSUMS[$ps]}
    if [[ ${UNREACHABLES[$ps]} ]]
    then
      echo "The following unreachable peers exist:"
      # Unquoted on purpose: join the lines, then split again after each '!'
      echo ${UNREACHABLES[$ps]} | sed -e 's/! */\n/g'
    fi
  done
else
  echo "All weave pods are consistent."
  # Only one checksum exists here, so PEERSUM from the last iteration is that group's key
  if [[ ${UNREACHABLES[$PEERSUM]} ]]
  then
    echo "The following unreachable peers exist:"
    echo ${UNREACHABLES[$PEERSUM]} | sed -e 's/! */\n/g'
  fi
fi
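To see what the per-pod sanitizing step in verify-weave.sh keeps and discards, the same awk and grep filters can be run on canned text. This is only a sketch: the sample lines are invented and merely approximate the shape of `weave status ipam` output, and `sample_status` is a hypothetical helper.

```shell
#!/bin/bash
# Invented sample that approximates 'status ipam' output; real output may differ.
sample_status() {
  cat <<'EOF'
ce:38:5e:9d:35:ab(node-b)    32768 IPs (50.0% of total) - unreachable!
82:85:7e:7f:71:f3(node-a)    32768 IPs (50.0% of total)
EOF
}

# Keep only the peer identity and its IP count, sorted, so that the result
# is comparable between pods regardless of line order.
PEERDATA=$( sample_status | awk '{ print $1 "," $2 }' | sort )

# Collect the lines that flag a peer as unreachable.
UNREACH=$( sample_status | grep unreachable | sort )

echo "Peer data used for the checksum:"
echo "$PEERDATA"
echo "Unreachable peers:"
echo "$UNREACH"
```

Because only the first two fields feed the checksum, the percentage and the unreachable flag do not affect grouping; unreachable peers are reported separately.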