@rfairley
Last active June 22, 2018 02:59
Bash scripts to find bad output from multiple kola runs

kola-find-bad-output

These scripts were handy for reproducing the panic caused by the race condition in coreos/bugs#1987. Running run-tests-parallel causes the panic to show in journal.txt around 2/5 times, and run-tests-sequential shows it around 3/10 times (number of occurrences of the panic / number of tests run by the script).

Prerequisites

What it does

The platform used is QEMU because it runs offline, and the test is coreos.ignition.v2.once because it runs quickly. Effectively, kola run starts up a QEMU machine, runs the test, and shuts down the machine, which makes the shutdown logs available in journal.txt. The script find-bad-output uses find and kola check-console to check all of the journal.txt files at once, revealing the location of the "bad output" journal files in the _kola_temp folder.

Typical use case

  1. Copy the scripts below into separate files in a new directory.
  2. Run the following:
./run-tests-parallel
./find-bad-output

or

./run-tests-sequential
./find-bad-output
  3. If find-bad-output shows a message similar to the following:
./_kola_temp/qemu-2018-06-19-1604-13993/coreos.ignition.v2.once/598aa40f-14ba-4b65-9b23-470e6273b2f2/journal.txt: Go panic (runtime error: invalid memory address or nil pointer dereference)
./_kola_temp/qemu-2018-06-19-1604-13993/coreos.ignition.v2.once/598aa40f-14ba-4b65-9b23-470e6273b2f2/journal.txt: segfault

then a failure occurred in one of the tests (in this case, coreos.ignition.v2.once).
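
To get a rough "occurrences / tests run" figure like the ones quoted in the introduction, the journal files can be tallied with plain find and grep. This is only a sketch: count_bad_journals is a hypothetical helper, and the pattern passed to it is an assumption based on the sample output above, not the list of checks that kola check-console performs. It also assumes paths without newlines, which holds for the directory names kola generates in the sample.

```shell
# count_bad_journals DIR PATTERN
# Hypothetical helper: prints "<matching>/<total>" for the journal.txt files
# found under DIR, where "matching" means the file contains a line matching
# the extended regular expression PATTERN.
count_bad_journals() {
  cbj_dir="$1"
  cbj_pattern="$2"
  # Total number of journal.txt files (one path per line).
  cbj_total=$(find "$cbj_dir" -type f -name "journal.txt" | wc -l | tr -d ' ')
  # grep -l prints each matching file name once, so wc -l counts files.
  cbj_bad=$(find "$cbj_dir" -type f -name "journal.txt" \
    -exec grep -l -E -e "$cbj_pattern" {} + | wc -l | tr -d ' ')
  echo "$cbj_bad/$cbj_total"
}
```

For example, count_bad_journals ./_kola_temp 'panic|segfault' prints the number of journals containing either word over the number of journals found.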

The scripts are as follows. They are also available at https://github.com/rfairley/kola-find-bad-output.

run-tests-parallel

#!/bin/bash

# Run several tests as child processes. Running more than 10 tests locally in
# parallel is not recommended (due to memory constraints, and storage
# constraints in /tmp).
# KFBO stands for KOLA_FIND_BAD_OUTPUT.

# TODO: enter using command line.
KFBO_OUTPUT_NUM_TESTS=5
KFBO_TEST_TO_RUN="coreos.ignition.v2.once"

KFBO_KOLA_PATH="/home/$USER/go/bin/kola"
KFBO_QEMU_IMAGE="./coreos_production_image.bin"

echo "kfbo: kola running $KFBO_OUTPUT_NUM_TESTS processes of test $KFBO_TEST_TO_RUN..."

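# Start each kola run in the background and remember its PID.
pids=()
for i in $(seq 1 "$KFBO_OUTPUT_NUM_TESTS"); do
	"$KFBO_KOLA_PATH" run -p qemu "$KFBO_TEST_TO_RUN" --qemu-image "$KFBO_QEMU_IMAGE" &
	pids+=("$!")
done

# Wait for every background run to finish.
for pid in "${pids[@]}"; do
	echo "kfbo: waiting for process $pid to exit..."
	wait "$pid"
done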

echo "kfbo: done running parallel tests."
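
One way the "enter using command line" TODO might be handled is to read the test count and test name from positional arguments, falling back to defaults. This is a sketch only: kfbo_parse_args is a hypothetical helper that is not part of the original scripts, it sets KFBO_NUM_TESTS (the name used by run-tests-sequential), and the default count of 5 is simply the value used in run-tests-parallel.

```shell
# kfbo_parse_args [NUM_TESTS] [TEST_NAME]
# Hypothetical helper: sets KFBO_NUM_TESTS and KFBO_TEST_TO_RUN from the
# arguments, using defaults for anything omitted.
kfbo_parse_args() {
  KFBO_NUM_TESTS="${1:-5}"
  KFBO_TEST_TO_RUN="${2:-coreos.ignition.v2.once}"
  # Reject anything containing a non-digit, and a literal 0.
  case "$KFBO_NUM_TESTS" in
    *[!0-9]*|0)
      echo "kfbo: NUM_TESTS must be a positive integer, got '$KFBO_NUM_TESTS'" >&2
      return 1
      ;;
  esac
}
```

In a script this could be called as kfbo_parse_args "$@" || exit 1 before the loop, so that ./run-tests-sequential 20 would run twenty iterations of the default test.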

run-tests-sequential

#!/bin/bash

# TODO: enter using command line.
KFBO_NUM_TESTS=10
KFBO_TEST_TO_RUN="coreos.ignition.v2.once"

KFBO_KOLA_PATH="/home/$USER/go/bin/kola"
KFBO_QEMU_IMAGE="./coreos_production_image.bin"

echo "kfbo: kola running $KFBO_NUM_TESTS iterations of test $KFBO_TEST_TO_RUN..."

# Run the test KFBO_NUM_TESTS times, one after another.
for i in $(seq 1 "$KFBO_NUM_TESTS"); do
	"$KFBO_KOLA_PATH" run -p qemu "$KFBO_TEST_TO_RUN" --qemu-image "$KFBO_QEMU_IMAGE"
done

echo "kfbo: done running sequential tests."

find-bad-output

#!/bin/bash

# Check every journal.txt below the current directory.
while IFS= read -r f; do
	kola check-console "$f"
done < <(find . -name "journal.txt")
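
The same scan can be written with find -exec instead of a read loop, which passes each path to the checker as a single argument. The sketch below is a variation, not the original script: scan_journals is a hypothetical helper, and the checker command is a parameter so that the function can be tried with a harmless command such as echo before pointing it at kola.

```shell
# scan_journals DIR COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND [ARGS...] once for each journal.txt found
# under DIR, with the path of the journal appended as the last argument.
scan_journals() {
  if [ "$#" -lt 2 ]; then
    echo "usage: scan_journals DIR COMMAND [ARGS...]" >&2
    return 2
  fi
  scan_dir="$1"
  shift
  # "$@" now holds the checker command and its arguments.
  find "$scan_dir" -type f -name "journal.txt" -exec "$@" {} \;
}
```

Running scan_journals . kola check-console should then behave like the loop in find-bad-output, assuming kola is on the PATH.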

cleanup-mess

The following script, cleanup-mess, was used to kill processes and remove directories in /tmp left behind after sending SIGTERM to the run-tests-parallel script once it had begun running kola run in the background. Please read through the script closely before using it, in case it deletes something unintentionally.

#!/bin/bash

# Clean up processes if ran out of disk space or memory midway through kola tests.
# - There may be more items needing cleaning. This is just enough to get running
#   again if you ran out of space in /tmp or memory.
# - Be careful before running that there are no conflicts with other processes
#   or files that might be destroyed by running this script.

# Run 'df' and 'ps -a' before and after - should clear up kola-related processes
# and the /tmp folder.

KFBO_CLEANUP_MESS_PROCESSES_NAMES=("kola" "dnsmasq" "qemu-system-x86" \
  "dnsmasq <defunct>" "qemu-system-x86 <defunct>")
KFBO_CLEANUP_MESS_DIR_NAMES=("simple-etcd-.*" "mantle-ssh-.*")

for process_name in "${KFBO_CLEANUP_MESS_PROCESSES_NAMES[@]}"
do
  pkill --full --echo "$process_name"
done

for dir_name_regex in "${KFBO_CLEANUP_MESS_DIR_NAMES[@]}"
do
  # List the entries in /tmp that match the pattern and remove each one.
  while IFS= read -r dir
  do
    echo "kfbo: removing /tmp/$dir"
    sudo rm -r /tmp/"$dir"
  done < <(ls /tmp | grep "$dir_name_regex")
done
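
Before running cleanup-mess, it may help to preview which directories it would remove. The following is a sketch of such a dry run: list_cleanup_dirs is a hypothetical helper, and it is slightly stricter than the script above, since it matches only directories whose names start with simple-etcd- or mantle-ssh-, whereas the grep patterns match those strings anywhere in an entry name. It relies on the -mindepth and -maxdepth options, which GNU and BSD find provide.

```shell
# list_cleanup_dirs [BASE_DIR]
# Hypothetical helper: prints, without deleting anything, the directories
# directly under BASE_DIR (default /tmp) whose names start with the prefixes
# that cleanup-mess targets.
list_cleanup_dirs() {
  lcd_base="${1:-/tmp}"
  find "$lcd_base" -mindepth 1 -maxdepth 1 -type d \
    \( -name 'simple-etcd-*' -o -name 'mantle-ssh-*' \) | sort
}
```

If the printed list contains anything unexpected, the patterns in cleanup-mess should be tightened before it is run.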