Salt Debugging Best Practices (SaltConf 18)

These notes should prove useful for those looking for tips on how to find and fix bugs, as well as those who are developing Salt and would like to improve/streamline the process.

NOTE: These notes are written from the perspective of a developer working in a Linux environment. Those on macOS may need to make some adjustments. Those on Windows may need to make several adjustments.

Additionally, you will see several uses of $PWD in the CLI examples below. It is expected that when you run these commands, you are doing so from the root of a git checkout of Salt. This will mount the repo into the container so that it can be used to run Salt.

What you will need

  • Docker
  • Git
  • Your text editor of choice

salt-docker

salt-docker is a tool which builds Docker images that have all the prerequisites to run Salt and its tests, and launches containers with a clone of Salt mounted into the container. Within the container, PATH and PYTHONPATH are set such that when you run Salt commands, you are running them against the mounted-in copy of the Salt codebase.

Review the README for salt-docker for help getting set up.

The images built by salt-docker can be used to set up reproducible test cases to share with others (or include in a bug report). After launching into a container, setting up some files under /srv/salt and/or /srv/pillar, installing needed packages, etc., you can then (from outside the container) run docker commit container_id user/image:tag to save the contents of the running container under a new image name. If you are unsure of the container_id, it is the hexadecimal string you see when you are in a salt-docker container:

(saltdev) root@45515ec019d1:/#

For this container, you can use 45515ec019d1 as the container_id when committing a new image.
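
For example, committing that container might look like this (the image name and tag here are just placeholders, as in the docker run example below):

docker commit 45515ec019d1 user/image:tag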

I like to use an image name of issues and then a tag number which references an associated GitHub issue number, where applicable. An example of this would be terminalmage/issues:12345. By naming it that way (with my Docker Hub username included), I can do a docker push and share the image with others. With this image pushed up to the Docker Hub, you can instruct someone to run the container like so:

docker run --rm -it -v $PWD:/testing user/image:tag salt-call state.apply test

This would download and launch the container, and run the states in /srv/salt/test.sls that were saved in the container when you ran docker commit.

Git Worktrees

If you have git 2.5 or newer, you should be using git worktrees. Normally, if you're working on code in one branch and need to stop and work on something else, you would have to stash your changes, create a new branch to do the other work, and then later come back to the original branch and apply the stash to continue working. Using worktrees, you instead have separate working copies of the repo in their own directories, but they all use the "main" checkout's .git directory to store their metadata.

In my workflow, I have the Salt repo cloned to ~/git/salt/main. I never write code in this directory. Whenever I have something to work on, I navigate to ~/git/salt/main, switch to the branch from which I wish to make changes, and create a worktree:

% git worktree add ../issue12345

This command does two things:

  1. Creates a new worktree at the specified path
  2. Creates a new branch issue12345 and checks it out in that directory

You can use -b branchname to specify the branch, and you can also specify a revision to use when checking out the branch (the default is HEAD). If you do not specify a branch, git will create one matching the basename of the path you specify for the worktree.
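
For example, to create a worktree with an explicitly named branch based on a specific start point (the branch name and start point below are just placeholders):

% git worktree add -b issue12345 ../issue12345 v2018.3.1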

To remove the worktree, just delete the directory and remove the branch. You can also then run git worktree prune -v to clean up the worktree metadata.

rm -rf /path/to/worktrees/issue12345
git worktree prune -v
git branch -d issue12345

If you have pushed this branch to GitHub (for example, to open a PR), and no longer need it, don't forget to clean up the old branch!

git push origin :issue12345

Git Bisect

Using a git bisect is a helpful way of discovering which commit introduced a given bug. Given two commits (one from before the bug appeared and one from after), a binary search is performed (with feedback from the user at each step) to find the commit where the bug first occurred.

Before you start a bisect, you need to find a commit where the bug does not exist (i.e. a "good" commit), as well as one where the bug does exist (i.e. a "bad" commit). It is important that the "good" commit is a direct ancestor of the "bad" commit.

The easiest case for finding a good commit is when you know the bug does not exist in one Salt release, but does in another. In those cases you can simply use the tags for those releases as your good and bad commits. However, when you don't have specific information about when the bug appeared, sometimes the easiest way to find a "good" commit is to do a hard reset to several commits before HEAD (e.g. git reset --hard HEAD~20) and keep trying until the bug is no longer present.

Once you have the good and bad commits, it's time to start the bisect. To do so, run git bisect start. You can then specify the good and bad commits:

git bisect good abcdef1
git bisect bad 012345a

Tags and other refs can also be used:

git bisect good v2018.3.1
git bisect bad HEAD

Once both a good and a bad commit have been specified, git will point the repo at the commit which is at the midpoint between them. From here, you can run the code to see if the bug exists. If it does, run git bisect bad; if it does not, run git bisect good. Either way, this will point the repo at another commit, and you can repeat the process: run the code, then run either git bisect bad or git bisect good. After at most about a dozen steps, the bisect will be complete and git will tell you which commit was the first to contain the bug.

Once you are done, or at any point during the bisect, you can run git bisect reset and git will point HEAD at the location it was at before you ran git bisect start.

Using salt-docker is great for git bisects, as you can test the code from a fresh copy of the image for each step of the bisect. As described above, you can set up a container with everything in place to reproduce a bug, and then use docker commit to save that setup to a new image. You can then use that image to run the code for each step of the bisect:

docker run --rm -it -v $PWD:/testing user/image:tag salt-call state.apply foo

You could also just stay launched into a salt-docker container and run salt-call state.apply foo for each step of the bisect.

Automated Git Bisects using salt-docker Docker Images

While git bisects can be run manually, they can also be automated using git bisect run <command>. The command will be repeated for each step of the bisect, and the exit status of the command will be used to mark the commit being tested as good/bad.

This requires a little extra setup at the beginning, but it allows the entire bisect to run without any interaction.

You can write a shell script which runs Salt, then does some sort of check to see if the bug is present. For example, in the below script, imagine a bug where the sl package fails to install and the state fails. The script below will attempt to run a single state, and then check the output for a True result:

#!/bin/bash

# Ensure that the state output goes to the CLI so we can see the results of
# each step as it runs.
salt-call state.single pkg.installed name=sl | tee /tmp/out

# Look for a True result in the state's output
fgrep -q "Result: True" /tmp/out && exit 0 || exit 1

It's important here that your script returns 0 when the bug is not present, and nonzero when it is. This is because an automated git bisect will use the return code of the command you give it to determine whether the commit is "good" or "bad".
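
One additional wrinkle worth knowing (this is standard git bisect run behavior rather than anything specific to Salt): an exit status of 125 tells git to skip the current commit as untestable, rather than marking it good or bad. If some commits in your range can't even run the test, you could, for example, add a check like the following to the script before the final result check:

# Hypothetical extra check: if salt-call died with a traceback, this commit
# can't be tested, so tell 'git bisect run' to skip it
fgrep -q "Traceback" /tmp/out && exit 125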

Save your script as /test.sh, and don't forget to give it execute privileges (chmod +x /test.sh), then docker commit your container to save it as an image. You can then use this image to run an automated git bisect:

git bisect run docker run --rm -it -v $PWD:/testing user/image:tag /test.sh

For each step of the bisect, git will check out a commit, then run the docker run command it was given. If the shell script you wrote returns 0, it marks the commit as "good", otherwise it marks it as "bad".

Don't forget, you still need to start the bisect and tell git your known "good" and "bad" commits, before you use git bisect run to start automatically bisecting. Otherwise, git won't know the correct range of commits to search.

git bisect start
git bisect good abcdef1
git bisect bad 012345a
git bisect run docker run --rm -it -v $PWD:/testing user/image:tag /test.sh

Consider the case where what you're testing takes a minute or two to run. Waiting for each step to complete, and then manually marking the step as good or bad, could take a while and keep you from getting other things done. But with a little bit of extra setup, you can let git do the rest of the work for you.

Troubleshooting States / Execution Modules (i.e. Stuff That Runs on a Minion)

Use Masterless

When testing something that runs on the minion, testing in masterless mode offers a couple benefits:

  • No need to run a master or exchange keys, so it's much easier to set up your test case

  • Runs in the foreground, making debuggers like pdb/pudb easy to use

To run in masterless mode, you would use salt-call instead of salt. In addition, you must do one of two things:

  • Add --local to the salt-call command
  • Add file_client: local to /etc/salt/minion

Any additional configuration (pillar, fileserver, etc.) must also be done in /etc/salt/minion (or within /etc/salt/minion.d/somefile.conf) when running masterless.
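
For example, a minimal masterless configuration with local file and pillar roots might be dropped into place like this (the filename masterless.conf is just a placeholder; file_client, file_roots, and pillar_roots are standard minion config options):

cat > /etc/salt/minion.d/masterless.conf <<'EOF'
file_client: local
file_roots:
  base:
    - /srv/salt
pillar_roots:
  base:
    - /srv/pillar
EOF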

The salt-docker project pre-configures images using file_client: local, so salt-call commands are masterless by default in those images.

$ salt-docker centos7 salt-call pkg.version bash
local:
    4.2.46-29.el7_4

However, often it can be better to first launch into a shell in the container, so that you can run multiple commands before the container exits.

$ salt-docker centos7
[root@60948f923223 /]# salt-call pkg.version zsh
local:
[root@60948f923223 /]# salt-call pkg.install zsh
local:
    ----------
    zsh:
        ----------
        new:
            5.0.2-28.el7
        old:
[root@60948f923223 /]# salt-call pkg.version zsh
local:
    5.0.2-28.el7

pudb

pudb is a console-based debugger that is a user-friendly alternative to the pdb debugger in the Python stdlib.

To launch it, simply add the following line where you want to launch the debugger:

import pudb; pu.db

When you run the function being tested, the debugger will start once execution reaches that line of code, and you can use it to step through line-by-line.

pudb is easiest to use when you are running salt-call, but it has a remote debugging component which can be used to test the master and other processes which do not run in the foreground. More on this later.

pdb

Personally, I am a much bigger fan of pudb, but pdb has the benefit of being part of the Python standard library. Launching it is similar to pudb:

import pdb; pdb.set_trace()

From here, you can do pretty much all of what pudb can do, the difference being that you don't get a persistent view of the code as you step through. The last command entered at the (pdb) prompt will be repeated if you hit Enter without typing another command, so this can be used to keep stepping forward. If you use the l or list command, it will show you a few lines before and after your current position, and repeating the command (as long as you haven't advanced by stepping forward) will show the next several lines. This lets you run l and then hit Enter a few times to get a picture of the next 20-30 lines of code.

Troubleshooting the Master Using Remote PUDB

Launching a remote pudb session is slightly different from opening pudb in the foreground. Since you will be using telnet to connect to the session, you must tell it what the screen dimensions are, so that pudb knows how large a window to draw:

from pudb.remote import set_trace
set_trace(term_size=(80, 24))

For best results, you should use a fullscreen terminal, and get the number of columns and lines to pass to set_trace():

% tput cols; tput lines
174
40

By default, remote pudb will listen only on localhost:6899. To connect to remote pudb on a Docker container, you should also pass the host parameter to set_trace(). The port can also be specified using the port parameter. For example:

from pudb.remote import set_trace
set_trace(term_size=(174, 40), host='0.0.0.0', port=9999)

When execution reaches the call to set_trace(), pudb will (if possible) write a message to the console telling you the port on which to connect. You can then telnet to the container's IP on that port to connect to the pudb session.
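
For example, assuming the container ID and port used earlier in this document, and that the container is on the default bridge network, connecting might look like this:

telnet "$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' 45515ec019d1)" 9999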

Some caveats to keep in mind when using remote pudb:

  • If multiple processes/threads hit the code path which launches pudb, then pudb will start up separate debuggers for each, and will count up from the initial port to select a listening port

  • The telnet interface is a little finicky. If your goal is to write a script or something to check for an open port and then connect to it, the act of probing for the open port (using nmap, netcat, etc.) will start the pudb session and immediately end it, and by the time you try to connect the port will already be closed and the session over. Best to just loop trying to connect to telnet every N seconds and break from the loop if successful. I wrote a few shell functions to work with debugging using Docker containers, which I've shared alongside this document.

Using salt-docker to Assist in Developing Modules

If you're doing development on existing Salt code, or code that you plan to submit upstream, then you can just edit files inside the git checkout you've mounted into the salt-docker container (i.e. within salt/modules/, salt/states/, etc.).

However, if you want to develop custom modules that you only plan to use internally, you can separately mount the directory where these custom modules reside as another volume. For example:

$ salt-docker --mount /path/to/custom/mods /var/cache/salt/minion/extmods centos7

This would mount /path/to/custom/mods into the location where custom modules would normally be synced to (using one of the saltutil.sync_* functions). Note however that in this case, the module would need to be in a subdirectory of /path/to/custom/mods (i.e. /path/to/custom/mods/states for states, /path/to/custom/mods/modules for execution modules, etc.). If you know that you are only developing an execution module, you could instead mount /path/to/custom/mods to /var/cache/salt/minion/extmods/modules.
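
For example, assuming the same --mount syntax shown above, mounting a directory of execution modules directly into the extmods modules subdirectory might look like this:

$ salt-docker --mount /path/to/custom/mods /var/cache/salt/minion/extmods/modules centos7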

If developing custom types that run on the master (e.g. runners), then you would of course be mounting /path/to/custom/mods to the extmods dir in the master cachedir (i.e. /var/cache/salt/master/extmods).

Running Tests

Whether troubleshooting a failing test, or attempting to run a test you are writing, these images are good ways of easily running the test suite against the code in the repository you've mounted into the container.

Note that the upstream documentation recommends running tests using nox. However, nox attempts to set up a virtualenv and installs the test deps into it, i.e. things that salt-docker already does. For that reason, you should simply be able to run pytest directly.

First, launch into a container:

salt-docker centos7

This will get you a shell in that image. From here you can run pytest on a test file directly. Note that salt-docker mounts the salt codebase at /testing, so the path to the test file will be /testing/ followed by the path to the test file, relative to the root of the git repo:

py.test -vvv /testing/tests/pytests/unit/test_fileclient.py

You can also run on entire directories full of test modules.
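
For example, using the same path layout as above, this would run everything under the unit test directory:

py.test -vvv /testing/tests/pytests/unit/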

To run a smaller subset of tests, you can identify the tests you wish to run using filename::funcname or filename::classname::funcname, for example:

py.test -vvv /testing/tests/pytests/unit/test_fileclient.py::test_fsclient_master_no_fs_update

Running a Debugger Within the Test Suite

Debuggers can be used in the test suite. salt-docker has pudb pre-installed, making it a great option.

For unit tests, just add import pudb; pu.db wherever you want to launch the debugger, and make sure that you add --capture=no to your command when running pytest (otherwise pudb won't work).
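
For example, re-using the test file from the earlier example:

py.test -vvv --capture=no /testing/tests/pytests/unit/test_fileclient.py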

For integration tests, you will need to use the remote pudb procedure to launch the debugger. However, when running integration tests, the helper functions that run states/functions often invoke salt itself, so your set_trace may need to be placed in the code being tested rather than in the test module in order to get the debugger to step through the code being tested.

There are some issues with running pudb (and to a smaller extent, pdb) within unit tests where functions such as os.path.exists(), os.path.islink(), os.path.isfile(), or os.path.isdir() are mocked. This is because the mocking affects the debuggers as well, so any references to these functions within pudb or pdb's source code could result in an error due to the outcome of those functions being mocked. Thus, when writing tests which mock these functions, the best approach is to use a MagicMock with a side_effect rather than a return_value. For example:

import os
from tests.support.mock import MagicMock, patch, DEFAULT

# Return False only for the path being tested; any other path falls back to
# the mock's default behavior (DEFAULT), so the debugger's own filesystem
# checks aren't all forced to return False.
isfile_mock = MagicMock(side_effect=lambda x: False if x == name else DEFAULT)
with patch.object(os.path, 'isfile', isfile_mock):
    assert somemod.somefunc(name)

The mock defined above will cause os.path.isfile() to return False if the path matches whatever path is defined by the name variable; for any other path, returning DEFAULT from the side_effect means the mock falls back to its normal return value instead of unconditionally returning False. How you define your mocks will depend on the code being tested, and it may not always be possible to know precisely which path(s) will need to have their results mocked. But taking care when crafting mocks involving the functions described above from os.path will make pudb/pdb run smoother in the event that it becomes necessary to use a debugger to step through the code being tested.

Miscellaneous Tips

  • When using salt-docker, most of the time I find myself just working in a bash shell. In these cases, to start the master/minion daemons you can use -d (e.g. salt-master -d or salt-minion -d). If you want to stop the daemons, use pkill -f salt-master (or pkill -f salt-minion, or just pkill -f salt). The -f flag tells pkill to match against the full command line, so it will kill any process whose command line contains the given string.

  • It can also be helpful to run the master in the foreground with debug output (e.g. salt-master -l debug). But this means that you lose your shell, because it will be taken up by salt running in the foreground. However, this is easily worked around. Simply get the container_id before you start the daemon (remember, it's in the prompt):

    (saltdev) root@45515ec019d1:/#
    

    You can then run docker exec -it 45515ec019d1 bash, and you will have a new shell in that same container.

# Feel free to add these to your shell RC file

# Run a one-off container with the current directory mounted at /testing
function drun {
    docker run --rm -it -v "$PWD":/testing "$@"
}

# Launch a detached container running systemd as PID 1
function drun-systemd {
    local image=$1
    test -n "$2" && local container_name=$2 || local container_name="$image-systemd"
    if test -z "$image"; then
        echo "Missing image name!" 1>&2
        return 1
    fi
    docker run --detach --rm --name $container_name --hostname $container_name \
        --cap-add SYS_ADMIN -v $PWD:/testing -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
        $image /usr/lib/systemd/systemd
}

# Print a container's IP address (optionally from a specific network)
function dgetip {
    local container=$1
    local network=$2
    if test -z "$container"; then
        echo "Missing container name!" 1>&2
        return 1
    fi
    local cfgpath
    test -n "$network" && cfgpath=".NetworkSettings.Networks.${network}.IPAddress" || cfgpath=".NetworkSettings.IPAddress"
    echo $(docker inspect --format "{{ $cfgpath }}" $container 2>/dev/null)
}

# SSH into a container as root, skipping host key checks
function dssh {
    local container=$1
    if test -z "$container"; then
        echo "Missing container name!" 1>&2
        return 1
    fi
    local network=$2
    ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null "root@$(dgetip $container $network)"
    test -n "$TMUX" && tmux set-window-option automatic-rename
}

# Return 0 if the given TCP port is open on the given host
function dport_open {
    local host=$1
    local port=$2
    test -z "$port" && return 1
    nmap -p "$port" "$host" 2>/dev/null | egrep -q "$port/tcp +open"
    return $?
}

# Keep retrying a telnet connection to a container (default port 9999, i.e.
# the remote pudb port used in the examples above) until it succeeds
function dtelnet {
    local container=$1
    local port=$2
    if test -z "$container"; then
        echo "Missing container name!" 1>&2
        return 1
    fi
    test -z "$port" && port=9999
    local ip=$(dgetip $container)
    if test -z "$ip"; then
        echo "Failed to get IP for container '$container'" 1>&2
        return 1
    fi
    while [ 1 ]; do
        telnet $ip $port 2>/dev/null && break
        echo "Waiting for port $port to open up on $container ($ip)..."
        sleep 1
    done
}

# List all image tags whose history includes the given image ID
function dchildren () {
    local image_id=$1
    if test -z "$image_id"; then
        echo "Missing image ID!" 1>&2
        return 1
    fi
    local image
    local ret
    for image in $(docker images -q); do
        docker history -q $image | fgrep -q $image_id || continue
        for tag in $(docker inspect --format="{{.RepoTags}}" $image | cut -f2 -d'[' | cut -f1 -d']'); do
            ret="$ret\n$tag"
        done
    done
    echo "$ret" | egrep -v '^$' | sort -u
}