Purging old artifacts with GitHub Actions API

With GitHub Actions, a workflow can publish artifacts, typically logs or binaries. As of early 2020, the lifetime of an artifact is hard-coded to 90 days (this may change in the future). After 90 days, an artifact is automatically deleted. But, in the meantime, the artifacts for a repository may accumulate and amount to megabytes or even gigabytes of data files.

It is unclear whether there is a limit on the total accumulated size of artifacts for a public repository. But GitHub cannot reasonably let multiple gigabytes of artifact data accumulate without doing anything. So, if your workflows regularly produce large artifacts ("nightly build" procedures, for instance), it is wise to clean up older artifacts without waiting for the 90-day limit.

Using the Web page for the "Actions" of a repository, it is possible to browse old workflow runs and manually delete artifacts. But the procedure is slow and tedious: fine for deleting one selected artifact, not for a regular cleanup. We need automation.

The GitHub Actions API makes it possible to browse through the history of workflows and delete selected artifacts. The shell script below is an example of artifact cleanup which can be run on a regular basis (from your crontab, for instance).

Each artifact is identified by a "name" (the name parameter in the actions/upload-artifact step). The shell script browses through all runs of all workflows on the selected repository. For each artifact name, the five most recent instances of this artifact are kept; all older ones are deleted.
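The keep-the-newest-five bookkeeping can be sketched in isolation. The snippet below runs the same jq queries the script relies on against a canned API response (the JSON here is invented for illustration, but shaped like the real "list artifacts" output, which is ordered most recent first):

```shell
#!/usr/bin/env bash
# Canned sample of a GitHub "list artifacts" response (invented data).
JSON='{"artifacts":[
  {"id":3,"name":"nightly","size_in_bytes":100},
  {"id":2,"name":"nightly","size_in_bytes":100},
  {"id":1,"name":"nightly","size_in_bytes":100}
]}'
KEEP=2

declare -A ARTCOUNT
COUNT=$(jq <<<"$JSON" -r '.artifacts | length')
TO_DELETE=()
for ((i = 0; i < COUNT; i++)); do
    name=$(jq <<<"$JSON" -r ".artifacts[$i].name")
    ARTCOUNT[$name]=$(( ${ARTCOUNT[$name]:-0} + 1 ))
    # Past the KEEP most recent instances of this name, mark for deletion.
    if (( ARTCOUNT[$name] > KEEP )); then
        TO_DELETE+=( "$(jq <<<"$JSON" -r ".artifacts[$i].id")" )
    fi
done
echo "would delete ids: ${TO_DELETE[*]}"   # only the oldest instance, id 1
```

Because the API returns artifacts newest first, a simple per-name counter is enough: the first $KEEP occurrences of a name survive, everything after them is old.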

Implementation notes:

  • You need to customize the repository URL, your GitHub user name and your GitHub personal access token. This token must have admin rights on the repository to be allowed to delete artifacts.
  • Since the list of existing artifacts can be very long, it is "paged", i.e. each API invocation returns only a "page" of artifacts. The script loops on all pages, starting with the main URL for the request. The URL of the next "page" can be found in the response headers.
  • The artifacts are always returned from most recent to oldest. So, we skip the first $KEEP ones for each artifact name and delete all others.
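The Link-header extraction described in the last note can be exercised on its own. The header value below is a hand-written sample in the format the GitHub API returns (the URLs are placeholders):

```shell
#!/usr/bin/env bash
# Sample HTTP response header in the format the GitHub API sends back.
HEADERS='link: <https://api.github.com/repositories/1/actions/artifacts?page=2>; rel="next", <https://api.github.com/repositories/1/actions/artifacts?page=9>; rel="last"'

# Same pipeline as the script: isolate the rel="next" entry, strip the <>.
# grep -i is needed because the header name may arrive as "Link:" or "link:".
URL=$(grep -i '^link:' <<<"$HEADERS" | tr ',' '\n' | grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//')
echo "$URL"
```

On the last page there is no rel="next" entry, so URL comes out empty and the while loop in the script terminates.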
#!/usr/bin/env bash

# Customize these three lines with your repository and credentials:
REPO=https://api.github.com/repos/OWNER/REPO
GITHUB_USER=your-github-user-name
GITHUB_TOKEN=token-with-workflow-rights-on-repo

# Number of most recent versions to keep for each artifact:
KEEP=5

# A shortcut to call the GitHub API.
ghapi() { curl --silent --location --user "$GITHUB_USER:$GITHUB_TOKEN" "$@"; }

# A temporary file which receives HTTP response headers.
TMPFILE=$(mktemp)

# An associative array, key: artifact name, value: number of artifacts of that name.
declare -A ARTCOUNT

# Process all artifacts on this repository, looping on returned "pages".
URL=$REPO/actions/artifacts
while [[ -n "$URL" ]]; do

    # Get the current page; response headers go to the temporary file.
    JSON=$(ghapi --dump-header "$TMPFILE" "$URL")

    # Get the URL of the next page. Empty if we are at the last page.
    # Note: grep -i, since the header may come back as "Link:" or "link:".
    URL=$(grep -i '^link:' "$TMPFILE" | tr ',' '\n' | grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//')
    rm -f "$TMPFILE"

    # Number of artifacts on this page:
    COUNT=$(jq <<<"$JSON" -r '.artifacts | length')

    # Loop on all artifacts on this page.
    for ((i = 0; i < COUNT; i++)); do

        # Get the name of the artifact and count instances of this name.
        name=$(jq <<<"$JSON" -r ".artifacts[$i].name?")
        ARTCOUNT[$name]=$(( ${ARTCOUNT[$name]:-0} + 1 ))

        # Keep the $KEEP most recent instances of each name, delete the rest.
        if [[ ${ARTCOUNT[$name]} -gt $KEEP ]]; then
            id=$(jq <<<"$JSON" -r ".artifacts[$i].id?")
            size=$(jq <<<"$JSON" -r ".artifacts[$i].size_in_bytes?")
            printf "Deleting %s #%d, %d bytes\n" "$name" "${ARTCOUNT[$name]}" "$size"
            ghapi -X DELETE "$REPO/actions/artifacts/$id"
        fi
    done
done

This script has been successfully tested on recent Linux distros, on macOS (with the latest bash from Homebrew) and on Windows (with Cygwin).

Pre-requisites:

  • bash version 4 or higher because the script uses an associative array.
  • curl to perform HTTP requests on the GitHub API.
  • jq to parse and query the JSON responses from this API.
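A quick way to verify these prerequisites before running the script (a sketch; it only checks what the script actually needs):

```shell
#!/usr/bin/env bash
# Abort early with a clear message if a prerequisite is missing.
OK=1
if (( BASH_VERSINFO[0] < 4 )); then
    echo "bash >= 4 required (associative arrays); found $BASH_VERSION" >&2
    OK=0
fi
for cmd in curl jq; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "missing required command: $cmd" >&2
        OK=0
    fi
done
(( OK )) && echo "all prerequisites satisfied"
```

Note in particular that the stock /bin/bash on macOS is 3.2, which lacks associative arrays; hence the Homebrew bash mentioned above.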
@yong2khoo-lm

Save my life ❤

@janfabian

🎊 ❤️

@astrowq

astrowq commented Nov 27, 2022

<3

@eduard-unplugged

Dude, you just saved my **s

@kevincox

This depends on the case of the Link header, which will break pagination. You need to add a -i to the grep command. In my case link: was lower case.

@DPatrickBoyd

I have a more straightforward method using the gh api CLI tool. It deals with pagination, etc. as well.

https://gist.github.com/DPatrickBoyd/afb54165df0f51903be3f0edea77f9cb

tell me what you think

@tannerhallman

tannerhallman commented May 23, 2023

This depends on the case of the Link header, which will break pagination. You need to add a -i to the grep command. In my case link: was lower case.

@kevincox was correct. In my case, this edit allowed pagination to work:

change
URL=$(grep '^Link:' "$TMPFILE" | tr ',' '\n' | grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//')
to
URL=$(grep -i '^Link:' "$TMPFILE" | tr ',' '\n' | grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//')

@deliciouslytyped

Thanks for this, I ended up using and patching @DPatrickBoyd 's version.

@fesaab

fesaab commented Jul 31, 2023

Awesome! Thanks a lot for this!

@eluchsinger

I had a problem with this script, because my Packages use a . in between (e.g. My.Package) and that caused an error. But thanks anyways, I got @DPatrickBoyd 's script running! Amazing work!
