Skip to content

Instantly share code, notes, and snippets.

@lelegard
Last active January 18, 2024 07:19
Show Gist options
  • Star 19 You must be signed in to star a gist
  • Fork 5 You must be signed in to fork a gist
  • Save lelegard/6a428f67ee08e86d0c2f1af3f4a633d0 to your computer and use it in GitHub Desktop.
Save lelegard/6a428f67ee08e86d0c2f1af3f4a633d0 to your computer and use it in GitHub Desktop.
Purging old artifacts with GitHub Actions API

With GitHub Actions, a workflow can publish artifacts, typically logs or binaries. As of early 2020, the life time of an artifact is hard-coded to 90 days (this may change in the future). After 90 days, an artifact is automatically deleted. But, in the meantime, artifacts for a repository may accumulate and generate mega-bytes or even giga-bytes of data files.

It is unclear if there is a size limit for the total accumulated size of artifacts for a public repository. But GitHub cannot reasonably let multi-giga-bytes of artifacts data accumulate without doing anything. So, if your workflows regularly produce large artifacts (such as "nightly build" procedures for instance), it is wise to cleanup and delete older artifacts without waiting for the 90 days limit.

Using the Web page for the "Actions" of a repository, it is possible to browse old workflow runs and manually delete artifacts. But the procedure is slow and tedious. It is fine to delete one selected artifact. It is not for a regular cleanup. We need automation.

The GitHub Actions API gives the possibility to browse through the history of workflows and delete selected artifacts. The shell-script below is an example of artifact cleanup which can be run on a regular basis (from your crontab for instance).

Each artifact is identified by a "name" (the name parameter in the actions/upload-artifact step). The shell-script browses through all runs of all workflows on the selected repository. For each artifact name, the five most recent instances of this artifact are kept. All older ones are deleted.

Implementation notes:

  • You need to customize your repository URL, your GitHub user name and GitHub personal token. This token must have admin rights to be allowed to delete artifacts.
  • Since the list of existing artifacts can be very long, it is "paged", i.e. each API invocation returns only a "page" of artifacts. The script loops on all pages, starting with the main URL for the request. The URL of the next "page" can be found in the response headers.
  • The artifacts are always returned from most recent to oldest. So, we skip the first $KEEP ones for each artifact name and delete all others.
#!/usr/bin/env bash

# Customize those three lines with your repository and credentials:
REPO=https://api.github.com/repos/OWNER/REPO
GITHUB_USER=your-github-user-name
GITHUB_TOKEN=token-with-workflow-rights-on-repo

# Number of most recent versions to keep for each artifact:
KEEP=5

# A shortcut to call GitHub API.
ghapi() { curl --silent --location --user $GITHUB_USER:$GITHUB_TOKEN "$@"; }

# A temporary file which receives HTTP response headers.
TMPFILE=/tmp/tmp.$$

# An associative array, key: artifact name, value: number of artifacts of that name.
declare -A ARTCOUNT

# Process all artifacts on this repository, loop on returned "pages".
URL=$REPO/actions/artifacts
while [[ -n "$URL" ]]; do

    # Get current page, get response headers in a temporary file.
    JSON=$(ghapi --dump-header $TMPFILE "$URL")

    # Get URL of next page. Will be empty if we are at the last page.
    URL=$(grep -i '^link:' "$TMPFILE" | tr ',' '\n' | grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//')
    rm -f $TMPFILE

    # Number of artifacts on this page:
    COUNT=$(( $(jq <<<$JSON -r '.artifacts | length') ))

    # Loop on all artifacts on this page.
    for ((i=0; $i < $COUNT; i++)); do

        # Get name of artifact and count instances of this name.
        name=$(jq <<<$JSON -r ".artifacts[$i].name?")
        ARTCOUNT[$name]=$(( $(( ${ARTCOUNT[$name]} )) + 1))

        # Check if we must delete this one.
        if [[ ${ARTCOUNT[$name]} -gt $KEEP ]]; then
            id=$(jq <<<$JSON -r ".artifacts[$i].id?")
            size=$(( $(jq <<<$JSON -r ".artifacts[$i].size_in_bytes?") ))
            printf "Deleting '%s' #%d, %'d bytes\n" $name ${ARTCOUNT[$name]} $size
            ghapi -X DELETE $REPO/actions/artifacts/$id
        fi
    done
done

This script has been successfully tested on recent Linux distros, on macOS (with latest bash from Homebrew) and Windows (with Cygwin).

Pre-requisites:

  • bash version 4 or higher because the script uses an associative array.
  • curl to perform HTTP request on the GitHub API.
  • jq to parse and query the JSON responses from this API.
@FeiWangHub
Copy link

FeiWangHub commented Mar 12, 2020

in ubuantu 18.4, I get error that:
parse error: Invalid numeric literal at line 1, column 4

seems error while executing JSON=$(ghapi --dump-header $TMPFILE "$URL"), it returns not found

@FeiWangHub
Copy link

@lelegard
Copy link
Author

Hi @NextDoorWang,
It still works for me. The API hasn't changed. Maybe the "not found" error comes from an error in your repo name.

@FeiWangHub
Copy link

Hi @legeard thanks I will try again, because GitHub just told me they had billing & accounting bug this week, they somehow marked 40G usage while I have 0 artifact left on server. This maybe why "Not Found" happen.

@FDiskas
Copy link

FDiskas commented May 27, 2020

@robin-xyzt-ai
Copy link

The link in the header now uses a lower-case link. Meaning

grep '^Link:' "$TMPFILE"

should become

grep '^link:' "$TMPFILE"

or alternatively, add the --ignore-case flag to grep

@lelegard
Copy link
Author

lelegard commented May 5, 2021

Thanks @robin-xyzt-ai. Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment