lelegard/purging-old-artifacts-with-github-api.md

## purging-old-artifacts-with-github-api.md

      
With GitHub Actions, a workflow can publish artifacts, typically logs or binaries. As of early 2020, the lifetime of an artifact is hard-coded to 90 days (this may change in the future). After 90 days, an artifact is automatically deleted. In the meantime, however, the artifacts of a repository can accumulate into megabytes or even gigabytes of data files.

It is unclear whether there is a limit on the total accumulated size of artifacts for a public repository. But GitHub cannot reasonably let multiple gigabytes of artifact data accumulate without doing anything. So, if your workflows regularly produce large artifacts (such as "nightly build" procedures), it is wise to clean up and delete older artifacts without waiting for the 90-day limit.

On the "Actions" web page of a repository, it is possible to browse old workflow runs and delete artifacts manually. But the procedure is slow and tedious. It is fine for deleting one selected artifact, not for a regular cleanup. We need automation.

The GitHub Actions API makes it possible to browse through the history of workflows and delete selected artifacts. The shell script below is an example of artifact cleanup that can be run on a regular basis (from your crontab, for instance).

Each artifact is identified by a "name" (the name parameter in the actions/upload-artifact step). The shell script browses through all runs of all workflows on the selected repository. For each artifact name, the five most recent instances of this artifact are kept. All older ones are deleted.
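
The keep-the-newest rule can be tried offline before pointing anything at a real repository. The sketch below is only an illustration and is not part of the original script: the function name `ids_to_delete`, the tab-separated input format, and the sample ids are all assumptions made for this example. It assumes, as the script does, that the input is ordered from most recent to oldest.

```bash
#!/usr/bin/env bash
# Requires bash 4 or higher (associative arrays).

# Hypothetical helper: read "id<TAB>name" lines, newest first, from
# standard input and print the ids that come after the first $1
# instances of each name.
ids_to_delete() {
    local keep=$1 id name
    local -A seen=()
    while IFS=$'\t' read -r id name; do
        # Count how many artifacts of this name were seen so far.
        seen[$name]=$(( ${seen[$name]:-0} + 1 ))
        if [[ ${seen[$name]} -gt $keep ]]; then
            printf '%s\n' "$id"
        fi
    done
}

# Made-up sample: three "build-logs" and two "binaries", newest first.
printf '%s\t%s\n' \
    105 build-logs \
    104 binaries \
    103 build-logs \
    102 build-logs \
    101 binaries |
    ids_to_delete 2
# prints 102
```

With two instances kept per name, only the third "build-logs" entry is selected. Nothing is deleted here; the function only lists candidates.
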
Implementation notes:

- You need to customize the repository URL, your GitHub user name, and your GitHub personal token. This token must have admin rights to be allowed to delete artifacts.
- Since the list of existing artifacts can be very long, it is "paged", i.e. each API invocation returns only one "page" of artifacts. The script loops over all pages, starting with the main URL for the request. The URL of the next "page" is found in the response headers.
- The artifacts are always returned from most recent to oldest. So, we skip the first $KEEP ones for each artifact name and delete all others.
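
The paging note above can be checked in isolation, without any network access. The following sketch applies the same `grep`/`tr`/`sed` pipeline to a file of response headers. The function name `next_page_url` and the sample headers (including the URLs in them) are made up for illustration; real responses carry more headers.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the URL tagged rel="next" in a file of
# HTTP response headers. Prints nothing when there is no next page.
next_page_url() {
    local headers_file=$1
    grep -i '^link:' "$headers_file" |
        tr ',' '\n' |
        grep 'rel="next"' |
        head -1 |
        sed -e 's/.*<//' -e 's/>.*//'
}

# Made-up sample headers, shaped like a paged API response.
sample=$(mktemp)
printf '%s\r\n' \
    'HTTP/2 200' \
    'content-type: application/json' \
    'link: <https://api.github.com/repositories/1/actions/artifacts?page=2>; rel="next", <https://api.github.com/repositories/1/actions/artifacts?page=9>; rel="last"' \
    > "$sample"

next_page_url "$sample"
# prints https://api.github.com/repositories/1/actions/artifacts?page=2
rm -f "$sample"
```

On the last page the `link:` header has no `rel="next"` entry, the function prints nothing, and a loop of the form `while [[ -n "$URL" ]]` ends.
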

#!/usr/bin/env bash

# Customize these three lines with your repository and credentials:
REPO=https://api.github.com/repos/OWNER/REPO
GITHUB_USER=your-github-user-name
GITHUB_TOKEN=token-with-workflow-rights-on-repo

# Number of most recent versions to keep for each artifact:
KEEP=5

# A shortcut to call GitHub API.
ghapi() { curl --silent --location --user "$GITHUB_USER:$GITHUB_TOKEN" "$@"; }

# A temporary file which receives HTTP response headers.
# mktemp avoids the predictable /tmp/tmp.$$ name.
TMPFILE=$(mktemp)

# An associative array, key: artifact name, value: number of artifacts of that name.
declare -A ARTCOUNT

# Process all artifacts on this repository, loop on returned "pages".
URL=$REPO/actions/artifacts
while [[ -n "$URL" ]]; do

    # Get current page, get response headers in a temporary file.
    JSON=$(ghapi --dump-header "$TMPFILE" "$URL")

    # Get URL of next page. Will be empty if we are at the last page.
    URL=$(grep -i '^link:' "$TMPFILE" | tr ',' '\n' | grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//')
    rm -f "$TMPFILE"

    # Number of artifacts on this page:
    COUNT=$(( $(jq -r '.artifacts | length' <<<"$JSON") ))

    # Loop on all artifacts on this page.
    for ((i=0; i < COUNT; i++)); do

        # Get name of artifact and count instances of this name.
        name=$(jq -r ".artifacts[$i].name?" <<<"$JSON")
        ARTCOUNT[$name]=$(( ${ARTCOUNT[$name]:-0} + 1 ))

        # Check if we must delete this one.
        if [[ ${ARTCOUNT[$name]} -gt $KEEP ]]; then
            id=$(jq -r ".artifacts[$i].id?" <<<"$JSON")
            size=$(( $(jq -r ".artifacts[$i].size_in_bytes?" <<<"$JSON") ))
            # Quote the arguments so that names containing spaces stay intact.
            printf "Deleting '%s' #%d, %'d bytes\n" "$name" "${ARTCOUNT[$name]}" "$size"
            ghapi -X DELETE "$REPO/actions/artifacts/$id"
        fi
    done
done

This script has been successfully tested on recent Linux distros, on macOS (with latest bash from Homebrew) and Windows (with Cygwin).
Prerequisites:

- bash version 4 or higher, because the script uses an associative array.
- curl, to perform HTTP requests on the GitHub API.
- jq, to parse and query the JSON responses from this API.
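
These prerequisites can be verified before the cleanup starts, so that a missing tool is reported up front instead of causing a failure partway through. This is a minimal sketch and not part of the original script: the function names `missing_tools`, `bash_is_recent`, and `check_prerequisites` are made up for this example.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical helper: succeed only when the running bash is version 4
# or higher, which is needed for associative arrays.
bash_is_recent() {
    (( BASH_VERSINFO[0] >= 4 ))
}

# Report what is missing on standard error, return non-zero on failure.
check_prerequisites() {
    local missing
    if ! bash_is_recent; then
        echo "bash 4 or higher is required, found $BASH_VERSION" >&2
        return 1
    fi
    missing=$(missing_tools curl jq)
    if [[ -n $missing ]]; then
        echo "missing tools: ${missing//$'\n'/ }" >&2
        return 1
    fi
}

# Possible use at the top of a cleanup script:
# check_prerequisites || exit 1
```

On macOS, `/bin/bash` is typically an older 3.x release, which is why the script was tested there with the bash from Homebrew.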