Skip to content

Instantly share code, notes, and snippets.

@rodw
Last active March 30, 2024 15:04
Show Gist options
  • Save rodw/3073987 to your computer and use it in GitHub Desktop.
Save rodw/3073987 to your computer and use it in GitHub Desktop.
A simple script to backup an organization's GitHub repositories, wikis and issues.
#!/bin/bash
# A simple script to backup an organization's GitHub repositories.
#-------------------------------------------------------------------------------
# NOTES:
#-------------------------------------------------------------------------------
# * Under the heading "CONFIG" below you'll find a number of configuration
# parameters that must be personalized for your GitHub account and org.
# Replace the `<CHANGE-ME>` strings with the value described in the comments
# (or overwrite those values at run-time by providing environment variables).
#
# * If you have more than 100 repositories, you'll need to step thru the list
# of repos returned by GitHub one page at a time, as described at
# https://gist.github.com/darktim/5582423
#
# * If you want to back up the repos for a USER rather than an ORGANIZATION,
# there's a small change needed. See the comment on the `REPOLIST` definition
# below (i.e search for "REPOLIST" and make the described change).
#
# * Thanks to @Calrion, @vnaum, @BartHaagdorens and other commenters below for
# various fixes and updates.
#
# * Also see those comments (and related revisions and forks) for more
# information and general troubleshooting.
#-------------------------------------------------------------------------------
#-------------------------------------------------------------------------------
# CONFIG:
#-------------------------------------------------------------------------------
GHBU_ORG=${GHBU_ORG-"<CHANGE-ME>"} # the GitHub organization whose repos will be backed up
# # (if you're backing up a USER's repos, this should be your GitHub username; also see the note below about the `REPOLIST` definition)
GHBU_UNAME=${GHBU_UNAME-"<CHANGE-ME>"} # the username of a GitHub account (to use with the GitHub API)
GHBU_PASSWD=${GHBU_PASSWD-"<CHANGE-ME>"} # the password for that account
#-------------------------------------------------------------------------------
GHBU_BACKUP_DIR=${GHBU_BACKUP_DIR-"github-backups"} # where to place the backup files
GHBU_GITHOST=${GHBU_GITHOST-"github.com"} # the GitHub hostname (see comments)
GHBU_PRUNE_OLD=${GHBU_PRUNE_OLD-true} # when `true`, old backups will be deleted
GHBU_PRUNE_AFTER_N_DAYS=${GHBU_PRUNE_AFTER_N_DAYS-3} # the min age (in days) of backup files to delete
GHBU_SILENT=${GHBU_SILENT-false} # when `true`, only show error messages
GHBU_API=${GHBU_API-"https://api.github.com"} # base URI for the GitHub API
GHBU_GIT_CLONE_CMD="git clone --quiet --mirror git@${GHBU_GITHOST}:" # base command to use to clone GitHub repos
TSTAMP=`date "+%Y%m%d-%H%M"` # format of timestamp suffix appended to archived files
#-------------------------------------------------------------------------------
# (end config)
#-------------------------------------------------------------------------------
# The function `check` will exit the script if the given command fails.
function check {
"$@"
status=$?
if [ $status -ne 0 ]; then
echo "ERROR: Encountered error (${status}) while running the following:" >&2
echo " $@" >&2
echo " (at line ${BASH_LINENO[0]} of file $0.)" >&2
echo " Aborting." >&2
exit $status
fi
}
# The function `tgz` will create a gzipped tar archive of the specified file ($1) and then remove the original
function tgz {
check tar zcf $1.tar.gz $1 && check rm -rf $1
}
$GHBU_SILENT || (echo "" && echo "=== INITIALIZING ===" && echo "")
$GHBU_SILENT || echo "Using backup directory $GHBU_BACKUP_DIR"
check mkdir -p $GHBU_BACKUP_DIR
$GHBU_SILENT || echo -n "Fetching list of repositories for ${GHBU_ORG}..."
REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 -q | check grep "^ \"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'` # hat tip to https://gist.github.com/rodw/3073987#gistcomment-3217943 for the license name workaround
# NOTE: if you're backing up a *user's* repos, not an organizations, use this instead:
# REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/user/repos -q | check grep "^ \"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
$GHBU_SILENT || echo "found `echo $REPOLIST | wc -w` repositories."
$GHBU_SILENT || (echo "" && echo "=== BACKING UP ===" && echo "")
for REPO in $REPOLIST; do
$GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO}"
check ${GHBU_GIT_CLONE_CMD}${GHBU_ORG}/${REPO}.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git
$GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO}.wiki (if any)"
${GHBU_GIT_CLONE_CMD}${GHBU_ORG}/${REPO}.wiki.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.wiki-${TSTAMP}.git 2>/dev/null && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.wiki-${TSTAMP}.git
$GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO} issues"
check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/repos/${GHBU_ORG}/${REPO}/issues -q > ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.issues-${TSTAMP} && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.issues-${TSTAMP}
done
if $GHBU_PRUNE_OLD; then
$GHBU_SILENT || (echo "" && echo "=== PRUNING ===" && echo "")
$GHBU_SILENT || echo "Pruning backup files ${GHBU_PRUNE_AFTER_N_DAYS} days old or older."
$GHBU_SILENT || echo "Found `find $GHBU_BACKUP_DIR -name '*.tar.gz' -mtime +$GHBU_PRUNE_AFTER_N_DAYS | wc -l` files to prune."
find $GHBU_BACKUP_DIR -name '*.tar.gz' -mtime +$GHBU_PRUNE_AFTER_N_DAYS -exec rm -fv {} > /dev/null \;
fi
$GHBU_SILENT || (echo "" && echo "=== DONE ===" && echo "")
$GHBU_SILENT || (echo "GitHub backup completed." && echo "")
@JPC18
Copy link

JPC18 commented Nov 6, 2023

Having a prob that the script only recognize 100 repositories, but my association has 150
Is that a way to take this cap?

@jimklimov
Copy link

jimklimov commented Nov 10, 2023

@JPC18 : you need to parse paginated output of github REST API.

FWIW, my continuation of this gist as posted in https://github.com/jimklimov/github-scripts seems to have pulled all 222 of my repos (346 if adding issues and PR info available for some of those, which are also git repos under the hood), 187 for a colleague... so this part works quite well :)

I've checked that the orgs I back up happen to all have under 100 repos, though... so that aspect is a bit lacking in real-life testing :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment