A simple script to backup an organization's GitHub repositories, wikis and issues.

#!/bin/bash
# A simple script to backup an organization's GitHub repositories.

# NOTE: if you have more than 100 repositories, you'll need to step thru the list of repos
# returned by GitHub one page at a time, as described at https://gist.github.com/darktim/5582423

GHBU_BACKUP_DIR=${GHBU_BACKUP_DIR-"github-backups"} # where to place the backup files
GHBU_ORG=${GHBU_ORG-"<CHANGE-ME>"} # the GitHub organization whose repos will be backed up
# (if you're backing up a user's repos instead, this should be your GitHub username)
GHBU_UNAME=${GHBU_UNAME-"<CHANGE-ME>"} # the username of a GitHub account (to use with the GitHub API)
GHBU_PASSWD=${GHBU_PASSWD-"<CHANGE-ME>"} # the password for that account
GHBU_GITHOST=${GHBU_GITHOST-"github.com"} # the GitHub hostname (see comments)
GHBU_PRUNE_OLD=${GHBU_PRUNE_OLD-true} # when `true`, old backups will be deleted
GHBU_PRUNE_AFTER_N_DAYS=${GHBU_PRUNE_AFTER_N_DAYS-3} # the min age (in days) of backup files to delete
GHBU_SILENT=${GHBU_SILENT-false} # when `true`, only show error messages
GHBU_API=${GHBU_API-"https://api.github.com"} # base URI for the GitHub API
GHBU_GIT_CLONE_CMD="git clone --quiet --mirror git@${GHBU_GITHOST}:" # base command to use to clone GitHub repos

TSTAMP=`date "+%Y%m%d-%H%M"`

# The function `check` will exit the script if the given command fails.
function check {
  "$@"
  status=$?
  if [ $status -ne 0 ]; then
    echo "ERROR: Encountered error (${status}) while running the following:" >&2
    echo "           $@" >&2
    echo "       (at line ${BASH_LINENO[0]} of file $0.)" >&2
    echo "       Aborting." >&2
    exit $status
  fi
}

# The function `tgz` will create a gzipped tar archive of the specified file ($1) and then remove the original
function tgz {
  check tar zcf $1.tar.gz $1 && check rm -rf $1
}

$GHBU_SILENT || (echo "" && echo "=== INITIALIZING ===" && echo "")

$GHBU_SILENT || echo "Using backup directory $GHBU_BACKUP_DIR"
check mkdir -p $GHBU_BACKUP_DIR

$GHBU_SILENT || echo -n "Fetching list of repositories for ${GHBU_ORG}..."
REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 -q | check grep "\"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
# NOTE: if you're backing up a *user's* repos, not an organization's, use this instead:
# REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/user/repos -q | check grep "\"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
$GHBU_SILENT || echo "found `echo $REPOLIST | wc -w` repositories."

$GHBU_SILENT || (echo "" && echo "=== BACKING UP ===" && echo "")

for REPO in $REPOLIST; do
   $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO}"
   check ${GHBU_GIT_CLONE_CMD}${GHBU_ORG}/${REPO}.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git

   $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO}.wiki (if any)"
   ${GHBU_GIT_CLONE_CMD}${GHBU_ORG}/${REPO}.wiki.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.wiki-${TSTAMP}.git 2>/dev/null && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.wiki-${TSTAMP}.git

   $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO} issues"
   check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/repos/${GHBU_ORG}/${REPO}/issues -q > ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.issues-${TSTAMP} && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.issues-${TSTAMP}
done

if $GHBU_PRUNE_OLD; then
   $GHBU_SILENT || (echo "" && echo "=== PRUNING ===" && echo "")
   $GHBU_SILENT || echo "Pruning backup files ${GHBU_PRUNE_AFTER_N_DAYS} days old or older."
   $GHBU_SILENT || echo "Found `find $GHBU_BACKUP_DIR -name '*.tar.gz' -mtime +$GHBU_PRUNE_AFTER_N_DAYS | wc -l` files to prune."
   find $GHBU_BACKUP_DIR -name '*.tar.gz' -mtime +$GHBU_PRUNE_AFTER_N_DAYS -exec rm -fv {} > /dev/null \;
fi

$GHBU_SILENT || (echo "" && echo "=== DONE ===" && echo "")
$GHBU_SILENT || (echo "GitHub backup completed." && echo "")

marinho commented Sep 9, 2013

Hi, well done, this is going to be useful for me :)

Can you just tell me where are the notes about the GitHub's hostname, please?

Thank you!


charlycoste commented Sep 12, 2013

a

rodw (Owner) commented Oct 10, 2013

Sorry @marinho. That's a little cut-and-paste error that referenced a private wiki.

In the general case you can just use github.com as the host name.

The note that is referenced describes a way to run this backup script under a different set of credentials than one's normal GitHub account. Here's the relevant snippet:

If you want to use more than one GitHub account (e.g., your own account as well as the read-only back-up account), add the following to ~/.ssh/config (creating that file if needed):

Host <BACKUP>.github.com
    HostName github.com
    PreferredAuthentications publickey
    IdentityFile ~/.ssh/<THE-BACKUP-SSH-KEY>

(Where <BACKUP>.github.com is an arbitrary host name, but the same as the value used in the script and ~/.ssh/<THE-BACKUP-SSH-KEY> an ssh key generated with ssh-keygen and uploaded to GitHub.)

You can then login via:

ssh-add ~/.ssh/<THE-BACKUP-SSH-KEY>

and run the backup script as a cron job.
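
(With that alias in place, GHBU_GITHOST is the only script setting that needs to change; a hypothetical run, with placeholder values throughout, might look like this:)

    # "<BACKUP>.github.com" is a placeholder and must match the Host entry in ~/.ssh/config
    GHBU_GITHOST="<BACKUP>.github.com" GHBU_ORG=example-org GHBU_UNAME=example-user GHBU_PASSWD=example-token bash backup-github.sh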


Calrion commented Oct 14, 2013

Firstly, thanks for this, it's a great help!

For those who, like me, want to back up user repositories rather than organisation repositories, the following small changes are required:

  • Enter your GitHub username as the value of both GHBU_UNAME and GHBU_ORG.

  • Change line 41 to read:

    REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/user/repos -q | check grep "\"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
    

Even though you remove the GHBU_ORG reference from that line, it's used later on to compute the full repository path so it's still needed (and it needs to be your username, as above).

With those changes, this script grabbed all my repositories—public, private, and forked—and made backups of the repository, the wiki, and the issues. Great work! 😄


bjtitus commented Feb 6, 2014

I made a few changes to support the paginated repos list since we have more than 100 repositories (the maximum allowed in a single page of the API) https://gist.github.com/bjtitus/8851816#file-backup-github-sh-L42-L49


reggi commented Feb 18, 2014

Took me a long while to realize that I just wanted GHBU_GITHOST=${GHBU_GITHOST-"github.com"} which should be the default!! >.<

rodw (Owner) commented Mar 13, 2014

@reggi: Good call. I made that change.

rodw (Owner) commented Mar 13, 2014

@Calrion Thanks, I added comments describing your changes.

One could probably parameterize the script a bit to support both without "manual" intervention.


sinsunsan commented May 17, 2014

I needed to add the -i flag for my password to be accepted; I don't know why.

Secondly, what should I do with the downloaded files? Are they the contents of a .git directory?
How does GitHub handle the repo, issues and wiki? Are they stored as branches?
How do I restore the files instead of getting a bare repo?


sinsunsan commented May 17, 2014

At least I've found why my password wasn't working: it contains a special character that needed to be escaped with \.

Finally, I've adopted another solution, since it was convenient for me to move my issues to Bitbucket.
If you want to do the same, instead of just backing up GitHub, here are 3 links:
https://confluence.atlassian.com/pages/viewpage.action?pageId=330796872
http://stackoverflow.com/questions/11119270/how-to-import-github-issues-and-wikis-to-bitbucket
https://github.com/sorich87/github-to-bitbucket-issues-migration


railsfactory-suriya commented Jul 30, 2014

Hi, I am getting an error on line 43. Could you please help me with this?

ERROR: Encountered error (1) while running the following:
grep "name"
(at line 43 of file git1.sh.)


blutrumpet commented Aug 5, 2014

I too am getting the same error as @railsfactory-suriya when running ./backup-github.sh:

=== INITIALIZING ===

Using backup directory github-backups
Fetching list of repositories for BluTrumpetOrg...ERROR: Encountered error (1) while running the following:
grep "name"
(at line 43 of file ./backup-github.sh.)
Aborting.
found 0 repositories.

Please advise...


robnagler commented Aug 23, 2014

The "issues" backup is only the list of issues, not the content. I think you need something more sophisticated to traverse all the *_url entries for each comment, event, etc.


magikid commented Sep 18, 2014

Thanks for writing this script!

I just wish that it worked with 2-factor auth.


mtolly commented Nov 16, 2014

The script breaks if you are a user who has access to another user's repository. For example if you are user A but you are a contributor to another user B's somerepo, the script will mistakenly try to download A/somerepo. This could be fixed by using the full_name instead of the name.


stanstrup commented Nov 26, 2014

It only pulls open issues. Can be easily fixed. See here: https://gist.github.com/stanstrup/2725319cd18db7f863c0/revisions

But it doesn't seem to pull all issues. I cannot figure why. Any ideas?
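
(The linked revision presumably boils down to adding state=all to the issues request on the issues-backup line, e.g.:

    ${GHBU_API}/repos/${GHBU_ORG}/${REPO}/issues\?state=all\&per_page=100

The issues that still seem to be missing are most likely just on later pages: the issues endpoint returns at most 100 items per request, so repos with more issues need the same page-walking loop discussed further down for the repo list.)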


fstutzman commented Dec 22, 2014

@magikid The script does work with 2FA, you will need to generate an application-specific password for the script.


mandric commented Mar 3, 2015

If your org has more than 30 repos you will probably want to add a ?per_page=100 arg to get the entire list, otherwise it seems github API defaults to 30 repos per page.


andrewetter commented Mar 18, 2015

@blutrumpet @railsfactory-suriya You will get the grep "name" error if your GHBU_API string is incorrect, or has a trailing slash.


spanthetree commented Apr 15, 2015

Along with ?per_page=100, if you have more than 100 repos, you need to add &page=N in order to grab them all.

However, you can only call 100 repos at a time from github, so you need a loop to grab different pages if you have more than 100 repos.

I forked this gist and added an until loop replacing line 43-68 of this script, which you can see here. Useful if you have more than 100 repos:

https://gist.github.com/forkaholic/f583667f97813b863171
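
(A minimal sketch of such a loop, independent of the linked fork; it reuses the variable names from the script above and simply stops when a page comes back empty:)

    # sketch: walk the paginated repo list, 100 repos per request
    REPOLIST=""
    PAGE=1
    while true; do
       PAGEDATA=`curl --silent -u $GHBU_UNAME:$GHBU_PASSWD "${GHBU_API}/orgs/${GHBU_ORG}/repos?per_page=100&page=${PAGE}" -q | grep "\"name\"" | awk -F': "' '{print $2}' | sed -e 's/",//g'`
       [ -z "$PAGEDATA" ] && break     # an empty page means every repo has been seen
       REPOLIST="$REPOLIST $PAGEDATA"
       PAGE=$((PAGE + 1))
    done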


sureshghare commented Apr 30, 2015

I have more than 100 repositories in my organization, but the script only fetches 30.
Could you please help us resolve this issue?

Thank you


christisking commented May 14, 2015

Has anyone looked at importing issues/wiki back into github after they've been exported?
Thanks


thekeith commented Sep 6, 2015

@railsfactory-suriya

You need to change the variables in the script on lines 5, 7 and 8 that are noted as <CHANGE-ME> :)


kevashcraft commented Nov 17, 2015

Create a personal access token (in Settings) to use as a password if you have 2-step auth enabled.
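
(The token simply stands in for the account password, i.e. it becomes the GHBU_PASSWD value; something like:)

    # the token and other values are placeholders
    GHBU_UNAME=example-user GHBU_PASSWD="<personal-access-token>" GHBU_ORG=example-org bash backup-github.sh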


tonylukasavage commented Dec 16, 2015

This worked great! Thanks!


morjo02jm commented Dec 22, 2015

Hello GitHub user community, we have a large software organization and 85% of our source code is in GitHub. We perform daily backups using the GitHub backup utility, which usually completes in 3-4 hours. Can anyone recommend a backup solution that achieves zero or close-to-zero data loss, for example one that performs continuous backup? Note that we do have a disaster recovery solution in place, but it is a backend (SAN) storage replication solution, so if someone deletes content, those deletions are replicated to our target. We could investigate SAN storage snapshots as a solution. I'd like to hear what other GitHub admins are doing for local backup and recovery.


cchorn commented Dec 25, 2015

This was working for a very long time but now the script breaks at line 45 ...

ERROR: Encountered error (1) while running the following:
grep "name"
(at line 45 of file backup-github.sh.)
Aborting.

Any thoughts on how to fix this?


dpflucas commented Feb 19, 2016

+1 on the grep name error:

ERROR: Encountered error (1) while running the following:
           grep "name"
       (at line 43 of file ./backup-github.sh.)
       Aborting.

rodw (Owner) commented Feb 22, 2016

@cchorn @dpflucas - I haven't encountered this issue myself, but per https://gist.github.com/rodw/3073987#gistcomment-1415659 there may be something wrong with one or more of your GHBU_UNAME, GHBU_PASSWD, GHBU_API or GHBU_ORG parameters (causing no input to reach the grep call, for example).

Manually running the equivalent of:

curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos -q

may give you a more easily digestible error message or expose a more obvious problem.

rodw (Owner) commented Feb 22, 2016

@cchorn @dpflucas - More generally, if that curl command fails to generate output to STDOUT for any reason, you might encounter an error in the grep part of that line (#43).


Zeretil commented Apr 19, 2016

The script works perfectly, thanks. But I'm not really understanding what it is I'm downloading. In the Git repository that is downloaded, there isn't much to be seen. I've included an example (screenshot below) of what is in it. What am I missing here?

[screenshot: githubbackup]


ghost commented May 27, 2016

The script uses --mirror, which implies --bare. I was able to restore a working tree by following the instructions at http://stackoverflow.com/questions/12450245/getting-a-working-copy-of-a-bare-repository – specifically, by cloning locally with git clone /path/to/backup-dir.git.
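
(End to end, using the file names this script produces, with a hypothetical timestamp, that looks roughly like:)

    # unpack one backup archive, then clone the bare mirror into a normal working copy
    tar xzf github-backups/example-org-example-repo-20160527-0300.git.tar.gz    # recreates github-backups/example-org-example-repo-20160527-0300.git
    git clone github-backups/example-org-example-repo-20160527-0300.git example-repo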


mschuwalow commented Jul 5, 2016

A small change that makes it clone a user's repos and all repos they contribute to, regardless of who the owner is or whether they're private.
https://gist.github.com/MSchuwalow/d943477d2b3d33c8bf7b51e04515b3e6


MichaelGofron commented Aug 4, 2016

Hello, I ran the command that @rodw mentioned and got a list of all the repos in the organization, but when I run the script it fails with the same error message others got.

Notably:

Fetching list of repositories for ...ERROR: Encountered error (1) while running the following:
grep "name"
(at line 46 of file ./downloader.sh.)
Aborting.


MichaelGofron commented Aug 4, 2016

Alright, I figured it out. I had left the angle brackets in rather than removing them. After removing the brackets I had permission problems, so I linked my SSH key to my GitHub account as described here: https://help.github.com/articles/generating-an-ssh-key/


michael-dev2rights commented Aug 17, 2016

Hi; I made some changes to this to make it pass shellcheck. @rodw, would you be able to merge these back into your gist?

https://gist.github.com/michael-dev2rights/2e3115f4d3e206937464eb84b7cb9aa6


muthiahr commented Dec 7, 2016

Hi

curl --silent -u <username>:<password> https://api.github.com/orgs/appedo/repos returns the following

{
"message": "Bad credentials",
"documentation_url": "https://developer.github.com/v3"
}

But the credentials I am trying with are valid.


asari-fd commented Dec 29, 2016

Thank you for the great script.
One problem: sometimes curl fails inside the pipe (REPOLIST=curl|grep|awk|sed), but the script continues as if everything is OK, because PIPESTATUS is never checked.
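
(One way around that, sketched here, is to stop relying on check inside the pipeline: fetch the JSON first with curl --fail, so an HTTP or network error produces a nonzero exit that can abort the script, and only then parse it. set -o pipefail plus an explicit exit-status check is another option if the pipeline form is kept.)

    # sketch: fetch first, abort on curl failure, then parse
    RAW=`curl --silent --fail -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 -q` || { echo "ERROR: could not fetch repo list" >&2; exit 1; }
    REPOLIST=`echo "$RAW" | grep "\"name\"" | awk -F': "' '{print $2}' | sed -e 's/",//g'`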


tdiprima commented Mar 3, 2017

When backing up a user's repositories (line 48), the URL should actually say users (plural, just like orgs). I know. Seems weird. But it's true.


eromerog commented May 25, 2017

Hi! I don't want to waste your time, but how does the code above work? Do I just run it from the command line after changing the variables to match my GitHub account? If you know of a tutorial or something like that, it would be great :)


tripu commented Dec 8, 2017

We think GH might have changed the API v3 recently a bit: now, when retrieving the list of repos of an organisation, detailed info about the repo's licence is included (whereas before that wasn't there, or at least not with so much detail). As a result, results now may include two instances of the string name: one for the name of the repo, another one for the name of the licence. grep would wrongly take both. Because of that, our particular version of this script has been failing for a few days now (we believe around 29 Nov 2017).

This fork fixes that (by using jq, which is available in Debian & Ubuntu repositories, instead of grep for this).


tmichaud-accesso commented Dec 14, 2017

For those who don't wish to use jq (though they should), it's fairly easy to use a Python script to handle this.

The line:

REPOLIST_TMP=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?page=${PAGE}\&per_page=90 -q -k | grep "\"name\"" | awk -F': "' '{print $2}' | sed -e 's/",//g'`

Needs to be modified to:

REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?page=100 -q  | ./processRepoList.py`

with processRepoList.py being:

#! /usr/bin/env python2

#Takes a JSON file (a list of objects) and pulls out only the 'name' key/value pair of each object - printing out the value

import json
import sys

data = json.load(sys.stdin)
for x in data:
	print x['name']
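
(For the ./processRepoList.py invocation above to work, the helper just needs to be executable and sit next to the backup script:)

    chmod +x processRepoList.py   # the pipe above invokes it as ./processRepoList.py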

marchenkov commented Dec 21, 2017

Also, you can install jq and modify the line to:

REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?page=100 -q  | jq ".[] .name"|sed -e 's/\"//g'`

PaperFixie commented Jan 24, 2018

One thing that I noticed when attempting to back up our org's repos is that on line 46, check grep "\"name\"" was pulling the names of our licenses and was attempting to back up repos named Apache and MIT, which didn't exist.

When we checked the output we noticed that grepping for the name pulled up the line for labels in the output.

Modifying line 46 to:

REPOLIST_TMP=`check curl --silent ${GHBU_API}/orgs/${GHBU_ORG}/repos\?${GHBU_APIOPTS}page=${PAGE}\&per_page=90 -q -k | grep "\"full_name\"" | awk -F': "' '{print $2}' | sed -e 's/",//g' | sed -e 's/<org-name>\///g'`

resolved those issues for us.


cemenson commented Apr 27, 2018

I had the same issue as @PaperFixie. Used your modification, but included some of my own to:

  • Include API authentication
  • Output repo names without org name (to keep the output the same as the original script)
REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 -q | check grep "\"full_name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g' | cut -d '/' -f 2`

jalama commented Jul 3, 2018

@cemenson's last comment needed the following change to work for me.
Replace starting at line 56:

   $GHBU_SILENT || echo "Backing up ${REPO}"
   check ${GHBU_GIT_CLONE_CMD}${REPO}.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git

billweils commented Jul 13, 2018

Does anyone have insights on how to:
a) Validate that the backup includes all branches, etc.?
b) Restore this backup to a device/location outside of GitHub and see the various repos and branches in a CDE (Visual Studio)?
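
(One rough way to sanity-check an archive, with hypothetical file names, is to unpack it and ask git directly; cloning the unpacked .git directory, as described a few comments above, then gives a working tree that Visual Studio or any other IDE can open.)

    # sketch: list the refs and run an integrity check on one unpacked backup
    tar xzf github-backups/example-org-example-repo-20180713-0300.git.tar.gz
    git --git-dir=github-backups/example-org-example-repo-20180713-0300.git branch -a    # every branch should appear here
    git --git-dir=github-backups/example-org-example-repo-20180713-0300.git fsck --full  # object-level integrity check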


alyavasa commented Aug 27, 2018

ERROR: Encountered error (128) while running the following:
git clone --quiet --mirror git@github.com:<GHBU_ORG>/GNU.git github-backups/<GHBU_ORG>-GNU-1319_08/27/18082018.git
(at line 60 of file backup-github.sh.)
Aborting.


SGlushakov commented Sep 29, 2018

ERROR: Encountered error (128) while running the following:
git clone --quiet --mirror git@github.com:<GHBU_ORG>/GNU.git github-backups/<GHBU_ORG>-GNU-1319_08/27/18082018.git
(at line 60 of file backup-github.sh.)
Aborting.

Same error.
