A simple script to backup an organization's GitHub repositories, wikis and issues.
#!/bin/bash
# A simple script to backup an organization's GitHub repositories.
# NOTE: if you have more than 100 repositories, you'll need to step thru the list of repos
# returned by GitHub one page at a time, as described at https://gist.github.com/darktim/5582423
GHBU_BACKUP_DIR=${GHBU_BACKUP_DIR-"github-backups"} # where to place the backup files
GHBU_ORG=${GHBU_ORG-"<CHANGE-ME>"} # the GitHub organization whose repos will be backed up
# (if you're backing up a user's repos instead, this should be your GitHub username)
GHBU_UNAME=${GHBU_UNAME-"<CHANGE-ME>"} # the username of a GitHub account (to use with the GitHub API)
GHBU_PASSWD=${GHBU_PASSWD-"<CHANGE-ME>"} # the password for that account
GHBU_GITHOST=${GHBU_GITHOST-"github.com"} # the GitHub hostname (see comments)
GHBU_PRUNE_OLD=${GHBU_PRUNE_OLD-true} # when `true`, old backups will be deleted
GHBU_PRUNE_AFTER_N_DAYS=${GHBU_PRUNE_AFTER_N_DAYS-3} # the min age (in days) of backup files to delete
GHBU_SILENT=${GHBU_SILENT-false} # when `true`, only show error messages
GHBU_API=${GHBU_API-"https://api.github.com"} # base URI for the GitHub API
GHBU_GIT_CLONE_CMD="git clone --quiet --mirror git@${GHBU_GITHOST}:" # base command to use to clone GitHub repos
TSTAMP=`date "+%Y%m%d-%H%M"`
# The function `check` will exit the script if the given command fails.
function check {
  "$@"
  status=$?
  if [ $status -ne 0 ]; then
    echo "ERROR: Encountered error (${status}) while running the following:" >&2
    echo " $@" >&2
    echo " (at line ${BASH_LINENO[0]} of file $0.)" >&2
    echo " Aborting." >&2
    exit $status
  fi
}
# The function `tgz` will create a gzipped tar archive of the specified file ($1) and then remove the original
function tgz {
  # quote the path so directories containing spaces are handled correctly
  check tar zcf "$1.tar.gz" "$1" && check rm -rf "$1"
}
$GHBU_SILENT || (echo "" && echo "=== INITIALIZING ===" && echo "")
$GHBU_SILENT || echo "Using backup directory $GHBU_BACKUP_DIR"
check mkdir -p $GHBU_BACKUP_DIR
$GHBU_SILENT || echo -n "Fetching list of repositories for ${GHBU_ORG}..."
REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos\?per_page=100 -q | check grep "\"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
# NOTE: if you're backing up a *user's* repos, not an organization's, use this instead:
# REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/user/repos -q | check grep "\"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
$GHBU_SILENT || echo "found `echo $REPOLIST | wc -w` repositories."
$GHBU_SILENT || (echo "" && echo "=== BACKING UP ===" && echo "")
for REPO in $REPOLIST; do
  $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO}"
  check ${GHBU_GIT_CLONE_CMD}${GHBU_ORG}/${REPO}.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}-${TSTAMP}.git
  $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO}.wiki (if any)"
  ${GHBU_GIT_CLONE_CMD}${GHBU_ORG}/${REPO}.wiki.git ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.wiki-${TSTAMP}.git 2>/dev/null && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.wiki-${TSTAMP}.git
  $GHBU_SILENT || echo "Backing up ${GHBU_ORG}/${REPO} issues"
  check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/repos/${GHBU_ORG}/${REPO}/issues -q > ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.issues-${TSTAMP} && tgz ${GHBU_BACKUP_DIR}/${GHBU_ORG}-${REPO}.issues-${TSTAMP}
done
if $GHBU_PRUNE_OLD; then
  $GHBU_SILENT || (echo "" && echo "=== PRUNING ===" && echo "")
  $GHBU_SILENT || echo "Pruning backup files ${GHBU_PRUNE_AFTER_N_DAYS} days old or older."
  $GHBU_SILENT || echo "Found `find $GHBU_BACKUP_DIR -name '*.tar.gz' -mtime +$GHBU_PRUNE_AFTER_N_DAYS | wc -l` files to prune."
  find $GHBU_BACKUP_DIR -name '*.tar.gz' -mtime +$GHBU_PRUNE_AFTER_N_DAYS -exec rm -fv {} > /dev/null \;
fi
$GHBU_SILENT || (echo "" && echo "=== DONE ===" && echo "")
$GHBU_SILENT || (echo "GitHub backup completed." && echo "")

marinho commented Sep 9, 2013

Hi, well done, this is going to be useful for me :)

Can you tell me where the notes about the GitHub hostname are, please?

Thank you!

Owner

rodw commented Oct 10, 2013

Sorry @marinho. That's a little cut-and-paste error that referenced a private wiki.

In the general case you can just use github.com as the host name.

The note that is referenced describes a way to run this backup script under a different set of credentials than one's normal GitHub account. Here's the relevant snippet:

If you want to use more than one GitHub account (e.g., your own account as well as the read-only back-up account), add the following to ~/.ssh/config (creating that file if needed):

Host <BACKUP>.github.com
    HostName github.com
    PreferredAuthentications publickey
    IdentityFile ~/.ssh/<THE-BACKUP-SSH-KEY>

(Where <BACKUP>.github.com is an arbitrary host name that matches the value used in the script, and ~/.ssh/<THE-BACKUP-SSH-KEY> is an ssh key generated with ssh-keygen and uploaded to GitHub.)

You can then log in via:

ssh-add ~/.ssh/<THE-BACKUP-SSH-KEY>

and run the backup script as a cron job.

Calrion commented Oct 14, 2013

Firstly, thanks for this, it's a great help!

For those who, like me, want to backup user repositories rather than organisation repositories, the following small changes are required:

  • Enter your GitHub username as the value of both GHBU_UNAME and GHBU_ORG.

  • Change line 41 to read:

    REPOLIST=`check curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/user/repos -q | check grep "\"name\"" | check awk -F': "' '{print $2}' | check sed -e 's/",//g'`
    

Even though you remove the GHBU_ORG reference from that line, it's used later on to compute the full repository path so it's still needed (and it needs to be your username, as above).

With those changes, this script grabbed all my repositories—public, private, and forked—and made backups of the repository, the wiki, and the issues. Great work! 😄

bjtitus commented Feb 6, 2014

I made a few changes to support the paginated repos list since we have more than 100 repositories (the maximum allowed in a single page of the API) https://gist.github.com/bjtitus/8851816#file-backup-github-sh-L42-L49

reggi commented Feb 18, 2014

Took me a long while to realize that I just wanted GHBU_GITHOST=${GHBU_GITHOST-"github.com"} which should be the default!! >.<

Owner

rodw commented Mar 13, 2014

@reggi: Good call. I made that change.

Owner

rodw commented Mar 13, 2014

@Calrion Thanks, I added comments describing your changes.

One could probably parameterize the script a bit to support both without "manual" intervention.
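A minimal sketch of that parameterization, assuming a hypothetical GHBU_ACCOUNT_TYPE setting (not part of the script above) that selects between the two repo-list endpoints already shown in the script:

```shell
#!/bin/bash
# GHBU_ACCOUNT_TYPE is a hypothetical setting, not part of the original script.
GHBU_API=${GHBU_API-"https://api.github.com"}
GHBU_ACCOUNT_TYPE=${GHBU_ACCOUNT_TYPE-"org"}   # "org" or "user"

# Print the API URL that lists repositories for the configured account type.
function repos_url {
  case "$GHBU_ACCOUNT_TYPE" in
    org)  echo "${GHBU_API}/orgs/${GHBU_ORG}/repos?per_page=100" ;;
    user) echo "${GHBU_API}/user/repos?per_page=100" ;;
    *)    echo "ERROR: unknown GHBU_ACCOUNT_TYPE '${GHBU_ACCOUNT_TYPE}'" >&2
          return 1 ;;
  esac
}
```

The REPOLIST line could then call curl on "$(repos_url)" instead of a hard-coded path. GHBU_ORG still has to hold the username in the "user" case, since it is used later to build the clone paths.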

I need to add the -i flag for my password to be accepted; I don't know why.

Secondly, what should I do with the downloaded files? Are they the content of the .git directory?
How does GitHub deal with the repo, issues, and wiki? Are they branches?
How do I restore the files rather than a bare repo?

At least I've found why my password wasn't working: it contains a special character that needed to be escaped with \
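An alternative to backslash-escaping, sketched here with made-up credentials, is to put the password in single quotes when assigning it and to double-quote the variables wherever they are expanded. The script above passes them to curl unquoted, so this assumes the -u argument is quoted as well:

```shell
#!/bin/bash
# Hypothetical credentials, for illustration only.
# Single quotes stop the shell from interpreting $, &, \ and similar characters.
GHBU_UNAME='backup-user'
GHBU_PASSWD='p@ss$w0rd!&more'

# Double quotes keep the expanded value as a single argument.
credentials="${GHBU_UNAME}:${GHBU_PASSWD}"
# e.g. curl --silent -u "$credentials" "${GHBU_API}/user/repos"
```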

Finally, I've adopted another solution, as it was convenient for me to move my issues to Bitbucket.
So if you want to do that instead of just backing up GitHub,
here are 3 links:
https://confluence.atlassian.com/pages/viewpage.action?pageId=330796872
http://stackoverflow.com/questions/11119270/how-to-import-github-issues-and-wikis-to-bitbucket
https://github.com/sorich87/github-to-bitbucket-issues-migration

Hi, I am getting an error at line 43. Could you please help me with this?

ERROR: Encountered error (1) while running the following:
grep "name"
(at line 43 of file git1.sh.)

I too am getting the same error as @railsfactory-suriya " ./backup-github.sh

=== INITIALIZING ===

Using backup directory github-backups
Fetching list of repositories for BluTrumpetOrg...ERROR: Encountered error (1) while running the following:
grep "name"
(at line 43 of file ./backup-github.sh.)
Aborting.
found 0 repositories.
"
Please advise...

The "issues" backup is only the list of issues, not the content. I think you need something more sophisticated to traverse all the *_url entries for each comment, event, etc.

magikid commented Sep 18, 2014

Thanks for writing this script!

I just wish that it worked with 2-factor auth.

mtolly commented Nov 16, 2014

The script breaks if you are a user who has access to another user's repository. For example if you are user A but you are a contributor to another user B's somerepo, the script will mistakenly try to download A/somerepo. This could be fixed by using the full_name instead of the name.
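A rough sketch of the full_name idea, reusing the script's own grep/awk/sed chain. The helper name extract_full_names is made up for illustration, and the parsing assumes the API keeps returning pretty-printed JSON with one field per line:

```shell
#!/bin/bash
# extract_full_names is a hypothetical helper, not part of the original script.
# Reads GitHub repo-list JSON on stdin and prints each "owner/repo" full name.
function extract_full_names {
  grep '"full_name"' | awk -F': "' '{print $2}' | sed -e 's/",//g'
}

# A full name can then be split into its two halves with parameter expansion:
#   owner=${full%%/*}   # text before the first slash
#   repo=${full#*/}     # text after the first slash
```

The clone command would then use ${owner}/${repo} rather than ${GHBU_ORG}/${REPO}.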

It only pulls open issues. Can be easily fixed. See here: https://gist.github.com/stanstrup/2725319cd18db7f863c0/revisions

But it doesn't seem to pull all issues. I cannot figure why. Any ideas?
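One possible cause, offered as a guess rather than a diagnosis, is pagination: the issues endpoint returns 30 items per page by default, like the repo list. The sketch below builds a URL that requests both open and closed issues, 100 per page. The helper name issues_url is an assumption for illustration:

```shell
#!/bin/bash
GHBU_API=${GHBU_API-"https://api.github.com"}

# issues_url is a hypothetical helper, not part of the original script.
# $1 = owner or organization, $2 = repository, $3 = page number (defaults to 1)
function issues_url {
  echo "${GHBU_API}/repos/$1/$2/issues?state=all&per_page=100&page=${3:-1}"
}
```

A loop would request page 1, 2, 3 and so on until a page comes back empty. Note that this endpoint also lists pull requests alongside issues.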

@magikid The script does work with 2FA, you will need to generate an application-specific password for the script.

mandric commented Mar 3, 2015

If your org has more than 30 repos you will probably want to add a ?per_page=100 arg to get the entire list; otherwise the GitHub API seems to default to 30 repos per page.

@blutrumpet @railsfactory-suriya You will get the grep "name" error if your GHBU_API string is incorrect, or has a trailing slash.
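To guard against the trailing-slash case, the script could normalize the value before using it. A small sketch, with a made-up helper name:

```shell
#!/bin/bash
# normalize_api_base is a hypothetical helper, not part of the original script.
# Prints its argument with one trailing slash removed, if there is one.
function normalize_api_base {
  echo "${1%/}"
}

GHBU_API=$(normalize_api_base "${GHBU_API-"https://api.github.com"}")
```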

Along with ?per_page=100, if you have more than 100 repos, you need to add &page=N in order to grab them all.

However, you can only call 100 repos at a time from github, so you need a loop to grab different pages if you have more than 100 repos.

I forked this gist and added an until loop replacing lines 43-68 of this script, which you can see here. Useful if you have more than 100 repos:

https://gist.github.com/forkaholic/f583667f97813b863171
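For readers who do not want to follow the link, here is an independent sketch of such a paging loop. It is not the code from the fork above; the function names are made up, and it assumes an empty page marks the end of the list:

```shell
#!/bin/bash
# All three functions are hypothetical helpers, not part of the original script.

# Print one page ($1) of the organization's repo list as raw JSON.
function fetch_repo_page {
  curl --silent -u "${GHBU_UNAME}:${GHBU_PASSWD}" \
    "${GHBU_API}/orgs/${GHBU_ORG}/repos?per_page=100&page=$1"
}

# Print repo names found in the JSON on stdin (same chain as the script).
function extract_names {
  grep '"name"' | awk -F': "' '{print $2}' | sed -e 's/",//g'
}

# Request pages 1, 2, 3, ... and stop at the first page that yields no names.
function list_all_repos {
  local page=1 names
  while true; do
    names=$(fetch_repo_page "$page" | extract_names) || true
    [ -z "$names" ] && break
    echo "$names"
    page=$((page + 1))
  done
}
```

With this in place, the REPOLIST assignment could become REPOLIST=$(list_all_repos).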

I have more than 100 repositories in my organization, but the script only fetches 30 of them.
Could you please help us resolve this issue?

Thank you

Has anyone looked at importing issues/wiki back into github after they've been exported?
Thanks

jok3ll commented Sep 5, 2015

Please me slot me expert in to jb name first jok3ll please slot

thekeith commented Sep 6, 2015

@railsfactory-suriya

You need to change the variables in the script on lines 5, 7 and 8 that are noted as <CHANGE-ME> :)

Create a personal access token (in settings) to use as a password if you have 2-step auth enabled

This worked great! Thanks!

Hello GitHub user community. We are a large software organization with 85% of our source code in GitHub. We perform daily backups using the GitHub backup utility, and they usually complete in 3-4 hours. Can anyone recommend a backup solution that achieves zero or close to zero data loss, for example one that performs continuous backup? Note that we do have a disaster recovery solution in place, but it is a backend (SAN) storage replication solution, so if someone deletes content, the deletion is replicated to our target. We could investigate SAN storage snapshots as a solution. I'd like to hear what other GitHub admins are doing for local backup and recovery.

cchorn commented Dec 25, 2015

This was working for a very long time but now the script breaks at line 45 ...

(at line 45 of file backup-github.sh.)
Aborting.
ERROR: Encountered error (1) while running the following:
grep "name"
(at line 45 of file backup-github.sh.)
Aborting.

Any thoughts on how to fix this?

+1 on the grep name error:

ERROR: Encountered error (1) while running the following:
           grep "name"
       (at line 43 of file ./backup-github.sh.)
       Aborting.
Owner

rodw commented Feb 22, 2016

@cchorn @dpflucas - I haven't encountered this issue myself, but per https://gist.github.com/rodw/3073987#gistcomment-1415659 there may be something wrong with one or more of your GHBU_UNAME, GHBU_PASSWD, GHBU_API or GHBU_ORG parameters (causing no input to the grep call, for example).

Manually running the equivalent of:

curl --silent -u $GHBU_UNAME:$GHBU_PASSWD ${GHBU_API}/orgs/${GHBU_ORG}/repos -q

may give you a more easily digestible error message or expose a more obvious problem.
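Along the same lines, a sketch that asks curl for just the HTTP status code and turns the common ones into a hint. Both helper names are assumptions for illustration; --output and --write-out are standard curl options:

```shell
#!/bin/bash
# Both functions are hypothetical helpers, not part of the original script.

# Print only the HTTP status code of the repo-list request.
function repo_list_status {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    -u "${GHBU_UNAME}:${GHBU_PASSWD}" "${GHBU_API}/orgs/${GHBU_ORG}/repos"
}

# Translate a status code ($1) into a hint about which setting to check.
function explain_status {
  case "$1" in
    200) echo "OK" ;;
    401) echo "bad credentials (check GHBU_UNAME and GHBU_PASSWD)" ;;
    404) echo "not found (check GHBU_ORG and GHBU_API)" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}
```

Running explain_status "$(repo_list_status)" before the backup starts would surface a bad setting earlier than the grep "name" error does.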

Owner

rodw commented Feb 22, 2016

@cchorn @dpflucas - More generally, if that curl command fails to generate output to STDOUT for any reason, you might encounter an error in the grep part of that line (#43).

Zeretil commented Apr 19, 2016

The script works perfectly, thanks. But I'm not really understanding what it is I'm downloading. In the GIT that is downloaded, there isn't much to be seen. Example included of what is in the Git. What am I missing here?

[screenshot: githubbackup]

The script uses --mirror, which implies --bare. I was able to restore a working tree by following the instructions at http://stackoverflow.com/questions/12450245/getting-a-working-copy-of-a-bare-repository – specifically, by cloning locally with git clone /path/to/backup-dir.git.
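Put together, restoring one of the .tar.gz files might look like the sketch below. The helper name restore_backup is made up, and the sketch assumes the archive was produced by the script's tgz function, so it contains one bare repository directory whose name ends in .git:

```shell
#!/bin/bash
# restore_backup is a hypothetical helper, not part of the original script.
# $1 = path to a <name>.git.tar.gz archive created by the backup script
# $2 = directory in which to create the working copy
function restore_backup {
  local archive=$1 target=$2 scratch bare
  scratch=$(mktemp -d) || return 1
  # The archive stores the bare repo under its original relative path.
  tar zxf "$archive" -C "$scratch" || return 1
  bare=$(find "$scratch" -type d -name '*.git' -prune -print | head -n 1)
  [ -n "$bare" ] || { echo "ERROR: no .git directory in ${archive}" >&2; return 1; }
  # Cloning from the bare repo produces a normal repo with a working tree.
  # The clone's "origin" points at the scratch copy, so it is left in place.
  git clone --quiet "$bare" "$target"
}
```

The wiki archives can be restored the same way. The issues archive is a plain JSON file, so it only needs to be unpacked.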

A small change that makes it clone the repos of a user and all repos they contribute to. It doesn't matter who the owner is or whether the repos are private.
https://gist.github.com/MSchuwalow/d943477d2b3d33c8bf7b51e04515b3e6

Hello, I ran the command that @rodw mentioned and I got a list of all the repos in the organization but when I run the script it gets the same error message others got.

Notably:

Fetching list of repositories for ...ERROR: Encountered error (1) while running the following:
grep "name"
(at line 46 of file ./downloader.sh.)
Aborting.

Alright I figured it out. I had left the brackets in rather than removing them. After removing the brackets I had permission problems. So then I linked my ssh key to my github account as seen here: https://help.github.com/articles/generating-an-ssh-key/
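A guard near the top of the script could catch leftover placeholders before any request is made. A sketch with a made-up function name; it assumes no real value both starts with < and ends with >:

```shell
#!/bin/bash
# require_configured is a hypothetical helper, not part of the original script.
# Fails if a required setting is empty or still looks like <SOMETHING>.
function require_configured {
  local var
  for var in GHBU_ORG GHBU_UNAME GHBU_PASSWD; do
    case "${!var-}" in
      ""|"<"*">")
        echo "ERROR: ${var} is empty or still contains a <...> placeholder" >&2
        return 1 ;;
    esac
  done
}
```

In the script it would be called as check require_configured, right after the variable definitions.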

michael-dev2rights commented Aug 17, 2016

Hi; I made some changes to this to make it pass shellcheck. @rodw, would you be able to merge these back into your gist?

https://gist.github.com/michael-dev2rights/2e3115f4d3e206937464eb84b7cb9aa6

muthiahr commented Dec 7, 2016

Hi

curl --silent -u : https://api.github.com/orgs/appedo/repos returns the following

{
"message": "Bad credentials",
"documentation_url": "https://developer.github.com/v3"
}

But the credentials I am trying are valid.

Thank you for the great script.
One problem: sometimes curl fails inside the pipe (REPOLIST=curl|grep|awk|sed), but the script continues as if everything were OK, because it does not check PIPESTATUS.
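One way to surface that failure, sketched under the assumption that the first command in the pipe is the one to watch. The helper name fetch_names is made up; PIPESTATUS is bash's array of exit codes for the stages of the last pipeline:

```shell
#!/bin/bash
# fetch_names is a hypothetical helper, not part of the original script.
# Runs the given command, extracts repo names from its output, and returns
# the command's own exit status instead of the status of the last pipe stage.
function fetch_names {
  local names status=0
  names=$("$@" | grep '"name"' | awk -F': "' '{print $2}' | sed -e 's/",//g'
          exit "${PIPESTATUS[0]}") || status=$?
  if [ "$status" -ne 0 ]; then
    echo "ERROR: '$*' failed with status ${status}" >&2
    return "$status"
  fi
  echo "$names"
}
```

For example: REPOLIST=$(fetch_names curl --silent -u "$GHBU_UNAME:$GHBU_PASSWD" "${GHBU_API}/orgs/${GHBU_ORG}/repos") || exit 1. Adding set -o pipefail near the top of the script is a blunter alternative, although it also treats a grep that finds nothing as a failure.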

tdiprima commented Mar 3, 2017

Backing up a user's repositories (line 48) the url should actually say users (plural, just like orgs). I know. Seems weird. But it's true.

eromerog commented May 25, 2017

Hi! I don't want to waste your time, but how does the code above work? Do I just run it from a command line after changing the variables for my GitHub account? If you know of a tutorial or something like that, it would be great :)
