@clrung
Last active December 11, 2023 14:34
Clones all repos in a GitHub organization
#!/bin/bash
# Usage: clone_all_repos.sh <organization> [output directory]
# Requires: git, curl, jq, and a GITHUB_TOKEN environment variable
ORG=$1
PER_PAGE=100
GIT_OUTPUT_DIRECTORY=${2:-"/tmp/${ORG}_repos"}

if [ -z "$GITHUB_TOKEN" ]; then
    echo -e "Variable GITHUB_TOKEN isn't set! Please specify your GitHub token.\n\nMore info: https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/"
    exit 1
fi

if [ -z "$ORG" ]; then
    echo "Variable ORG isn't set! Please specify the GitHub organization."
    exit 1
fi

mkdir -p "$GIT_OUTPUT_DIRECTORY"
echo "Cloning repos in $ORG to $GIT_OUTPUT_DIRECTORY/..."

# Walk the organization's repos one API page at a time until a short (non-full) page is returned
for ((PAGE=1; ; PAGE+=1)); do
    REPO_COUNT=0
    ERROR=0
    while read -r REPO_NAME ; do
        ((REPO_COUNT++))
        echo -n "Cloning $REPO_NAME to $GIT_OUTPUT_DIRECTORY/$REPO_NAME... "
        git clone "https://github.com/$ORG/$REPO_NAME.git" "$GIT_OUTPUT_DIRECTORY/$REPO_NAME" >/dev/null 2>&1 ||
            { echo -e "ERROR: Unable to clone!" ; ERROR=1 ; continue ; }
        echo "done"
    done < <(curl -u ":$GITHUB_TOKEN" -s "https://api.github.com/orgs/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")
    if [ $ERROR -eq 1 ] ; then exit 1 ; fi               # a clone failed on this page
    if [ $REPO_COUNT -ne $PER_PAGE ] ; then exit 0 ; fi  # short page: no more repos to fetch
done

clrung commented Jan 28, 2020

@hotelzululima Sorry about that! It's working on my machine (if I had a dollar for every time I said that...).

$ ./clone_all_repos.sh hotelzululima archives
Cloning repos in hotelzululima to archives/...
Cloning 0bin to archives/... done
Cloning 0x00sec_code to archives/... done
Cloning 3D-Printing to archives/... done
...
Cloning echo-dot to archives/... done
Cloning ecu-tool to archives/... done
Cloning eda2 to archives/... done

At this point the script had cloned 320 repos (eda2 is #320), so I stopped its execution here.
Since you said it stopped at 300 repos, it must have had trouble fetching the next page for some reason. Try replacing the main while loop with the version below, which prints each repo's name rather than git cloning it, a costly operation:

    while read REPO_NAME ; do
        ((REPO_COUNT++))
        echo "$REPO_COUNT: $REPO_NAME"
    done < <(curl -u :$GITHUB_TOKEN -s "https://api.github.com/users/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")

Execution should look like this:

$ ./clone_all_repos.sh hotelzululima archives
Cloning repos in hotelzululima to archives/...
1: 0bin
2: 0x00sec_code
...
20: eda2

$REPO_COUNT resets after each page, which is why we see the repo count go from 1 to 100 and then loop back to 1.

One thing that comes to mind is that your GitHub token may have run into some sort of rate limiting. You can run this curl to check whether that's the case:

$ curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit

@hotelzululima

On 1/28/20 12:30 PM, Christopher Rung wrote:

$ curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit
{
  "resources": {
    "core": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "search": {
      "limit": 30,
      "remaining": 30,
      "reset": 1580244535
    },
    "graphql": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "integration_manifest": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "source_import": {
      "limit": 100,
      "remaining": 100,
      "reset": 1580244535
    }
  },
  "rate": {
    "limit": 5000,
    "remaining": 5000,
    "reset": 1580248075
  }
}


clrung commented Jan 28, 2020

Doesn't look like you've been rate limited (remaining == limit above).
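
If it's easier to eyeball, you can pull out just the core numbers with jq (which you already have installed for the script):

    curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit | jq ".resources.core"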

Did you try simply echoing the repos rather than cloning them? Maybe try the modified while loop I posted earlier.


hotelzululima commented Feb 1, 2020

echoing the repos made it all the way through..
cloning them only gets to 100 repos...

  sigh
  hzl

p.s. sorry about the disconnect earlier.. had to rescue a client's network..


hotelzululima commented Feb 2, 2020

turns out when running under bash -vx the control flow differed between the count version and the clone version.. the issue was that

    { echo -e "ERROR: Unable to clone!" ; ERROR=1 ; continue ; }

combined with

    if [ $ERROR -eq 1 ] ; then exit 1 ; fi

ended the clone op early whenever a single clone failed.. by changing the line to read

    { echo -e "ERROR: Unable to clone!" ; ERROR=2 ; continue ; }

and depending on

    if [ $REPO_COUNT -ne $PER_PAGE ] ; then exit 0 ; fi

to break out of the git clone loop, the clone-all-repos run succeeds (at least it's still running...)
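
for reference, here's the whole loop body with that one change in place (everything else is exactly as in the script above):

    while read -r REPO_NAME ; do
        ((REPO_COUNT++))
        echo -n "Cloning $REPO_NAME to $GIT_OUTPUT_DIRECTORY/$REPO_NAME... "
        git clone "https://github.com/$ORG/$REPO_NAME.git" "$GIT_OUTPUT_DIRECTORY/$REPO_NAME" >/dev/null 2>&1 ||
            { echo -e "ERROR: Unable to clone!" ; ERROR=2 ; continue ; }   # no longer trips the 'exit 1' check
        echo "done"
    done < <(curl -u ":$GITHUB_TOKEN" -s "https://api.github.com/orgs/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")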

hzl


hotelzululima commented Feb 2, 2020

thanx for helping me to safeguard my "bookmarks".. (my precious)
now, since I omitted doing a git remote origin on most of these, I'm wondering how to scrape it out of GitHub (have to start reading that API documentation next :) and add it to all the repos being git cloned..
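i'm guessing something like this might do it (untested sketch, reusing the variables from the script above):

    # look up a repo's clone_url via the API and add it as origin in the existing local copy
    CLONE_URL=$(curl -u ":$GITHUB_TOKEN" -s "https://api.github.com/repos/$ORG/$REPO_NAME" | jq -r ".clone_url")
    git -C "$GIT_OUTPUT_DIRECTORY/$REPO_NAME" remote add origin "$CLONE_URL"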
hzl

@aselunar

This script is awesome. Do you guys mind if I redistribute the gist as part of a repo for bulk cloning git repos (under MIT license)?

@banerRana

Have you tried something like this?

    curl -s https://api.github.com/orgs/blockchain-etl/repos | jq -r ".[].clone_url" | xargs -L1 git clone

You already have jq installed, after all.


clrung commented May 5, 2023

> Do you guys mind if I redistribute the gist as part of a repo for bulk cloning git repos (under MIT license)?

Hello @aselunar, sorry for the delay and I appreciate the compliment! Yes, that is fine with me - go ahead and use this as you wish.


clrung commented May 5, 2023

@banerRana, yes, that will work for any org that has fewer than 100 repos. This script handles pagination.
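
For reference, a paginated version of your one-liner might look something like this (untested sketch, still unauthenticated like your example, so the lower anonymous rate limit applies):

    PAGE=1
    while :; do
        URLS=$(curl -s "https://api.github.com/orgs/blockchain-etl/repos?per_page=100&page=$PAGE" | jq -r ".[].clone_url")
        [ -z "$URLS" ] && break               # empty page: no more repos
        echo "$URLS" | xargs -L1 git clone
        ((PAGE++))
    done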
