Clones all repos in a GitHub organization
#!/bin/bash
# Usage: clone_all_repos.sh <organization> [output directory]
ORG=$1
PER_PAGE=100
GIT_OUTPUT_DIRECTORY=${2:-"/tmp/${ORG}_repos"}

if [ -z "$GITHUB_TOKEN" ]; then
    echo -e "Variable GITHUB_TOKEN isn't set! Please specify your GitHub token.\n\nMore info: https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/"
    exit 1
fi

if [ -z "$ORG" ]; then
    echo "Variable ORG isn't set! Please specify the GitHub organization."
    exit 1
fi

mkdir -p "$GIT_OUTPUT_DIRECTORY"

echo "Cloning repos in $ORG to $GIT_OUTPUT_DIRECTORY/..."
for ((PAGE=1; ; PAGE+=1)); do
    REPO_COUNT=0
    ERROR=0
    # Fetch one page of repo names and clone each one
    while read REPO_NAME ; do
        ((REPO_COUNT++))
        echo -n "Cloning $REPO_NAME to $GIT_OUTPUT_DIRECTORY/$REPO_NAME... "
        git clone "https://github.com/$ORG/$REPO_NAME.git" "$GIT_OUTPUT_DIRECTORY/$REPO_NAME" >/dev/null 2>&1 ||
            { echo -e "ERROR: Unable to clone!" ; ERROR=1 ; continue ; }
        echo "done"
    done < <(curl -u :$GITHUB_TOKEN -s "https://api.github.com/orgs/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")
    if [ $ERROR -eq 1 ] ; then exit 1 ; fi
    # A page with fewer than PER_PAGE repos means we've reached the last page
    if [ $REPO_COUNT -ne $PER_PAGE ] ; then exit 0 ; fi
done
@clrung commented Jan 27, 2020

This script clones all repos in an organization. It iterates through each page of the repository list, parses each repository's name, and feeds it into a loop that clones the repo.

The script finishes successfully when the number of repos on the current page does not match the PER_PAGE variable, since this implies there were fewer than the maximum number of repos left to show. If the total number of repos in your organization is evenly divisible by PER_PAGE, the script makes one extra API request for a final, empty page before exiting.

If there is an issue cloning a repo, the script will output ERROR: Unable to clone! next to it, and it will halt before the next page of repos is fetched.

Arguments

  • Organization (required)
    • name of the GitHub organization whose repos you would like to clone
  • Output directory (optional; default: /tmp/${ORG}_repos/)
    • directory that will contain the cloned repos

Dependencies

  • $GITHUB_TOKEN env var set
    • Learn how to create your GitHub token here (https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/), and grant it access to repo > public_repo
  • jq
    • brew install jq
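
With the arguments and dependencies above in place, a typical invocation looks like this (my-org and the output path are placeholders):

    export GITHUB_TOKEN=<your personal access token>
    ./clone_all_repos.sh my-org /tmp/my-org_repos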

To clone repos in a user account, please change https://api.github.com/orgs/ to https://api.github.com/users/ (thanks, @hotelzululima!)

TODO

  • Use more intelligent pagination to determine when we have reached the end of the list (doc); one possible approach is sketched below.
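
A sketch of that idea, not part of the gist: keep requesting pages while the GitHub API's Link response header still advertises a rel="next" page. Variable names follow the script above, and the cloning logic is reduced to a placeholder:

    PAGE=1
    HEADERS=$(mktemp)
    while : ; do
        # -D writes the response headers to a file so the Link header can be inspected
        curl -u :$GITHUB_TOKEN -s -D "$HEADERS" \
            "https://api.github.com/orgs/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" |
            jq -r ".[]|.name"   # clone each repo here instead of printing it
        # GitHub includes rel="next" in the Link header only while more pages remain
        grep -qi 'rel="next"' "$HEADERS" || break
        ((PAGE++))
    done
    rm -f "$HEADERS"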

@clrung commented Jan 28, 2020

@hotelzululima Hello! Sorry this didn't work with user repos. When I changed the curl line (L26 of the script) to the following, it cloned your repos as expected.

done < <(curl -u :$GITHUB_TOKEN -s "https://api.github.com/users/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")

(I changed https://api.github.com/orgs/ -> https://api.github.com/users/)

$ ./clone_all_repos.sh hotelzululima archives
Cloning repos in hotelzululima to archives/...
starting to clone 0bin to archives/0bin... done
starting to clone 0x00sec_code to archives/0x00sec_code... done
starting to clone 3D-Printing to archives/3D-Printing... done
starting to clone 3DRSoloHacks to archives/3DRSoloHacks... done
starting to clone 3DRSoloHacks-1 to archives/3DRSoloHacks-1... done
starting to clone 3snake to archives/3snake... done
starting to clone a2sv to archives/a2sv... done
...

@hotelzululima commented Jan 28, 2020

and it stopped at 300 repos... :/ am I running into another API limitation?

hzl

@clrung commented Jan 28, 2020

@hotelzululima Sorry about that! It's working on my machine (if I had a dollar for every time I said that...).

$ ./clone_all_repos.sh hotelzululima archives
Cloning repos in hotelzululima to archives/...
Cloning 0bin to archives/... done
Cloning 0x00sec_code to archives/... done
Cloning 3D-Printing to archives/... done
...
Cloning echo-dot to archives/... done
Cloning ecu-tool to archives/... done
Cloning eda2 to archives/... done

The script had cloned 320 repos at this point (eda2 is #320), so I stopped its execution here.
Since you said it stopped at 300 repos, it must have had trouble fetching the next page for some reason. Try replacing the main while loop with this, which will print each repo's name rather than git clone it (a costly operation):

    while read REPO_NAME ; do
        ((REPO_COUNT++))
        echo "$REPO_COUNT: $REPO_NAME"
    done < <(curl -u :$GITHUB_TOKEN -s "https://api.github.com/users/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")

Execution should look like this:

$ ./clone_all_repos.sh hotelzululima archives
Cloning repos in hotelzululima to archives/...
1: 0bin
2: 0x00sec_code
...
20: eda2

$REPO_COUNT resets after each page, which is why we see the repo count go from 1 to 100 and then loop back to 1.

One thing that comes to mind is that your GitHub token faced some sort of rate limiting. You can perform this curl to see if that's the case:

$ curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit

@hotelzululima commented Jan 28, 2020
On 1/28/20 12:30 PM, Christopher Rung wrote:

curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit
{
  "resources": {
    "core": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "search": {
      "limit": 30,
      "remaining": 30,
      "reset": 1580244535
    },
    "graphql": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "integration_manifest": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "source_import": {
      "limit": 100,
      "remaining": 100,
      "reset": 1580244535
    }
  },
  "rate": {
    "limit": 5000,
    "remaining": 5000,
    "reset": 1580248075
  }
}

@clrung commented Jan 28, 2020

Doesn't look like you've been rate limited (remaining == limit above).

Did you try simply echoing the repos rather than cloning them? Maybe try the modified while loop I posted earlier.

@hotelzululima commented Feb 1, 2020

echoing the repos made it all the way through..
cloning them only gets to 100 repos...

  sigh
  hzl

PS: sorry about the disconnect earlier.. had to rescue a client's network..

@hotelzululima commented Feb 2, 2020

Turns out that when running with bash -vx, the control flow differed between the count version and the clone version. The issue was that

{ echo -e "ERROR: Unable to clone!" ; ERROR=1 ; continue ; }

combined with:

if [ $ERROR -eq 1 ] ; then exit 1 ; fi

ended the clone op early. By changing the line to read:

{ echo -e "ERROR: Unable to clone!" ; ERROR=2 ; continue ; }

and depending on:

if [ $REPO_COUNT -ne $PER_PAGE ] ; then exit 0 ; fi

to break out of the git clone loop, cloning all the repos succeeds (at least it's still running...).

hzl
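
For reference, a minimal sketch of that workaround applied to the gist's main loop, with a hypothetical FAILED counter reported at the end instead of aborting the run on the first error:

    while read REPO_NAME ; do
        ((REPO_COUNT++))
        echo -n "Cloning $REPO_NAME to $GIT_OUTPUT_DIRECTORY/$REPO_NAME... "
        # Record the failure and move on instead of flagging the whole run for exit
        git clone "https://github.com/$ORG/$REPO_NAME.git" "$GIT_OUTPUT_DIRECTORY/$REPO_NAME" >/dev/null 2>&1 ||
            { echo "ERROR: Unable to clone!" ; ((FAILED++)) ; continue ; }
        echo "done"
    done < <(curl -u :$GITHUB_TOKEN -s "https://api.github.com/orgs/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")
    # A short page means the end of the list; report failures instead of exiting early
    if [ $REPO_COUNT -ne $PER_PAGE ] ; then
        echo "Finished with ${FAILED:-0} failed clone(s)"
        exit 0
    fi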

@hotelzululima commented Feb 2, 2020

Thanks for helping me to safeguard my "bookmarks".. (my precious)
Now, since I omitted doing a git remote origin on most of these, I am wondering how to scrape it out of GitHub (have to start reading that API documentation next :) and add it to all the repos being git cloned..
hzl
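
A rough sketch of one way to do that with the same API and jq, assuming the local directory names match the GitHub repo names and reusing ORG, GITHUB_TOKEN, and GIT_OUTPUT_DIRECTORY from the script above (repos that already have an origin would need git remote set-url instead):

    # For each local repo directory, look up its clone_url and add it as the origin remote
    for DIR in "$GIT_OUTPUT_DIRECTORY"/*/ ; do
        REPO_NAME=$(basename "$DIR")
        CLONE_URL=$(curl -u :$GITHUB_TOKEN -s \
            "https://api.github.com/repos/$ORG/$REPO_NAME" | jq -r ".clone_url")
        git -C "$DIR" remote add origin "$CLONE_URL"
    done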

@aselunar commented:

This script is awesome. Do you guys mind if I redistribute the gist as part of a repo for bulk cloning git repos (under MIT license)?

@banerRana commented:

Have you tried something like this, as you already have jq installed?

curl -s https://api.github.com/orgs/blockchain-etl/repos | jq -r ".[].clone_url" | xargs -L1 git clone

@clrung commented May 5, 2023

Do you guys mind if I redistribute the gist as part of a repo for bulk cloning git repos (under MIT license)?

Hello @aselunar, sorry for the delay and I appreciate the compliment! Yes, that is fine with me - go ahead and use this as you wish.

@clrung commented May 5, 2023

@banerRana, yes, that will work for any org that has fewer than 100 repos. This script handles pagination.
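
For larger orgs, a rough sketch in the same one-liner spirit could loop over pages until an empty one comes back (authenticated with GITHUB_TOKEN as above; ORG is a placeholder):

    PAGE=1
    while : ; do
        URLS=$(curl -u :$GITHUB_TOKEN -s \
            "https://api.github.com/orgs/$ORG/repos?per_page=100&page=$PAGE" | jq -r ".[].clone_url")
        # An empty page means we've paged past the last repo
        [ -z "$URLS" ] && break
        echo "$URLS" | xargs -L1 git clone
        ((PAGE++))
    done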
