Clones all repos in a GitHub organization
#!/bin/bash
# Usage: clone_all_repos.sh <organization> [output directory]
ORG=$1
PER_PAGE=100
GIT_OUTPUT_DIRECTORY=${2:-"/tmp/${ORG}_repos"}

if [ -z "$GITHUB_TOKEN" ]; then
    echo -e "Variable GITHUB_TOKEN isn't set! Please specify your GitHub token.\n\nMore info: https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/"
    exit 1
fi

if [ -z "$ORG" ]; then
    echo "Variable ORG isn't set! Please specify the GitHub organization."
    exit 1
fi

mkdir -p "$GIT_OUTPUT_DIRECTORY"
echo "Cloning repos in $ORG to $GIT_OUTPUT_DIRECTORY/..."

for ((PAGE=1; ; PAGE+=1)); do
    REPO_COUNT=0
    ERROR=0
    # Read one repo name per line from the current page of API results.
    while read -r REPO_NAME ; do
        ((REPO_COUNT++))
        echo -n "Cloning $REPO_NAME to $GIT_OUTPUT_DIRECTORY/$REPO_NAME... "
        git clone "https://github.com/$ORG/$REPO_NAME.git" "$GIT_OUTPUT_DIRECTORY/$REPO_NAME" >/dev/null 2>&1 ||
            { echo -e "ERROR: Unable to clone!" ; ERROR=1 ; continue ; }
        echo "done"
    done < <(curl -u ":$GITHUB_TOKEN" -s "https://api.github.com/orgs/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")
    # Stop after any page on which a clone failed.
    if [ $ERROR -eq 1 ] ; then exit 1 ; fi
    # A page with fewer than PER_PAGE repos is the last one.
    if [ $REPO_COUNT -ne $PER_PAGE ] ; then exit 0 ; fi
done

@clrung clrung commented Jan 27, 2020

This script clones all repos in an organization. It iterates through each page of the repository list, parsing each repository's name and feeding it into a loop that clones the repo.

The script finishes successfully when the number of repos on the current page does not match the PER_PAGE variable, since this implies there were fewer than the maximum number of repos left to show. A consequence of this is that when the total number of repos in your organization is evenly divisible by PER_PAGE, the script has to fetch one extra, empty page before it finishes.

If there is an issue cloning, the script will output ERROR: Unable to clone! next to the problematic repo, and the script will halt before the next page of repos is fetched.

Arguments

  • Organization (required)
    • name of the GitHub organization whose repos you would like to clone
  • Output directory (optional; default: /tmp/${ORG}_repos/)
    • directory that will contain the cloned repos

Dependencies

  • $GITHUB_TOKEN env var set
    • Learn how to create your GitHub token here, and grant it access to repo > public_repo
  • jq
    • brew install jq

To clone repos in a user account, please change https://api.github.com/orgs/ to https://api.github.com/users/ (thanks, @hotelzululima!)
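The orgs/users switch could also be made a parameter rather than a manual edit. A minimal sketch, assuming a hypothetical helper named repos_url whose first argument is the account type (neither exists in the original script):

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): build the repo-list
# URL for either an organization ("orgs") or a user account ("users").
repos_url() {
    local account_type=$1 name=$2 per_page=$3 page=$4
    case "$account_type" in
        orgs|users) ;;
        *) echo "account type must be 'orgs' or 'users'" >&2; return 1 ;;
    esac
    echo "https://api.github.com/$account_type/$name/repos?per_page=$per_page&page=$page"
}
```

The curl call in the script would then take "$(repos_url users "$ORG" "$PER_PAGE" "$PAGE")" in place of the hard-coded URL.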

TODO

  • Use more intelligent pagination to determine when we have reached the end of the list (doc).
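One way to approach this TODO is to follow the rel="next" URL that the GitHub API returns in the Link response header, and stop when it is absent. The sketch below is only an illustration: next_link and list_all_repos are hypothetical names, and the loop assumes curl and jq are installed and GITHUB_TOKEN is set.

```shell
#!/bin/bash
# Hypothetical helper: print the URL tagged rel="next" in a Link header value,
# or print nothing when there is no next page.
next_link() {
    # Put one link per line, then keep only the URL of the rel="next" entry.
    tr ',' '\n' <<< "$1" | sed -n 's/.*<\([^>]*\)>; *rel="next".*/\1/p'
}

# Sketch of a listing loop built on next_link (prints one repo name per line).
list_all_repos() {
    local url="https://api.github.com/orgs/$1/repos?per_page=100"
    local body headers
    body=$(mktemp)
    while [ -n "$url" ]; do
        # -D - writes the response headers to stdout; -o saves the JSON body.
        headers=$(curl -s -u ":$GITHUB_TOKEN" -D - -o "$body" "$url")
        jq -r '.[]|.name' "$body"
        # An empty result ends the loop: the last page has no rel="next" link.
        url=$(next_link "$(grep -i '^link:' <<< "$headers")")
    done
    rm -f "$body"
}
```

With this shape the loop no longer depends on comparing the page size against PER_PAGE.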

@clrung clrung commented Jan 28, 2020

@hotelzululima Hello! Sorry this didn't work with user repos. When I changed L26 to the following, it cloned your repos as expected.

done < <(curl -u :$GITHUB_TOKEN -s "https://api.github.com/users/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")

(I changed https://api.github.com/orgs/ -> https://api.github.com/users/)

$ ./clone_all_repos.sh hotelzululima archives
Cloning repos in hotelzululima to archives/...
starting to clone 0bin to archives/0bin... done
starting to clone 0x00sec_code to archives/0x00sec_code... done
starting to clone 3D-Printing to archives/3D-Printing... done
starting to clone 3DRSoloHacks to archives/3DRSoloHacks... done
starting to clone 3DRSoloHacks-1 to archives/3DRSoloHacks-1... done
starting to clone 3snake to archives/3snake... done
starting to clone a2sv to archives/a2sv... done
...

@hotelzululima hotelzululima commented Jan 28, 2020

and it stopped at 300 repos... :/ am I running into another API limitation?

hzl

@clrung clrung commented Jan 28, 2020

@hotelzululima Sorry about that! It's working on my machine (if I had a dollar for every time I said that...).

$ ./clone_all_repos.sh hotelzululima archives
Cloning repos in hotelzululima to archives/...
Cloning 0bin to archives/... done
Cloning 0x00sec_code to archives/... done
Cloning 3D-Printing to archives/... done
...
Cloning echo-dot to archives/... done
Cloning ecu-tool to archives/... done
Cloning eda2 to archives/... done

The script had cloned 320 repos so far (eda2 is #320), so I stopped its execution there.
Since you said it stopped at 300 repos, it must have had trouble fetching the next page for some reason. Try replacing the main while loop with this, which prints each repo's name instead of running git clone (a costly operation):

    while read REPO_NAME ; do
        ((REPO_COUNT++))
        echo "$REPO_COUNT: $REPO_NAME"
    done < <(curl -u :$GITHUB_TOKEN -s "https://api.github.com/users/$ORG/repos?per_page=$PER_PAGE&page=$PAGE" | jq -r ".[]|.name")

Execution should look like this:

$ ./clone_all_repos.sh hotelzululima archives
Cloning repos in hotelzululima to archives/...
1: 0bin
2: 0x00sec_code
...
20: eda2

$REPO_COUNT resets after each page, which is why we see the repo count go from 1 to 100 and then loop back to 1.

One thing that comes to mind is that your GitHub token faced some sort of rate limiting. You can perform this curl to see if that's the case:

$ curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit

@hotelzululima hotelzululima commented Jan 28, 2020

On 1/28/20 12:30 PM, Christopher Rung wrote:

> curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit

$ curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit
{
  "resources": {
    "core": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "search": {
      "limit": 30,
      "remaining": 30,
      "reset": 1580244535
    },
    "graphql": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "integration_manifest": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1580248075
    },
    "source_import": {
      "limit": 100,
      "remaining": 100,
      "reset": 1580244535
    }
  },
  "rate": {
    "limit": 5000,
    "remaining": 5000,
    "reset": 1580248075
  }
}

@clrung clrung commented Jan 28, 2020

Doesn't look like you've been rate limited (remaining == limit above).
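To make that comparison without reading the whole response, the core bucket can be summarized with jq. A small sketch, assuming a hypothetical helper named core_rate (not part of the script) and that jq is installed:

```shell
#!/bin/bash
# Hypothetical helper: read a /rate_limit response on stdin and print
# "remaining/limit" for the core REST API bucket. Requires jq.
core_rate() {
    jq -r '.resources.core | "\(.remaining)/\(.limit)"'
}

# Usage (needs network access and GITHUB_TOKEN):
#   curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit | core_rate
```

For the response posted above, this would print 5000/5000.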

Did you try simply echoing the repos rather than clone? Maybe try the modified while loop I posted earlier.

@hotelzululima hotelzululima commented Feb 1, 2020

echoing the repos made it all the way through..
cloning them only gets to 100 repos...

  sigh
  hzl

ps sorry about the disconnect earlier.. had to rescue a client's network..

@hotelzululima hotelzululima commented Feb 2, 2020

turns out when running bash -vx the control flow differed between the count version and the clone version.. and the issue was
{ echo -e "ERROR: Unable to clone!" ; ERROR=1 ; continue ; }

combined with:
if [ $ERROR -eq 1 ] ; then exit 1 ; fi

ended the clone op early.. by changing the line to read :

{ echo -e "ERROR: Unable to clone!" ; ERROR=2 ; continue ; }

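and relying on:
if [ $REPO_COUNT -ne $PER_PAGE ] ; then exit 0 ; fi
to end the page loop, cloning all the repos succeeds (at least it's still running...). With ERROR=2 the -eq 1 check never fires, so a failed clone is still reported but no longer stops the script.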

hzl
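An alternative to changing the flag value is to record which repos failed and report them at the end, so that a single failure neither stops the run nor goes unnoticed. This is a sketch only; attempt, report_failures and FAILED_REPOS are hypothetical names that do not appear in the original script.

```shell
#!/bin/bash
FAILED_REPOS=()

# Hypothetical helper: run a command on behalf of a repo and, if it fails,
# record the repo name instead of setting a flag that aborts the run.
attempt() {
    local name=$1
    shift
    "$@" >/dev/null 2>&1 || FAILED_REPOS+=("$name")
}

# Print the repos that failed; return non-zero if there were any.
report_failures() {
    [ "${#FAILED_REPOS[@]}" -eq 0 ] && return 0
    printf 'Failed to clone: %s\n' "${FAILED_REPOS[@]}" >&2
    return 1
}
```

Inside the while loop the clone would become attempt "$REPO_NAME" git clone "https://github.com/$ORG/$REPO_NAME.git" "$GIT_OUTPUT_DIRECTORY/$REPO_NAME", with report_failures || exit 1 called after the last page. The array survives the loop because the loop reads from a process substitution and so runs in the main shell, not a subshell.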

@hotelzululima hotelzululima commented Feb 2, 2020

thanx for helping me to safeguard my "bookmarks" ..(my precious)
now since I omitted doing a git remote origin on most of these I am wondering on how to scrape it out of github(have to start reading
that api documentation next :) and add it to all the repos being git cloned..
hzl
