@pascalschulz
Last active November 12, 2020 13:42
This code snippet takes a GitHub organization name as input, crawls all of its public repository pages, and returns a list of the "Git clone URLs" for those repos.
import itertools
import re

import requests as rq

# Your GitHub organization, including the leading slash (e.g. "/github")
organization = "/<company_name>"

# Fetch the organization page to find out how many pages of repositories exist
response = rq.get("https://github.com{0}".format(organization))
try:
    pages = re.search(r"data-total-pages=\"(\d+)\">", response.text).group(1)
except AttributeError:
    # No pagination marker found: the organization has a single page of repos
    pages = 1

# Collect the repository names from every page
repositoryUrls = []
for page in range(1, int(pages) + 1):
    response = rq.get("https://github.com{}?page={}".format(organization, page))
    repositoryUrls.append(re.findall(r"itemprop=\"name codeRepository\".*href=\"" + organization + "/(.*)\" class", response.text))

# Flatten the per-page lists and turn each repository name into a clone URL
repositoryUrls = list(itertools.chain.from_iterable(repositoryUrls))
repositoryUrls = ["https://github.com" + organization + "/{0}.git".format(repo) for repo in repositoryUrls]
print(repositoryUrls)
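
For completeness, here is a minimal sketch of one way the resulting list might be consumed, assuming git is installed and on the PATH. The loop below is illustrative and not part of the original snippet:

import subprocess

# Clone every discovered repository into the current working directory.
# Assumes repositoryUrls was produced by the snippet above and that the
# git executable is available on PATH.
for url in repositoryUrls:
    subprocess.run(["git", "clone", url], check=True)

Note that the snippet scrapes GitHub's HTML, which can change without notice; the REST API endpoint https://api.github.com/orgs/<org>/repos returns the same information as stable JSON, including a clone_url field for each repository.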