@mrunkel
Last active April 2, 2020 15:25
I mirror my GitHub repositories to Gitea, stolen from Jan-Piet Mens (because his site is down)

After backing up all my gists and cloning all my starred repositories, there is one more thing I want to accomplish: back up my GitHub repositories, and by that I really mean the ones I manage and have commit rights to. I could do this by cloning and periodically pulling (as we discussed here), but you might have noticed that I explicitly exclude my own repositories in that script by checking for repo.owner.login. The reason is: I want to mirror them into Gitea.

Why Gitea? Unusually for me, I’d like a Web UI onto these repositories in addition to the files in the file system. It could have been GitLab, but I think Gitea is probably the option with the lowest resource requirements.

When I add a repository to Gitea and specify I want it to be mirrored, Gitea will take charge of periodically querying the source repository and pulling in any changes. I’ve mentioned Gitea previously, and I find it’s improving as it matures. I’ve been doing this with version 1.7.5.

After setting up Gitea and creating a user, I create an API token in Gitea with which I can create repositories programmatically. The following program will obtain a list of all GitHub repositories I have, skip those I’ve forked from elsewhere, and then create each remaining repository in Gitea as a mirror.

#!/usr/bin/env python -B

from github import Github		# https://github.com/PyGithub/PyGithub
import requests
import json
import sys
import os

gitea_url = "http://127.0.0.1:3000/api/v1"
gitea_token = open(os.path.expanduser("~/.gitea-api")).read().strip()

session = requests.Session()        # Gitea
session.headers.update({
    "Content-type"  : "application/json",
    "Authorization" : "token {0}".format(gitea_token),
})

r = session.get("{0}/user".format(gitea_url))
if r.status_code != 200:
    print("Cannot get user details", file=sys.stderr)
    exit(1)

gitea_uid = json.loads(r.text)["id"]

github_username = "jpmens"
github_token = open(os.path.expanduser("~/.github-token")).read().strip()
gh = Github(github_token)

for repo in gh.get_user().get_repos():
    # Mirror to Gitea if I haven't forked this repository from elsewhere
    if not repo.fork:
        m = {
            "repo_name"         : repo.full_name.replace("/", "-"),
            "description"       : repo.description or "not really known",
            "clone_addr"        : repo.clone_url,
            "mirror"            : True,
            "private"           : repo.private,
            "uid"               : gitea_uid,
        }

        if repo.private:
            m["auth_username"]  = github_username
            m["auth_password"]  = "{0}".format(github_token)

        jsonstring = json.dumps(m)

        r = session.post("{0}/repos/migrate".format(gitea_url), data=jsonstring)
        if r.status_code != 201:            # if not CREATED
            if r.status_code == 409:        # repository exists
                continue
            print(r.status_code, r.text, jsonstring)

You’ll notice that I handle private GitHub repositories specifically, in that I add the username and GitHub token to the Gitea mirror request. While I could do that as a matter of course, the username/token pair is stored in Gitea and is, unfortunately, displayed in the Clone from URL field when you view the mirror properties in the UI. For this reason, I limit specifying the GitHub repository authorization to repos which actually require it.

Gitea stores clones of the repositories it mirrors in a directory I specify when setting it up (the ROOT key in the [repository] section of app.ini), so I could access the repositories from that if something goes wrong with Gitea:

$ git clone http://localhost:3000/jpm/jpmens-jo.git

...

$ tree -d /path/to/gitea-repositories/jpm/jpmens-jo.git/
gitea-repositories/jpm/jpmens-jo.git/
├── hooks
├── info
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

$ git clone /path/to/gitea-repositories/jpm/jpmens-jo.git/
Cloning into 'jpmens-jo'...
done.

I can configure Gitea’s cron schedule with an entry in app.ini:

[cron]
; Enable running cron tasks periodically.
ENABLED = true
; Run cron tasks when Gitea starts.
RUN_AT_START = true

; Update mirrors
[cron.update_mirrors]
SCHEDULE = @every 10m

[mirror]
; Default interval as a duration between each check
DEFAULT_INTERVAL = 8h
; Min interval as a duration must be > 1m
MIN_INTERVAL = 10m

The DEFAULT_INTERVAL is the default which is copied into the repository-specific mirror settings when creating the mirror. I can modify the interval in the UI, and MIN_INTERVAL is a setting which forbids users (i.e. myself) from entering shorter intervals:

[Image: repository-specific mirror settings]

If I’m impatient or want to prod Gitea into mirroring a particular repository on demand, I can POST a request to its API:

curl -s -XPOST http://localhost:3000/api/v1/repos/jpm/jpmens-jo/mirror-sync \
     -H "accept: application/json" \
     -H "Authorization: token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
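The same request can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original program: the function names are made up for illustration, and the base URL, owner, repository name and token are whatever applies to your own Gitea instance.

```python
import urllib.request


def build_mirror_sync_request(base_url, owner, repo, token):
    """Build (but do not send) the POST request for the mirror-sync endpoint."""
    url = "{0}/repos/{1}/{2}/mirror-sync".format(base_url.rstrip("/"), owner, repo)
    return urllib.request.Request(
        url,
        data=b"",                 # empty body; the endpoint needs no payload
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": "token {0}".format(token),
        },
    )


def trigger_mirror_sync(base_url, owner, repo, token):
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = build_mirror_sync_request(base_url, owner, repo, token)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling trigger_mirror_sync("http://localhost:3000/api/v1", "jpm", "jpmens-jo", token) would then correspond to the curl invocation above.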

In order to monitor that mirroring is actually happening, I will periodically obtain the SHA of the last commit to the master branch on GitHub (that’s the best I can come up with in terms of “last updated”, as there really isn’t a “last SHA” independent of a particular branch) and check whether that particular commit exists on Gitea’s side. If Gitea doesn’t carry it, I yell.
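One possible way to sketch such a check is with git ls-remote, which prints the SHA a remote holds for a given ref. Note that this is a simplification of what I describe above: it compares the two branch tips rather than searching Gitea for a particular commit, so a mirror which simply hasn't reached its next sync interval will also show up as different. The function names are made up for illustration, and private repositories would additionally need credentials in the URLs.

```python
import subprocess


def parse_ls_remote(output):
    """Map ref name -> SHA from `git ls-remote` output ('sha<TAB>ref' per line)."""
    refs = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            sha, ref = parts
            refs[ref] = sha
    return refs


def branch_tip(url, branch="master"):
    """Ask a remote for the SHA at the tip of a branch; None if it has no such branch."""
    ref = "refs/heads/{0}".format(branch)
    out = subprocess.run(
        ["git", "ls-remote", url, ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out).get(ref)


def mirror_is_current(source_sha, mirror_sha):
    """True only if the source has a tip and the mirror carries the same one."""
    return source_sha is not None and source_sha == mirror_sha
```

A cron job could call branch_tip() once with the GitHub clone URL and once with the Gitea one, and complain when mirror_is_current() stays false for longer than the mirror interval.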

So, where importing is a one-time thing, mirroring causes Gitea to periodically check whether the source repo has changed, and if so, it pulls changes in. Mirroring doesn’t pull in issues or pull requests from GitHub, which is a bit of a shame, but I understand it’s not trivial to do. If you want a utility which does that, gitea-github-migrator is a one-shot program which does what it says on the tin. What Gitea does bring across is a repository’s Wiki, and it does so by creating a *.wiki.git repository next to the actual repo, visible in the file system; within the UI it’s where you’d expect it to be and not separately listed.

If you want to set up your own self-hosted Gitea, it’s not difficult, and it doesn’t have to be public: mine is not Internet-accessible, but it has Internet access in order to be able to mirror repositories from GitHub.

I am not migrating away from GitHub because I see no reason to: the platform is very useful to me, and I’d not like to lose it. What I’m trying to accomplish is a fail-safe in case something happens to GitHub which would make me lose access, be that voluntarily or involuntarily.

Updates

On 2020-02-07 Stefan sends me an updated version of the mirror program (see below) and writes:

I made it so that you can specify a map that links remote repository names to local Gitea organizations so that one could group remotely mirrored repos into, well, Gitea organizations. If a repo isn’t found in the map it will be created in the account of the user specified in the script.

Bulk backup and clone git repositories from a list

Get the git repos list

Option A: Generate a list of repos already cloned on a machine

To do this we will use the Linux commands grep and cut.

Please jump to How to use Linux commands under Windows? for more information.

grep -oh "url = .*" */.git/config | cut -d " " -f3 > git_repos.txt
  1. The command grep searches for the regular expression "url = .*" in the config file of every .git folder one level below the current directory.

Each local git repository stores its remote URL in this hidden folder, inside the config file (no extension).

The option -o makes grep return only the matched (non-empty) parts of a matching line.
The option -h suppresses the prefixing of file names on output.

  2. Each result line is then passed to the command cut, which splits the string on the space character and returns the third piece, the actual git repo URL.

  3. Finally, the output is written to the file git_repos.txt in the current folder (the file is created automatically, and overwritten if it already exists).
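If grep and cut are not at hand, roughly the same extraction can be sketched in Python. This is only an illustration under the same assumption as the command above (each repository sits one level below the starting folder); the function names are made up for this example.

```python
import re
from pathlib import Path

# Same pattern the grep command uses; the group captures what cut would return.
URL_LINE = re.compile(r"url = (.*)")


def collect_repo_urls(parent):
    """Return every remote URL found in <parent>/*/.git/config."""
    urls = []
    for config in sorted(Path(parent).glob("*/.git/config")):
        urls.extend(URL_LINE.findall(config.read_text()))
    return urls


def write_repo_list(parent, out_file="git_repos.txt"):
    """Write one URL per line to out_file, overwriting it like `>` does."""
    urls = collect_repo_urls(parent)
    Path(out_file).write_text("".join(url + "\n" for url in urls))
    return urls
```

Running write_repo_list(".") from the folder that contains the cloned repositories would produce a git_repos.txt comparable to the one generated by the shell command.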

Option B: Download a list of all repo SSH URLs from a project in Bitbucket.uhub.biz.

In this case we want to download the SSH URL of every existing repo inside a project in Bitbucket.

  1. Run this command to download a JSON file with all the repos information, including the SSH clone url.
    $ curl --request GET \
    --url 'http://bitbucket.org/rest/api/1.0/projects/[[PROJECT_NAME]]/repos/?limit=1000' \
    --header 'authorization: Basic [[AUTH_STRING]]' \
    --header 'content-type: application/json'
    
    
    • Replace [[PROJECT_NAME]] with the name of the parent project of the repos whose information you want to download.
    • Replace [[AUTH_STRING]] with your user email and password, joined by a colon and encoded as a Base64 string (e.g. the encoding of user@mail.com:password).
  2. To extract the SSH URLs from the JSON file, go to http://jsonpath.com/
    • In the field JSONPath Syntax, copy and paste this string: $.values.*.links.clone[?(@.name=="ssh")].href
    • In the field JSON, copy and paste the JSON result downloaded in step 1.
  3. Create a new git_repos.txt file and copy the content of the field Evaluation Results into it.
  4. Search for and remove any [, ], ", and , characters from the list. Also remove all leading spaces on the lines.
  5. All done. You're ready to start cloning.
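Steps 2 to 4 can also be done locally instead of pasting the JSON into a website. The sketch below walks the same path as the JSONPath expression; it assumes the response has the shape that expression implies (a values list whose entries carry links.clone items with name and href), and the function names are made up for illustration.

```python
import json


def ssh_clone_urls(payload):
    """Return the href of every clone link named "ssh", in response order."""
    urls = []
    for repo in payload.get("values", []):
        for link in repo.get("links", {}).get("clone", []):
            if link.get("name") == "ssh":
                urls.append(link["href"])
    return urls


def write_repo_list_from_json(json_text, out_file="git_repos.txt"):
    """Parse the downloaded JSON text and write one SSH URL per line."""
    urls = ssh_clone_urls(json.loads(json_text))
    with open(out_file, "w") as fh:
        for url in urls:
            fh.write(url + "\n")
    return urls
```

Because the URLs are written one per line without brackets, quotes or commas, the manual clean-up of step 4 is not needed with this approach.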

Batch clone git repositories in the target machine

Now we need to copy the git_repos.txt file with the repositories list to the target machine, and batch clone them.

To do so, we will use this command:

$ grep . git_repos.txt | while read line ; do git clone "$line"; done

How does this work?

  1. The command grep . selects all the non-empty lines in the git_repos.txt file and passes each of them to the next command.
  2. The while loop reads each line and runs git clone followed by the current line (a repository URL).
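For machines without a Unix shell, the same loop can be sketched in Python. This is an illustrative equivalent rather than a replacement: the function names are made up, blank lines are skipped as grep . does (whitespace-only lines are skipped here as well), and URLs whose clone fails are collected instead of being silently passed over.

```python
import subprocess


def read_repo_urls(path="git_repos.txt"):
    """Return the non-blank lines of the list file, stripped of whitespace."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def clone_all(urls, run=subprocess.run):
    """Run `git clone <url>` for each URL; return the URLs that failed.

    `run` is injectable so the loop can be exercised without touching git.
    """
    failed = []
    for url in urls:
        result = run(["git", "clone", url])
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Calling clone_all(read_repo_urls()) in the target folder clones every repository in the list and returns the ones that need a second look.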

Appendix

How to use Linux commands under Windows?

  • On Windows 10 you can use the "Windows Subsystem for Linux".
  • Since you must have Git installed, another option is to use the Git Bash application.
  • And yet another option is to install GnuWin32 to add support for Linux commands to Windows.

#!/usr/bin/env python

from github import Github		# https://github.com/PyGithub/PyGithub
import requests
import json
import sys
import os

repo_map = {
    "some-github-repo"    : "a-gitea-org",
    "another-github-repo" : "another-gitea-org",
}

gitea_url = "http://127.0.0.1:3000/api/v1"
gitea_user = "a-gitea-user"
gitea_token = open(os.path.expanduser("~/.gitea-token")).read().strip()

session = requests.Session()        # Gitea
session.headers.update({
    "Content-type"  : "application/json",
    "Authorization" : "token {0}".format(gitea_token),
})

github_username = "jpmens"
github_token = open(os.path.expanduser("~/.github-token")).read().strip()
gh = Github(github_token)

for repo in gh.get_user().get_repos():
    # Mirror to Gitea if I haven't forked this repository from elsewhere
    if not repo.fork:
        real_repo = repo.full_name.split('/')[1]
        if real_repo in repo_map:
            # We're creating the repo in another account (most likely an organization)
            gitea_dest_user = repo_map[real_repo]
        else:
            gitea_dest_user = gitea_user

        r = session.get("{0}/users/{1}".format(gitea_url, gitea_dest_user))
        if r.status_code != 200:
            print("Cannot get user id for '{0}'".format(gitea_dest_user), file=sys.stderr)
            exit(1)

        gitea_uid = json.loads(r.text)["id"]

        m = {
            "repo_name"         : "{0}".format(real_repo),
            "description"       : repo.description or "not really known",
            "clone_addr"        : repo.clone_url,
            "mirror"            : True,
            "private"           : repo.private,
            "uid"               : gitea_uid,
        }

        if repo.private:
            m["auth_username"]  = github_username
            m["auth_password"]  = "{0}".format(github_token)

        jsonstring = json.dumps(m)

        r = session.post("{0}/repos/migrate".format(gitea_url), data=jsonstring)
        if r.status_code != 201:            # if not CREATED
            if r.status_code == 409:        # repository exists
                continue
            print(r.status_code, r.text, jsonstring)