
Last active March 7, 2024 14:48
Self-hosting Bitcoin Core on GitLab

Bitcoin Core backups on self-hosted GitLab


This document shows how a self-hosted GitLab server can be used as a backup for the Bitcoin Core repository. The backup can be interacted with after it is finished, and a self-hosted GitLab instance could be used for further development if there were ever issues with the upstream repository.

Running a backup server

System Requirements

GitLab lists their minimum recommended requirements for a self-hosted instance here. The tl;dr for up to 500 users: 4 cores and 4 GB RAM. For storage we recommend 20 GB minimum; GitLab alone recommends 10 GB, and system and repository requirements come on top of that.

Aside from the requirements for collaboration, a beefier machine also helps with import speed. On a machine with double the specs described above we have seen import times of ~36 hours.

GitLab supports a range of operating systems; Ubuntu and CentOS seem to be the preferred choices.

System settings

Aside from the general configuration of the GitLab instance, the following needs to be ensured on the GitLab server to minimize the chance of failures during the import process:

  1. Make sure that the github_import_extended_events feature flag is globally disabled on your instance; if it is enabled, disable it (e.g. with Feature.disable(:github_import_extended_events) in the Rails console). You can check it like this:
irb(main):001:0> Feature.enabled?(:github_import_extended_events)
=> false
  2. It is recommended to run the import on a weekend. The import takes a very long time, depending on your hardware probably 24-48 hours, and it might fail if something is deleted on GitHub during the run: the importer can encounter a dead link, and this causes a failure. It seems that mostly pulls/issues that are obvious spam are removed by GitHub unprompted, so our best bet is to run on weekends when this is less likely to happen. If your import fails and you see something like 404 - Not Found or a similar error code in the import history, this is probably what happened and, unfortunately, you will have to delete everything and start over. There is no way to continue the import from where it stopped.

  3. Increase the number of sidekiq workers for the importer. The default is just one sidekiq worker, which slows down the import significantly. Four workers are recommended; whether you can add more depends on your hardware. The current setting we are using is two dedicated importer workers and two general workers:

sidekiq['queue_groups'] = ['github_importer', 'github_importer_advance_stage', '*', '*']
  4. Configure a reduced number of GitHub API objects requested per page.

  5. Ensure that the Maximum import size is over 2 GB (Admin panel -> Import and export settings).

  6. If you want to periodically remove the imported project and then reimport it, you will have to set the deletion delay for projects to 0 so that they are actually deleted immediately (Admin Area -> Settings -> General -> Visibility and access controls -> Deletion protection).

Running the import

The import cannot be triggered via the UI because the UI only allows importing repositories that you own, not arbitrary public repositories. You need to use the REST API instead:

curl --request POST \
--url "https://<gitlab-host>/api/v4/import/github" \
--header "content-type: application/json" \
--header "PRIVATE-TOKEN: <gl-access-token>" \
--data '{
    "personal_access_token": "<gh-access-token>",
    "repo_id": "1181927",
    "target_namespace": "bitcoin",
    "optional_stages": {
      "single_endpoint_issue_events_import": true,
      "single_endpoint_notes_import": true,
      "attachments_import": false,
      "collaborators_import": false}}'
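The same request can be sketched in Python using only the standard library. This is a minimal sketch mirroring the curl call above; the host name and both tokens are placeholders for your own instance, and `build_import_request` is a hypothetical helper, not part of any GitLab tooling.

```python
import json
import urllib.request


def build_import_request(gitlab_host, gl_token, gh_token):
    """Build the POST request for GitLab's GitHub-import endpoint.

    The payload mirrors the curl call above: repo_id 1181927 is the
    bitcoin/bitcoin repository on GitHub, and the optional stages must
    be set exactly like this to avoid a failing import.
    """
    payload = {
        "personal_access_token": gh_token,
        "repo_id": "1181927",
        "target_namespace": "bitcoin",
        "optional_stages": {
            "single_endpoint_issue_events_import": True,
            "single_endpoint_notes_import": True,
            "attachments_import": False,
            "collaborators_import": False,
        },
    }
    return urllib.request.Request(
        url=f"https://{gitlab_host}/api/v4/import/github",
        data=json.dumps(payload).encode(),
        headers={
            "content-type": "application/json",
            "PRIVATE-TOKEN": gl_token,
        },
        method="POST",
    )


req = build_import_request("gitlab.example.com", "<gl-access-token>", "<gh-access-token>")
# urllib.request.urlopen(req) would actually start the import
```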

See also the documentation here, but note that the optional_stages in particular need to be set exactly as above to prevent a failure of the import (see some additional documentation here; "additional things to import" refers to the same options). Missing the collaborators import is unfortunate; however, this GitLab functionality is built with companies in mind that actually have full control over their contributors' accounts (see also the Limitations section for further info on this). The attachments feature doesn't seem to be used much, if at all, from what I can tell. If I am mistaken here, please let me know so we can try to figure out a solution for this.

Note that, if you had the project imported before, you first need to delete it. That can be done via the UI or the API:

curl --request DELETE --header "PRIVATE-TOKEN: <your_access_token>" \
--url "https://<gitlab-host>/api/v4/projects/<project-id>"

The ID of the project is returned from the import call, but you can also use the project path instead, which should probably remain consistent across imports.
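When using the project path instead of the numeric ID, the GitLab API expects the full path URL-encoded (the slash becomes %2F). A small sketch of building the project URL either way; the host name and the helper itself are hypothetical:

```python
from urllib.parse import quote


def project_api_url(gitlab_host, project):
    """Return the API URL for a project, accepting either a numeric ID
    or a full path like "bitcoin/bitcoin". Paths must be URL-encoded
    per the GitLab API conventions.
    """
    ref = str(project)
    if not ref.isdigit():
        ref = quote(ref, safe="")  # "bitcoin/bitcoin" -> "bitcoin%2Fbitcoin"
    return f"https://{gitlab_host}/api/v4/projects/{ref}"


# project_api_url("gitlab.example.com", "bitcoin/bitcoin")
# -> "https://gitlab.example.com/api/v4/projects/bitcoin%2Fbitcoin"
```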

Note that during the import the project is not usable on GitLab at all! Users cannot look at it, interact with it, or run an export of the data.

While the import is running you can see it on the import history page of your instance, but unfortunately there seems to be no way to tell how far along it is or how much longer it will take.

Continuous import

The GitLab API also provides endpoints for checking an import's status and for exporting projects. So it is possible to run a script that continuously triggers the import from GitHub and downloads each new successful export before wiping the server and starting the import again. This should allow for a fresh export backup every 2 days if the import succeeds each time. But keep in mind that the project on the GitLab server remains completely unusable, since it cannot be used while an import is running.

A draft of a script for this can be found here but it is untested/WIP.
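The core decision logic of such a loop can be sketched as a small pure function. The status values follow GitLab's GET /projects/:id/import API; the function name and the action labels are hypothetical:

```python
def next_action(import_status: str) -> str:
    """Map a GitLab project import_status to the next step of a
    continuous backup loop. Status names follow GitLab's
    GET /projects/:id/import endpoint ("none", "scheduled",
    "started", "finished", "failed").
    """
    if import_status == "finished":
        return "export_and_download"  # a fresh backup is ready
    if import_status == "failed":
        return "delete_and_reimport"  # imports cannot be resumed
    return "wait"                     # scheduled/started: poll again later
```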

Limitations of the data transferred

GitLab is able to create user accounts on the backup server based on the users active on GitHub and then link all their activity correctly to those accounts. This even allows a former GitHub user to join the GitLab server later, inherit the account with its activity, and continue working with it as before. However, GitLab only does this if the GitHub user has made their email public. If the email of the GitHub account is private (as is the case for most Bitcoin Core contributors), the user account will not be created and the activity will not be linked to it. Instead, the contributions are assigned to the administrator account that triggered the import, with a note at the top indicating which user originally made the contribution (see example below). The linking can also happen retroactively. This means that for those users who don't have their email set to public, the switch would not be as frictionless as for the users that do, but this seems manageable.

[Screenshot (2024-02-28): an imported contribution attributed to the importing administrator, with a note naming the original GitHub author]

Mirroring feature

Originally, one idea of this experiment was to leverage the GitLab mirroring feature to have real-time, or close to real-time, updates in a leader-follower setup. Mirroring from GitHub to GitLab is currently not supported though, only the other way around and between GitLab instances. It seems doable to build this ourselves, but we would need to maintain both the code and the infrastructure for it.

However, if we were to switch the project to GitLab, we could use this to have self-hosted follower instances that are up to date with the main site.

Brink backup server

Brink hosts a server which regularly backs up the GitHub repo on GitLab. The latest Bitcoin Core project backup can be seen here (status 2024-02-26 at the time of this writing). Please provide feedback on the quality of the data preserved, since not everything is working as originally hoped (see limitations).

The server will run the backups as an import roughly once per week. This seems to be the only workable solution for now, as will become clear in the sections further below.

Exporting the backups

The server can allow users to join and create an export of these weekly backups, so that anyone can store the backups locally and launch an instance from them when necessary. This is a more lightweight way to participate in this effort than running a server that does regular backups as well, though of course the more people do that the better, so the following information may help with hosting your own instance.
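Exports use GitLab's project export API: a POST schedules the export, the same path is polled via GET until export_status is "finished", and a GET on /download then fetches the tarball. A minimal sketch of the three endpoints involved; the host name and the helper are hypothetical:

```python
def export_endpoints(gitlab_host: str, project_id: int) -> dict:
    """Endpoints of GitLab's project export API: POST to `start` to
    schedule an export, poll `status` (GET) until export_status is
    "finished", then fetch `download` (GET) for the .tar.gz archive.
    """
    base = f"https://{gitlab_host}/api/v4/projects/{project_id}/export"
    return {
        "start": base,                   # POST, schedules the export
        "status": base,                  # GET, returns export_status
        "download": base + "/download",  # GET, returns the archive
    }
```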
