Transparent Git Encryption


This document has been modified from its original format, which was written by Ning Shang (geek@cerias.net). It has been updated and reformatted into a Markdown document by Woody Gilk and republished.

Description

When working with a remote git repository hosted on a third-party storage server, data confidentiality sometimes becomes a concern. This article walks you through setting up git repositories whose local working directories stay unencrypted as usual, while the committed content is encrypted.

The Story

I use git and Dropbox as a reliable, highly available, cost saving and distributed version control solution, and have really found it convenient and effective. One thing that is not addressed in this solution is data privacy/confidentiality. As Dropbox is a third-party data storage service with Amazon S3 as its backend data store, a paranoid user like myself would always be concerned about the Dropbox hosted data being disclosed to others, accidentally or deliberately. After all, putting unconditional trust on a third-party provider never seems to be a perfect rescue.

User controlled end-to-end encryption solves the problem: before data is pushed to the remote repository to store, it is encrypted with an encryption key which is known only to the data owner itself. Management of the encryption key(s) and the encryption/decryption processes is always tedious and easy to get wrong. In the following, we shall demonstrate how to use Git with encryption in a way transparent to the end user.

Before we start the demonstration, the following software packages need to be installed: git (version 1.7.1 for the demonstration) and openssl. The operating system for the demonstration is Linux (Ubuntu 10.10).

The idea is to leverage git's smudge/clean filters, as hinted by this discussion, in which GPG is proposed as the encryption method. We use OpenSSL's symmetric key cipher instead, as it is a more suitable solution.

Setup

The procedures are as follows.

1) Before the git repository is created, in your home directory

$ mkdir .gitencrypt
$ cd !$

Create three files

$ touch clean_filter_openssl smudge_filter_openssl diff_filter_openssl 
$ chmod 755 *

These files will be the clean/smudge/diff handler/hook for the git repository which we are going to work with.

The first file is clean_filter_openssl:

#!/bin/bash

SALT_FIXED=<your-salt> # 16 hex characters or fewer (OpenSSL uses an 8-byte salt)
PASS_FIXED=<your-passphrase>

openssl enc -base64 -aes-256-ecb -S "$SALT_FIXED" -k "$PASS_FIXED"

Here, replace <your-salt> with a random hexadecimal string and replace <your-passphrase> with a passphrase you will use as a master secret for the symmetric key encryption/decryption. We are using AES-256 in ECB mode as the encryption algorithm because, as it turns out, deterministic encryption works best with git (we will explain later).
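One possible way to produce these two values is sketched below, assuming your OpenSSL build provides the rand subcommand. The variable names are only for illustration, and the lengths chosen (8 random bytes for the salt, 32 for the passphrase) are assumptions rather than requirements of this setup.

```shell
# Sketch: generate random values to paste into the filter scripts.
salt=$(openssl rand -hex 8)          # 8 random bytes, printed as 16 hex characters
passphrase=$(openssl rand -hex 32)   # 32 random bytes, printed as 64 hex characters

echo "SALT_FIXED=$salt"
echo "PASS_FIXED=$passphrase"
```

Hex output is used for the passphrase as well, so the value can be pasted into the scripts without any shell quoting concerns.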

The next file is smudge_filter_openssl:

#!/bin/bash

# No salt is needed for decryption.
PASS_FIXED=<your-passphrase>

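# If decryption fails, emit the input unchanged. The input is buffered in a
# temporary file first, because openssl consumes stdin and a plain `|| cat`
# would have nothing left to read.
# Error messages are redirected to /dev/null.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp"
openssl enc -d -base64 -aes-256-ecb -k "$PASS_FIXED" -in "$tmp" 2> /dev/null || cat "$tmp"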

The last file is diff_filter_openssl:

#!/bin/bash

# No salt is needed for decryption.
PASS_FIXED=<your-passphrase>

# Error messages are redirected to /dev/null.
openssl enc -d -base64 -aes-256-ecb -k "$PASS_FIXED" -in "$1" 2> /dev/null || cat "$1"

Files in the .gitencrypt directory should be locally kept and never shared with anyone you do not want to have access to your data, as they contain your decryption passphrase.

2) Change to the project directory where the git repository is to be created. Suppose this directory is ~/myproj/.

$ git init

Now, create a .gitattributes file:

$ touch .gitattributes

Add the following content to .gitattributes:

* filter=openssl diff=openssl

In this file, the filter and diff attributes are assigned to drivers named openssl, which should be defined in .git/config as follows. The merge.renormalize setting is a git configuration option rather than an attribute, so it belongs in .git/config as well.

[filter "openssl"]
    smudge = ~/.gitencrypt/smudge_filter_openssl
    clean = ~/.gitencrypt/clean_filter_openssl
[diff "openssl"]
    textconv = ~/.gitencrypt/diff_filter_openssl
[merge]
    renormalize = true
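As an alternative to editing .git/config by hand, the filter and diff drivers can be registered with git config, which sidesteps tab-versus-space issues. The sketch below uses a throwaway repository created with mktemp as a stand-in for ~/myproj; that path is an assumption for demonstration only.

```shell
# Sketch: register the openssl drivers with git config.
repo=$(mktemp -d)   # hypothetical throwaway repository standing in for ~/myproj
git init -q "$repo"
cd "$repo"

# Single quotes store "~" literally, matching the hand-written config above.
git config filter.openssl.smudge '~/.gitencrypt/smudge_filter_openssl'
git config filter.openssl.clean  '~/.gitencrypt/clean_filter_openssl'
git config diff.openssl.textconv '~/.gitencrypt/diff_filter_openssl'

# List the entries that were written to .git/config.
git config --get-regexp 'openssl'
```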

3) Now git add relevant files to the staging area. When you do this, the clean filter is applied to files in your working directory, i.e., it encrypts the files before they are checked into the staging area. Note that as a best practice, .gitattributes should not be added. At this time, you can use git diff as usual, as the diff filter is properly configured to compare the difference of only plain text data (it first decrypts if needed).

4) Apply git commit to commit the changes to the repository.
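To check that staging really stores ciphertext, you can compare the working copy with the blob in the index. The following is a minimal sketch in a throwaway directory; the filter script name, salt, passphrase, and file name are all hypothetical demo values, not the ones from ~/.gitencrypt.

```shell
# Sketch: the working copy stays plain while the staged blob is encrypted.
work=$(mktemp -d)

# A throwaway clean filter with the same shape as clean_filter_openssl.
cat > "$work/clean_demo" <<'EOF'
#!/bin/sh
openssl enc -base64 -aes-256-ecb -S 0123456789abcdef -k demo-passphrase 2> /dev/null
EOF
chmod 755 "$work/clean_demo"

git init -q "$work/repo"
cd "$work/repo"
git config filter.openssl.clean "$work/clean_demo"
echo '* filter=openssl' > .gitattributes

echo 'top secret' > secret.txt
git add secret.txt

plain=$(cat secret.txt)                  # what you see in the working directory
staged=$(git cat-file -p :secret.txt)    # raw blob in the index, no smudge applied
echo "working copy: $plain"
echo "staged blob:  $staged"
```

git cat-file -p prints the stored object without running any filter, which makes it a convenient way to inspect what would actually be pushed.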

5) Now suppose the repository myproj is connected to a remote repository named Dropbox at ~/myproj-remote.git, and you have pushed all the committed changes to it. Suppose you want to create another git repository in directory ~/myproj-1. First clone the remote repository without checking out the HEAD.

$ git clone -o Dropbox -n ~/myproj-remote.git myproj-1
$ cd myproj-1

Now create under myproj-1 a file .gitattributes with the same content as shown in Step 2. Then add/append the code snippet in .git/config in Step 2 to myproj-1/.git/config. Then reset the HEAD to check out all of the files.

$ git reset --hard HEAD

When the files are checked out, the smudge filter is automatically applied, decrypting the files in the repository and putting the decrypted files into your working directory.

Non-deterministic encryption (what GPG does) does not work well here because the same file is transformed into a different ciphertext each time it is encrypted. As a result, git status always shows the pulled files as modified, even though git diff shows no difference. Checking in such modified files only replaces the old ciphertext with a new one which decrypts to the same file. If you work in two different local repositories synced to the same remote, the push/pull process will never end even if nothing is changed in your working directories. Using AES ECB mode with a fixed salt, although not semantically secure, resolves this problem while providing reasonable confidentiality.
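The difference between the two behaviours can be seen directly on the command line. This is a minimal sketch with a hypothetical salt and passphrase; the two helper functions are named only for illustration.

```shell
# Sketch: a fixed salt gives repeatable ciphertext, a random salt does not.
SALT=0123456789abcdef   # hypothetical demo salt
PASS=demo-passphrase    # hypothetical demo passphrase

encrypt_fixed()  { openssl enc -base64 -aes-256-ecb -S "$SALT" -k "$PASS" 2> /dev/null; }
encrypt_random() { openssl enc -base64 -aes-256-ecb -k "$PASS" 2> /dev/null; }

fixed_a=$(echo 'same content' | encrypt_fixed)
fixed_b=$(echo 'same content' | encrypt_fixed)
random_a=$(echo 'same content' | encrypt_random)
random_b=$(echo 'same content' | encrypt_random)

# git compares the stored blobs, so only the fixed-salt variant looks unchanged.
[ "$fixed_a" = "$fixed_b" ] && echo "fixed salt: identical ciphertext"
[ "$random_a" != "$random_b" ] && echo "random salt: ciphertext differs"
```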

From now on, you can work in the local repositories, push to or pull from the remote repository as usual, without noticing the encryption/decryption in the background.

I think this is cool.

Nice! Passwords with VC in Dropbox/github!

I get the following error:

"openssl"] is not a valid attribute name: .gitattributes:6

~/.gitencrypt/smudge_filter_openssl is not a valid attribute name: .gitattributes:7

~/.gitencrypt/clean_filter_openssl is not a valid attribute name: .gitattributes:8

"openssl"] is not a valid attribute name: .gitattributes:10

~/.gitencrypt/diff_filter_openssl is not a valid attribute name: .gitattributes:11

Am I doing something obviously wrong?

Thanks for this, it is working perfectly for me.

I have a question though: you mention

Using AES ECB mode with a fixed salt, although not semantically secure, resolves this problem while providing reasonable confidentiality.

I guess that means that using the same salt for encrypting different files, or different versions of the same file, makes it easier for an attacker to brute-force the password.

What kind of "reasonable confidentiality" can we expect? Do you have any pointers to studies about the security of such an implementation (fixed salt)? I have looked a bit on the net, to no avail, and I thought that maybe you have already done some research on the subject.

Thanks again for the very useful howto.

ECB mode encryption is a relatively simple method of encryption that provides a high level of obfuscation (or a low level of encryption). This method is not very secure and should not be used for sensitive personal data, but would work well for, e.g., transmitting source code between private parties over a public channel. For better security, you can switch the mode to CBC at the cost of having every file change completely for every modification. As with all encryption, a strong key is always recommended.
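To illustrate the weakness being discussed, here is a minimal sketch that encrypts two identical 16-byte blocks in both modes and prints the resulting ciphertext blocks in hex. The passphrase is a hypothetical demo value, and -nosalt is used only so that no salt header precedes the first ciphertext block; neither is part of the gist's setup.

```shell
# Sketch: ECB maps equal plaintext blocks to equal ciphertext blocks; CBC does not.
PASS=demo-passphrase                 # hypothetical demo passphrase
plaintext=$(printf '%032d' 0)        # 32 "0" characters: two identical 16-byte AES blocks

# -v stops od from collapsing repeated lines into "*".
to_hex() { od -An -v -tx1 | tr -d ' \n'; }

ecb=$(printf '%s' "$plaintext" | openssl enc -aes-256-ecb -nosalt -k "$PASS" 2> /dev/null | to_hex)
cbc=$(printf '%s' "$plaintext" | openssl enc -aes-256-cbc -nosalt -k "$PASS" 2> /dev/null | to_hex)

# One AES block is 16 bytes, i.e. 32 hex characters.
ecb_1=$(printf '%s' "$ecb" | cut -c1-32)
ecb_2=$(printf '%s' "$ecb" | cut -c33-64)
cbc_1=$(printf '%s' "$cbc" | cut -c1-32)
cbc_2=$(printf '%s' "$cbc" | cut -c33-64)

echo "ECB block 1: $ecb_1"
echo "ECB block 2: $ecb_2"
echo "CBC block 1: $cbc_1"
echo "CBC block 2: $cbc_2"
```

The two ECB blocks come out identical, which is exactly the kind of structure an observer of the remote repository could notice.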

Thanks, this is a very nice setup and helps mitigate the risk putting our code on an untrusted VPS server :).
I also noticed that I can use "git reset --hard HEAD" instead of "git reset HEAD" in step 5. This decrypts all the files right away.

@shad0wfax Are you saying that using git reset --hard HEAD makes it unnecessary to run the next command (git ls-files -d | xargs git checkout --)?

Yes. "git reset --hard" is good enough to ensure that the smudge filter is invoked and the files are decoded. There is no need to run (git ls-files -d | xargs git checkout --)

Thanks @shad0wfax, I've updated my instructions and this gist.

I have played around with gitcrypt for a while now and seem to have noticed two things:

  1. It does not encrypt the filenames. I personally come to the conclusion that this is a real problem with names in source-codes and the like. OTOH I don't see any solution to that with the given - very nice and easy - concept of gitcrypt.
  2. It scrambles the git compression. Would it be an idea to at least introduce a compressor to the pipeline?

And additionally: Do I really have to use the same salt on all clones? My fault or true?

@mgoelinitz Good questions:

  1. Encrypting the filenames is near impossible, as git does not have an easy way to compensate for this.
  2. Git compression still works somewhat, but operates on blocks rather than individual lines. This does have an impact on the size of the repository, but so does encrypting. Again, git does not have an easy way to compensate for this.

Remember that gitcrypt is not perfect and is mostly just a clever hack.

Hi,

I've tried this and can get it to work with "git ls-files -d | xargs git checkout" but not "git reset --hard HEAD". My main question is: after the initial clone, if I commit a change and push to the remote server, I can't work out how to pull the non-encrypted data. The modified file that I pull is always encrypted, unlike the original clone. Is there a way to get at the changes without having to re-clone?

Regards

@badger-d You have to create a blank repository, set up the encryption, set the remote, then do a pull. Alternatively, if you do a git clone ... directly, you would have to repeat the git reset --hard HEAD (or git ls-files equivalent) step.

Hi, thanks for the prompt response however I still cannot get this to work. I have listed my steps to ask if you can spot what I am doing wrong.

On the server where I backup to:

sudo git init --bare REPO.git

On local machine 1 where I have created the files:

cd ~
mkdir REPO
cd REPO
touch test.txt #(and add some text to it)
git init
touch .gitattributes
cat ~/Private/linux/ATTR >> .gitattributes #(where ATTR is a file with the "* filter ......" text)
cat ~/Private/linux/CONF >> .git/config #(where CONF is a file with the "[filter "openssl"] ..." text)
git add .
git commit
git remote add REPO /mnt/cifs/backup/REPO.git # where this is the mount point for the server
sudo git push REPO master

On local machine 2 where I first clone the files:

git clone -o REPO -n /mnt/cifs/backup/REPO.git ~/REPO
cd ~/REPO
touch .gitattributes
cat "${HOME}/ATTR" >> .gitattributes
cat "${HOME}/CONF" >> .git/config
git reset HEAD
git ls-files -d | xargs git checkout

Everything is good at this point and the files are un-encrypted

Now go back to machine 1 and make a change to ~/REPO/test.txt

vim ~/REPO/test.txt # change something
git add test.txt 
git commit
sudo git push REPO master

Now go back to machine 2 and try and pull the change

cd ~/REPO
git reset --hard HEAD
git pull

Now the changed file has been pulled, but it is encrypted. I have tried many different ways of writing the reset and pull lines above, but the file is always encrypted.

I can't reproduce your problem. The only difference in my setup is that I am using .git/info/attributes instead of .gitattributes, but I don't think that should matter.

how to convert non-encrypted git repo to encrypted one?

What happens if the salt and password are compromised? You would need to re-add all of your content.

If you are running into "is not a valid attribute name" errors, note that git apparently cares a lot about whitespace in its config files. Make sure that you've started the indented lines with 1 tab character instead of 4 space characters (if you copy and paste from this gist into a terminal window, you'll likely get 4 spaces). And the "renormalize = true" in .gitattributes needs the spaces around the equals sign removed, i.e. "renormalize=true".

Cool indeed, thanks a lot for sharing!

This works beautifully, except that in Xcode's Version Editor the previous version (the one on the right side of the window) is presented in its encrypted form. I asked about it on Stack Overflow: http://stackoverflow.com/questions/14374558/git-openssl-filter-xcode

Note that as a best practice, .gitattributes should not be added.

This is wrong according to the gitattributes man page:

If you wish to affect only a single repository (i.e., to assign attributes to files that are particular to one user’s workflow for that repository), then attributes should be placed in the $GIT_DIR/info/attributes file. Attributes which should be version-controlled and distributed to other repositories (i.e., attributes of interest to all users) should go into .gitattributes files. Attributes that should affect all repositories for a single user should be placed in a file specified by the core.attributesfile configuration option (see git-config(1)). Attributes for all users on a system should be placed in the $(prefix)/etc/gitattributes file.

The .gitattributes file is intended to be committed in the repository!
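Whichever location you choose, git check-attr reports which driver git will actually use for a path, which helps when debugging attribute problems. A minimal sketch follows, using a throwaway repository and a hypothetical file name, and placing the rule in the per-repository .git/info/attributes file mentioned in the man page excerpt.

```shell
# Sketch: verify which attributes git resolves for a path.
repo=$(mktemp -d)   # hypothetical throwaway repository
git init -q "$repo"
cd "$repo"

# Per-repository attributes; the same line would also work in a committed .gitattributes.
mkdir -p .git/info
echo '* filter=openssl diff=openssl' > .git/info/attributes

# The path does not need to exist for check-attr to answer.
git check-attr filter diff -- notes.txt
```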

Friends, you have been awarded a better solution! https://github.com/blake2-ppc/git-remote-gcrypt

I made one tiny change: I put my password and salt in my home directory under ~/.ssl/passphrase, like so

[your password]
SALT=20130620

Then I parse it out in my script using

SALT_FIXED=`cat /Users/amit/.ssl/passphrase|grep "="|cut -d"=" -f 2`
openssl enc -base64 -aes-256-ecb -S $SALT_FIXED -pass file:/Users/amit/.ssl/passphrase

Has anyone got this working with Xcode5 ? It works on command line, but not when I add / commit changes from within Xcode

"you can switch the mode to CBC at the cost of having every file change completely for every modification."

This is kind of (part of) the definition of functionally correct encryption - ECB (click here for an explanation) is a flawed legacy implementation recommended by precisely nobody for current use today, and only supported in OpenSSL because OpenSSL supports some very old and creaky legacy crypto implementations! It's only useful today as a learning tool and should never be used in current systems.

CBC or OFB modes should be the default. Please consider changing your gist to use CBC and explain the potential benefits of ECB along with the downsides for those who would like to accept the loss in security for slight convenience in git. Nothing should be insecure by default!

How will this work with code review?
