Transparent Git Encryption

This document has been modified from its original format, which was written by Ning Shang (geek@cerias.net). It has been updated and reformatted into a Markdown document by Woody Gilk and republished.

Description

When a git repository is hosted on a third-party storage server, data confidentiality can become a concern. This article walks through setting up git repositories in which the local working directory is kept as normal (unencrypted) while the committed content is encrypted.

The Story

I use git and Dropbox as a reliable, highly available, low-cost, distributed version control solution, and have found it convenient and effective. One thing this solution does not address is data privacy/confidentiality. As Dropbox is a third-party data storage service with Amazon S3 as its backend data store, a paranoid user like myself would always be concerned about the Dropbox-hosted data being disclosed to others, accidentally or deliberately. After all, placing unconditional trust in a third-party provider never seems to be a perfect solution.

User-controlled end-to-end encryption solves the problem: before data is pushed to the remote repository for storage, it is encrypted with an encryption key known only to the data owner. Managing the encryption key(s) and the encryption/decryption processes is tedious and easy to get wrong. In the following, we demonstrate how to use git with encryption in a way that is transparent to the end user.

Before we start the demonstration, the following software packages need to be installed: git (version 1.7.1 for the demonstration) and openssl. The operating system for the demonstration is Linux (Ubuntu 10.10).

The idea is to leverage git's smudge/clean filters, as hinted at by this discussion, in which GPG is proposed as the encryption method. We use OpenSSL's symmetric key cipher instead, as it is better suited to the task.

Setup

The procedures are as follows.

1) Before the git repository is created, in your home directory

$ mkdir .gitencrypt
$ cd !$

Create three files

$ touch clean_filter_openssl smudge_filter_openssl diff_filter_openssl 
$ chmod 755 *

These files will be the clean/smudge/diff handler/hook for the git repository which we are going to work with.

The first file is clean_filter_openssl:

#!/bin/bash

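SALT_FIXED=<your-salt> # up to 16 hex characters (OpenSSL's salt is 8 bytes)
PASS_FIXED=<your-passphrase>

# Quote the variables so a passphrase containing spaces is passed intact.
openssl enc -base64 -aes-256-ecb -S "$SALT_FIXED" -k "$PASS_FIXED"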

Here, replace <your-salt> with a random hexadecimal string and replace <your-passphrase> with a passphrase you will use as a master secret for the symmetric key encryption/decryption. We use AES-256 in ECB mode as the encryption algorithm because deterministic encryption turns out to work best with git (we will explain later).
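What "deterministic" means here can be checked directly. The following is a minimal sketch, assuming an `openssl` binary on the PATH; the salt, the passphrase and the helper names `encrypt_fixed` and `encrypt_random` are illustrative and not part of the original setup. It encrypts the same input twice with a fixed salt and twice with a random one.

```bash
#!/bin/bash
# Illustrative values only; substitute your own salt and passphrase.
salt=0123456789abcdef
pass=demo-passphrase

# Hypothetical helpers: same cipher as the clean filter, with and without -S.
encrypt_fixed()  { openssl enc -base64 -aes-256-ecb -S "$salt" -k "$pass" 2>/dev/null; }
encrypt_random() { openssl enc -base64 -aes-256-ecb -k "$pass" 2>/dev/null; }

first=$(printf 'same content\n' | encrypt_fixed)
second=$(printf 'same content\n' | encrypt_fixed)
third=$(printf 'same content\n' | encrypt_random)
fourth=$(printf 'same content\n' | encrypt_random)

# A fixed salt gives the same ciphertext for the same input.
[ "$first" = "$second" ] && echo "fixed salt: ciphertexts match"
# Without -S, OpenSSL picks a random salt, so the output changes each run.
[ "$third" != "$fourth" ] && echo "random salt: ciphertexts differ"
```

With a fixed salt the clean filter maps an unchanged file to an unchanged blob, which is what lets git recognise that nothing was modified.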

The next file is smudge_filter_openssl:

#!/bin/bash

# No salt is needed for decryption.
PASS_FIXED=<your-passphrase>

# If decryption fails, use `cat` instead. 
# Error messages are redirected to /dev/null.
openssl enc -d -base64 -aes-256-ecb -k "$PASS_FIXED" 2> /dev/null || cat

The last file is diff_filter_openssl:

#!/bin/bash

# No salt is needed for decryption.
PASS_FIXED=<your-passphrase>

# Error messages are redirected to /dev/null.
openssl enc -d -base64 -aes-256-ecb -k "$PASS_FIXED" -in "$1" 2> /dev/null || cat "$1"

Files in the .gitencrypt directory should be locally kept and never shared with anyone you do not want to have access to your data, as they contain your decryption passphrase.
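One way to reduce the exposure of the passphrase is to keep it in a separate file that only the owner can read, and let OpenSSL read it through `-pass file:`. The following is a sketch under stated assumptions, not part of the original setup: the temporary directory stands in for `~/.gitencrypt`, and the file name `passphrase`, the salt and the passphrase are made up for illustration. The salt is passed on both encryption and decryption because some OpenSSL releases may not store an explicitly supplied salt in the output.

```bash
#!/bin/bash
# Stand-in for ~/.gitencrypt so the demo does not touch the real directory.
demo_dir=$(mktemp -d)
chmod 700 "$demo_dir"

# Create the (hypothetical) passphrase file readable by the owner only.
( umask 077; printf '%s\n' 'demo-passphrase' > "$demo_dir/passphrase" )

salt=0123456789abcdef   # illustrative fixed salt

# Encrypt and decrypt, reading the passphrase from the file's first line.
ciphertext=$(printf 'hello\n' |
    openssl enc -base64 -aes-256-ecb -S "$salt" \
        -pass "file:$demo_dir/passphrase" 2>/dev/null)
plaintext=$(printf '%s\n' "$ciphertext" |
    openssl enc -d -base64 -aes-256-ecb -S "$salt" \
        -pass "file:$demo_dir/passphrase" 2>/dev/null)

echo "ciphertext: $ciphertext"
echo "plaintext:  $plaintext"
```

The filter scripts themselves then no longer contain the secret, although the directory still has to be protected in the same way.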

2) Change to the project directory where the git repository is to be created. Suppose this directory is ~/myproj/.

$ git init

Now, create a .gitattributes file:

$ touch .gitattributes

Add the following content to .gitattributes:

* filter=openssl diff=openssl

In this file, the filter and diff attributes are assigned to drivers named openssl. Define these drivers in .git/config as follows; the merge.renormalize setting also belongs in .git/config, because it is a configuration option rather than an attribute.

[filter "openssl"]
    smudge = ~/.gitencrypt/smudge_filter_openssl
    clean = ~/.gitencrypt/clean_filter_openssl
[diff "openssl"]
    textconv = ~/.gitencrypt/diff_filter_openssl
[merge]
    renormalize = true
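The filter and diff drivers can also be registered with `git config` rather than by editing .git/config by hand, which avoids whitespace mistakes when copying and pasting. A minimal sketch, assuming `git` is installed; the throwaway repository created with `mktemp` is only a stand-in for ~/myproj.

```bash
#!/bin/bash
# Throwaway repository standing in for ~/myproj.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Single quotes keep the literal "~" in the stored value, matching the
# hand-written config; git runs filters through a shell, which expands it.
git config filter.openssl.smudge '~/.gitencrypt/smudge_filter_openssl'
git config filter.openssl.clean  '~/.gitencrypt/clean_filter_openssl'
git config diff.openssl.textconv '~/.gitencrypt/diff_filter_openssl'

# List what was written to .git/config.
git config --get-regexp '^(filter|diff)\.openssl\.'
```

These commands write to the repository's own .git/config, so they have to be repeated in every clone, just like the manual edit.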

3) Now git add relevant files to the staging area. When you do this, the clean filter is applied to files in your working directory, i.e., it encrypts the files before they are checked into the staging area. Note that as a best practice, .gitattributes should not be added. At this time, you can use git diff as usual, as the diff filter is properly configured to compare the difference of only plain text data (it first decrypts if needed).
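To confirm that the clean filter really runs on `git add`, the staged blob can be compared with the file in the working directory. The sketch below assumes `git` and `openssl` are installed. It uses a throwaway repository and an inline clean command with a demo salt and passphrase in place of the ~/.gitencrypt scripts, and the file name notes.txt is made up for illustration.

```bash
#!/bin/bash
# Throwaway repository; nothing here touches your real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Inline stand-in for clean_filter_openssl (demo salt and passphrase).
git config filter.openssl.clean \
    'openssl enc -base64 -aes-256-ecb -S 0123456789abcdef -k demo-passphrase 2>/dev/null'
echo '* filter=openssl' > .gitattributes

printf 'top secret\n' > notes.txt
git add notes.txt

# The working file is untouched; the blob in the index is the filter output.
working=$(cat notes.txt)
staged=$(git cat-file -p :notes.txt)
echo "working tree: $working"
echo "staged blob:  $staged"
```

`git cat-file -p :notes.txt` prints the raw blob from the index without applying any smudge or textconv filter, so it shows exactly what would be committed and pushed.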

4) Apply git commit to commit the changes to the repository.

5) Now suppose the repository myproj is connected to a remote repository named Dropbox located at ~/myproj-remote.git, and you have pushed all the committed changes to it. Suppose you want to create another git repository in the directory ~/myproj-1. First clone the remote repository without checking out the HEAD.

$ git clone -o Dropbox -n ~/myproj-remote.git myproj-1
$ cd myproj-1

Now create under myproj-1 a file .gitattributes with the same content as shown in Step 2. Then add/append the code snippet in .git/config in Step 2 to myproj-1/.git/config. Then reset the HEAD to check out all of the files.

$ git reset --hard HEAD

When the files are checked out, the smudge filter is automatically applied, decrypting the files in the repository and putting the decrypted files into your working directory.

Non-deterministic encryption (what GPG does) does not work well here because the same file is transformed into a different ciphertext each time it is encrypted. As a result, git status always shows the pulled files as modified, even though git diff shows no difference. Checking in such modified files only replaces the old ciphertext with a new one that decrypts to the same file. If you work in two different local repositories synced to the same remote, the push/pull process will never end, even if nothing has changed in your working directories. Using AES in ECB mode with a fixed salt, although not semantically secure, resolves this problem while providing reasonable confidentiality.
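The "not semantically secure" caveat can be made concrete: in ECB mode, identical plaintext blocks encrypt to identical ciphertext blocks, so repetition inside a file stays visible. The following sketch assumes `openssl` and `od` are available. The raw key, the all-zero IV and the helper `to_hex` are illustrative only, and `-K` is used instead of a passphrase so that no salt or key derivation is involved.

```bash
#!/bin/bash
# Demo key (32 bytes) and IV (16 bytes) in hex; illustrative values only.
key=000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
iv=00000000000000000000000000000000

# Hex-dump stdin on one line; -v stops od from collapsing repeated lines.
to_hex() { od -An -v -tx1 | tr -d ' \n'; }

# 32 identical characters = two identical 16-byte AES blocks.
plain=$(printf '%032d' 0)

ecb=$(printf '%s' "$plain" | openssl enc -aes-256-ecb -nopad -K "$key" | to_hex)
cbc=$(printf '%s' "$plain" | openssl enc -aes-256-cbc -nopad -K "$key" -iv "$iv" | to_hex)

# ECB: both ciphertext blocks are the same. CBC: chaining makes them differ.
echo "ECB block 1: ${ecb:0:32}"
echo "ECB block 2: ${ecb:32:32}"
echo "CBC block 1: ${cbc:0:32}"
echo "CBC block 2: ${cbc:32:32}"
```

An observer of the encrypted repository therefore cannot read the content, but can tell where 16-byte blocks repeat within a file or between versions of it.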

From now on, you can work in the local repositories, push to or pull from the remote repository as usual, without noticing the encryption/decryption in the background.

I think this is cool.

@marksteve

Nice! Passwords with VC in Dropbox/github!

@staropram

I get the following error:

"openssl"] is not a valid attribute name: .gitattributes:6

~/.gitencrypt/smudge_filter_openssl is not a valid attribute name: .gitattributes:7

~/.gitencrypt/clean_filter_openssl is not a valid attribute name: .gitattributes:8

"openssl"] is not a valid attribute name: .gitattributes:10

~/.gitencrypt/diff_filter_openssl is not a valid attribute name: .gitattributes:11

Am I doing something obviously wrong?

@gonvaled

Thanks for this, it is working perfectly for me.

I have a question though: you mention

Using AES ECB mode with a fixed salt, although not semantically secure, resolves this problem while providing reasonable confidentiality.

I guess that means that using the same salt for encrypting different files, or different versions of the same file, makes it easier for an attacker to brute-force the password.

What kind of "reasonable confidentiality" can we expect? Do you have any pointers to studies about the security of such an implementation (fixed salt)? I have looked a bit on the net, to no avail, and I thought that maybe you have already done some research on the subject.

Thanks again for the very useful howto.

@shadowhand
Owner

ECB mode encryption is a relatively simple method of encryption that provides a high level of obfuscation (or a low level of encryption). This method is not very secure and should not be used for sensitive personal data, but would work well for, e.g., transmitting source code between private parties over a public channel. For better security, you can switch the mode to CBC at the cost of having every file change completely for every modification. As with all encryption, a strong key is always recommended.

@shad0wfax

Thanks, this is a very nice setup and helps mitigate the risk of putting our code on an untrusted VPS server :).
I also noticed that I can use "git reset --hard HEAD" instead of "git reset HEAD" in step 5. This decrypts all the files right away.

@shadowhand
Owner

@shad0wfax Are you saying that using git reset --hard HEAD makes it unnecessary to run the next command (git ls-files -d | xargs git checkout --)?

@shad0wfax

Yes. "git reset --hard" is good enough to ensure that the smudge filter is invoked and the files are decoded. There is no need to run (git ls-files -d | xargs git checkout --)

@shadowhand
Owner

Thanks @shad0wfax, I've updated my instructions and this gist.

@mgoellnitz

I have played around with gitcrypt for a while now and seem to have noticed two things:

  1. It does not encrypt the filenames. I personally came to the conclusion that this is a real problem with names in source code and the like. OTOH I don't see any solution to that with the given - very nice and easy - concept of gitcrypt.
  2. It scrambles the git compression. Would it be an idea to at least introduce a compressor to the pipeline?

And additionally: Do I really have to use the same salt on all clones? My fault or true?

@shadowhand
Owner

@mgoellnitz Good questions:

  1. Encrypting the filenames is near impossible, as git does not have an easy way to compensate for this.
  2. Git compression still works somewhat, but operates on blocks rather than individual lines. This does have an impact on the size of the repository, but so does encrypting. Again, git does not have an easy way to compensate for this.

Remember that gitcrypt is not perfect and is mostly just a clever hack.

@badger-d

Hi,

I've tried this and can get it to work with "git ls-files -d | xargs git checkout" but not "git reset --hard HEAD". My main question is: after the initial clone, if I commit a change and push to the remote server, I can't work out how to pull the non-encrypted data. The modified file that I pull is always encrypted, unlike the original clone. Is there a way to get at the changes without having to re-clone?

Regards

@shadowhand
Owner

@badger-d You have to create a blank repository, set up the encryption, set the remote, then do a pull. Alternatively, if you do a git clone ... directly, you would have to repeat the git reset --hard HEAD (or git ls-files equivalent) step.

@badger-d

Hi, thanks for the prompt response however I still cannot get this to work. I have listed my steps to ask if you can spot what I am doing wrong.

On the server where I backup to:

sudo git init --bare REPO.git

On local machine 1 where I have created the files:

cd ~
mkdir REPO
cd REPO
touch test.txt #(and add some text to it)
git init
touch .gitattributes
cat ~/Private/linux/ATTR >> .gitattributes #(where ATTR is a file with the "* filter ......" text)
cat ~/Private/linux/CONF >> .git/config #(where CONF is a file with the "[filter "openssl"] ..." text)
git add .
git commit
git remote add REPO /mnt/cifs/backup/REPO.git # where this is the mount point for the server
sudo git push REPO master

On local machine 2 where I first clone the files:

git clone -o REPO -n /mnt/cifs/backup/REPOj.git ~/REPO
cd ~/REPO
touch .gitattributes
cat "${HOME}/ATTR" >> .gitattributes
cat "${HOME}/CONF" >> .git/config
git reset HEAD
git ls-files -d | xargs git checkout

Everything is good at this point and the files are un-encrypted

Now go back to machine 1 and make a change to ~/REPO/test.txt

vim ~/REPO/test.txt # change something
git add test.txt 
git commit
sudo git push REPO master

Now go back to machine 2 and try and pull the change

cd ~/REPO
git reset --hard HEAD
git pull

Now the changed file has been pulled; however, it is encrypted. I have tried many different ways of writing the reset and pull lines above, but the file is always encrypted.

@shadowhand
Owner

I can't reproduce your problem. The only difference in my setup is that I am using .git/info/attributes instead of .gitattributes, but I don't think that should matter.

@microcai

How can I convert a non-encrypted git repo to an encrypted one?

@am-houtkooper

What happens if the salt and password are compromised? You would need to re-add all of your content.

@metamatt

If you are running into "is not a valid attribute name" errors, note that git apparently cares a lot about whitespace in its config files. Make sure that you've started the indented lines with 1 tab character instead of 4 space characters (if you copy and paste from this gist into a terminal window, you'll likely get 4 spaces). And the "renormalize = true" in .gitattributes needs the spaces around the equals sign removed, i.e. "renormalize=true".

@ataryx

Cool indeed, thanks a lot for sharing!

@salutis

This works beautifully. Except that in Xcode's Version Editor, the previous version (one on the right side of the window) is presented in its encrypted form. Asked about it on Stack OverFlow: http://stackoverflow.com/questions/14374558/git-openssl-filter-xcode

@ringods

Note that as a best practice, .gitattributes should not be added.

This is wrong according to the gitattributes man page:

If you wish to affect only a single repository (i.e., to assign attributes to files that are particular to one user’s workflow for that repository), then attributes should be placed in the $GIT_DIR/info/attributes file. Attributes which should be version-controlled and distributed to other repositories (i.e., attributes of interest to all users) should go into .gitattributes files. Attributes that should affect all repositories for a single user should be placed in a file specified by the core.attributesfile configuration option (see git-config(1)). Attributes for all users on a system should be placed in the $(prefix)/etc/gitattributes file.

The .gitattributes file is intended to be committed in the repository!

@bluss

Friends, you have been awarded a better solution! https://github.com/blake2-ppc/git-remote-gcrypt

@ashanbh

I made one tiny change. I basically put my password and salt in my home directory under ~/.ssl/passphrase like so

[your password]
SALT=20130620

Then I parse it out in my script using

SALT_FIXED=`cat /Users/amit/.ssl/passphrase|grep "="|cut -d"=" -f 2`
openssl enc -base64 -aes-256-ecb -S $SALT_FIXED -pass file:/Users/amit/.ssl/passphrase

@astropaul

Has anyone got this working with Xcode5 ? It works on command line, but not when I add / commit changes from within Xcode

@JamesHarrison

"you can switch the mode to CBC at the cost of having every file change completely for every modification."

This is kind of (part of) the definition of functionally correct encryption - ECB (click here for an explanation) is a flawed legacy implementation recommended by precisely nobody for current use today, and only supported in OpenSSL because OpenSSL supports some very old and creaky legacy crypto implementations! It's only useful today as a learning tool and should never be used in current systems.

CBC or OFB modes should be the default - please consider changing your gist to use CBC and explain the potential benefits of ECB along with the downsides for those who would like to accept the loss in security for slight convenience in git. Nothing should be insecure by default!

@liyimeng

How will this work with code review?

@ravl1084

I implemented this and seems to be working well, but on one of my repositories, when trying to merge changes on a file, the file would show the <<<<, >>>> indicators that git puts in, but all the contents are encrypted so I can't resolve the merge conflicts. Is this a known issue? Which filter should I look at as a potential culprit?

@emanchen

Hi, I may have found a problem. Run this:
git show HEAD:1.txt
and it shows my secret key:
c:\temp\2>openssl enc -base64 -aes-256-ecb -S d0e24238 -k 4d89cf204a90c52084b553
U2FsdGVkX1/Q4kI4AAAAAH6omSZ6vsUVddQ306CILi3TlSl2U71kFltiWk/1g3WJ
WOQ6NHXsCUoLA4Vds/ReVCn93ps4Md2MwzAN210VPE9TYvb4ozjEq2uEndCtd3J8

The following are my test steps:
1. Create the 3 bat files: clean_filter_openssl.bat, diff_filter_openssl.bat, smudge_filter_openssl.bat as per the instructions.
2. Go to c:/temp/2 and do "git init".
3. Create the .gitattributes in c:/temp/2.
4. Modify c:/temp/2/.git/config as per the instructions.
5. Run the following to add files to my repo:
git add *
git commit -a -m ver1
6. (Suppose my repo in a public cloud drive is obtained by someone.) Copy c:/temp/2/.git/ to the empty directory c:/temp/3 and go there, where there is no .gitattributes file.
7. Test to see that it is encrypted (outputs the encrypted file content):
git checkout *
cat 1.txt
8. Copy .gitattributes to c:/temp/3 and test to see that it is decryptable (outputs the original file content):
git checkout *
cat 1.txt
9. Up to now, everything seems OK. Then I try git show HEAD:1.txt, and it shows my secret key!

Does anyone know how to get rid of this?

@wendymungovan

Hi, I'm sorry if this is in the wrong place. I think git-encrypt is really cool and would like to try it out. I've setup a test repository following the direction on https://github.com/shadowhand/git-encrypt. I've got everything setup the config and attributes files look like what you describe but the files are still not being encrypted. Is there something I can look at to see what I did wrong?

Thanks.

@edburnett

chmod 755 makes the salt and passphrase contained in the *_filter_openssl files readable by everyone, which is likely undesirable. I'd instead recommend making these read/write/executable only by the current user:

$ chmod -R 700 ~/.gitencrypt

@DanielStevenLewis

"ECB mode encryption is a relatively simple method of encryption that provides a high level of obfuscation (or a low level of encryption). This method is not very secure and should not be used for sensitive personal data, but would work well for eg. transmitting source code between private parties in a public channel. For better security, you can switch the mode to CBC at the cost of having every file change completely for every modification. As with all encryption, a strong key is always recommended."

...

"This is kind of (part of) the definition of functionally correct encryption - ECB (click here for an explanation) is a flawed legacy implementation recommended by precisely nobody for current use today, and only supported in OpenSSL because OpenSSL supports some very old and creaky legacy crypto implementations! It's only useful today as a learning tool and should never be used in current systems.

CBC or OFB modes should be the default - please consider changing your gist to use CBC and explain the potential benefits of ECB along with the downsides for those who would like to accept the loss in security for slight convenience in git. Nothing should be insecure by default!"


http://git.661346.n2.nabble.com/Transparently-encrypt-repository-contents-with-GPG-td2470145.html states that using a fixed-valued salt for CBC is bad crypto practice. If we switched the mode to CBC, would it be using a fixed-value salt?
