@LukasKnuth
Created February 15, 2012 22:18
This Python script can be used as a "pre-commit"-hook, to check if a huge binary file got accidentally added to the staging area (and is about to be committed). Because deleting those afterwards is a huge pain in the ass...
-- DESCRIPTION --
If you accidentally commit a huge file, you have a problem. Sure, you can remove it from the working tree and commit,
but the file is still reachable from your history and therefore causes every clone to be as huge as the committed
binary file.
Fixing this can be very ugly, time consuming and might not even work as you wish. Luckily, this script can protect
you from committing such monsters in the first place.
It looks through the staged files (the ones that are added with the "git add"-command) and checks their file-size.
If one of them is larger than the given size, the commit is aborted and you get a message telling you which file
takes so much space, so you can remove it from the staging area.
The script only checks the staged files, so having large files in your project-tree is not a problem, as long as
they don't get added to the staging-area.
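The check described above can be sketched in a few lines. This is only an illustration, not the script from this
gist: the function names "staged_paths" and "oversized" are made up for the example, and it assumes that "git" is in
your PATH and that the code runs from the repo-root. It asks git for the staged paths with "git diff --cached"
instead of "git status", and it measures the file in the working tree, which may differ from the staged version.

```python
import os
import subprocess


def staged_paths(git="git"):
    """Return the paths that are added, copied or modified in the staging area."""
    out = subprocess.check_output(
        [git, "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"]
    )
    # With "-z" the paths are separated by NUL bytes and are not quoted.
    return [p for p in out.decode("utf-8").split("\0") if p]


def oversized(paths, max_kb):
    """Return (path, size_in_bytes) for every existing file larger than max_kb."""
    limit = max_kb * 1024
    hits = []
    for path in paths:
        # Paths that are not regular files on disk are skipped.
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > limit:
                hits.append((path, size))
    return hits
```

A hook built on this sketch would call "oversized(staged_paths(), 512)" and exit non-zero if the returned list
is not empty.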
-- USAGE --
--- LINUX ---
If you're on Linux (or any other *nix system, like FreeBSD), you'll want to change the SHEBANG [1] to point to
your python interpreter. To get the correct path, you can use the "which"-command:
which python
Also, you'll want to change the "git_binary_path"-variable to point to your git-executable. To find that path,
you can use "which" as well.
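If you prefer to look the paths up from Python, the following sketch does roughly what "which" does. It assumes
Python 3.3 or newer (where "shutil.which" was added), and the helper name "locate" is made up for this example.

```python
import shutil
import sys


def locate(name):
    """Return the absolute path of an executable found in PATH, or None."""
    return shutil.which(name)


# Path of the interpreter that is running this code (candidate for the SHEBANG):
python_path = sys.executable
# Candidate value for "git_binary_path" (None if git is not in PATH):
git_path = locate("git")
```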
Last but not least, you'll want to specify the maximum size for a file in the "max_file_size"-variable. All
files which are added to the staging area and are larger than the specified value will cause the commit to be
aborted. The size is specified in KB, so if you want to allow files of up to 1.2 MB you'll write:
max_file_size = 1228 # 1024 x 1.2
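If you don't want to do the math by hand, the conversion can be sketched like this. The helper "mb_to_kb" is not
part of the script, it is only an assumption for this example. It rounds down to whole KB.

```python
def mb_to_kb(megabytes):
    """Convert a size in MB to whole KB (1 MB = 1024 KB), rounding down."""
    return int(megabytes * 1024)


max_file_size = mb_to_kb(1.2)  # same value as the 1228 used above
```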
After personalizing the script, you'll want to tell git to use it. To do so, copy the script file to your
".git/hooks"-directory in the repo-root and rename it to "pre-commit" (no file ending). Then, you'll have to
mark it executable by using "chmod":
chmod +x pre-commit # in your "hooks"-directory
Now, if you run "git commit", the script will run and check for huge files in your staging-area.
--- WINDOWS ---
If you're working with Windows (and "msysgit"), it's a little more complicated. Since "msysgit" seems to have
a problem handling the SHEBANG [1], you'll have to use a little trick to make the script executable
(further information on this problem can be found here [2]).
In order to make the script work, you'll want to remove the SHEBANG from the Python script ("pre-commit.py")
and use a wrapper bash-script to call the interpreter. This script should look something like this:
#!/bin/sh
python .git/hooks/pre-commit.py
Store this script as a file called "pre-commit" (no file-ending). This assumes that you have Python in your
PATH [3]. If you don't, you can also specify the full path to your interpreter-executable.
This script will be called by "git commit" and call the Python-script to check for the huge files. The path
after the SHEBANG should not be changed, as "msysgit" will remap it automatically. You must specify a path
relative to the repo-root for the Python script to be executed (because that's where the script is called from).
Afterwards you'll want to copy both the wrapper-file ("pre-commit") and the Python-script ("pre-commit.py") to
your repo's ".git/hooks"-directory, personalize the Python-script ("max_file_size" and "git_binary_path") and
mark the "pre-commit"-file executable (see the Linux instructions).
--- MAC OS X ---
The instructions should be the same as for Linux.
-- LINKS --
[1] http://en.wikipedia.org/wiki/Shebang_Unix
[2] http://stackoverflow.com/questions/1547005
[3] http://en.wikipedia.org/wiki/Environment_variable#Examples_of_Unix_environment_variables
#!/usr/bin/python
"""
This is a git commit-hook which can be used to check if huge files
were accidentally added to the staging area and are about to be
committed.
If there is a file which is bigger than the given "max_file_size"-
variable, the script will exit non-zero and abort the commit.
This script is meant to be added as a "pre-commit"-hook. See this
page for further information:
http://progit.org/book/ch7-3.html#installing_a_hook
In order to make the script work properly, you'll need to set the
above path to the python interpreter (first line of the file)
according to your system (under *NIX do "which python" to find out).
Also, the "git_binary_path"-variable should contain the absolute
path to your "git"-executable (you can use "which" here, too).
See the included README-file for further information.
The script was developed and has been confirmed to work under
python 3.2.2 and git 1.7.7.1 (might also work with earlier versions!)
"""
# The maximum file-size for a file to be committed:
max_file_size = 512 # in KB (1 KB = 1024 bytes)
# The path to the git-binary:
git_binary_path = "/usr/bin/git"
# ---- DON'T CHANGE THE REST UNLESS YOU KNOW WHAT YOU'RE DOING! ----
import subprocess, sys, os

def sizeof_fmt(num):
    """
    Return a human-readable filesize-string like "3.5 MB" for
    the given 'num'-parameter (a size in bytes).
    From http://stackoverflow.com/questions/1094841
    """
    for x in ['bytes', 'KB', 'MB', 'GB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0
    # Whatever is left at this point is at least 1 TB:
    return "%3.1f %s" % (num, 'TB')

# Now, do the checking:
try:
    print("Checking for files bigger than "+sizeof_fmt(max_file_size*1024))
    # Check all files in the staging-area:
    text = subprocess.check_output(
        [git_binary_path, "status", "--porcelain", "-uno"],
        stderr=subprocess.STDOUT).decode("utf-8")
    file_list = text.splitlines()
    # Check all files:
    for file_s in file_list:
        # Git puts quotes around paths which contain special characters:
        filename = file_s[3:].strip('"')
        # Skip entries which are not a regular file on disk (deleted
        # files, for example), os.stat() would raise an error for them:
        if not os.path.isfile(filename):
            continue
        stat = os.stat(filename)
        if stat.st_size > (max_file_size*1024):
            # File is too big, abort the commit:
            print("'"+filename+"' is too huge to be committed!",
                "("+sizeof_fmt(stat.st_size)+")")
            sys.exit(1)
    # Everything seems to be okay:
    print("No huge files found.")
    sys.exit(0)
except subprocess.CalledProcessError:
    # There was a problem calling "git status".
    print("Oops...")
    sys.exit(12)
@AccuPhoenix01

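You'll also want to adjust this for the case where you remove a file from the repo. Without the check below, removing even a tiny file causes an error, because the script tries to stat a file that no longer exists.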

    # Check all files:
    for file_s in file_list:
        filename = file_s[3:].strip('\"')
        # make sure this file exists, otherwise we'll get an error when removing files from git
        if os.path.isfile(filename):
            stat = os.stat(filename)
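Another way to handle this is to look at the status letters that "git status --porcelain" prints in front of each path. The sketch below is only an assumption about how that could look (the function name "staged_path" is made up). It skips staged deletions and entries that are not staged at all, and it uses the new name of a renamed file. It only strips the surrounding quotes and does not undo escape sequences inside quoted paths.

```python
def staged_path(line):
    """Return the path to check for one "git status --porcelain" line, or None."""
    # The first column is the status of the staging area:
    # ' ' = nothing staged, '?' = untracked, '!' = ignored, 'D' = staged deletion.
    index_status = line[0]
    if index_status in " ?!D":
        return None
    path = line[3:]
    # Renames and copies are printed as "old -> new", the new path is the one on disk.
    if index_status in "RC" and " -> " in path:
        path = path.split(" -> ", 1)[1]
    return path.strip('"')
```

In the loop above, a result of None would mean "skip this line", and any other result would be passed to os.stat().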
