@guysmoilov
Last active February 14, 2024 23:46
Git pre-commit hook for large files


This hook stops you from accidentally committing large files to git. Removing a large file after the fact means rewriting history, so it's better to prevent the commit in the first place.

Since you will likely want this hook in all your git repos, an install script is attached that adds it to your global git template directory, so every repo you create or clone in the future gets it.

Of course, you can also download it directly into the .git/hooks directory of an existing git repo.
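
For a single existing repo, the manual route could look like the sketch below. The function name install_hook and its two parameters are illustrative and not part of the gist. It assumes curl is available and that the repo uses the default .git/hooks location. Like the install script, it refuses to overwrite an existing hook.

```shell
#!/bin/sh
# install_hook URL REPO_ROOT  (hypothetical helper, not part of the gist)
# Downloads a hook script into REPO_ROOT/.git/hooks/pre-commit.
#
# Usage:
#   install_hook https://gist.github.com/guysmoilov/ddb3329e31b001c1e990e08394a08dc4/raw/pre-commit /path/to/repo
install_hook() {
    url=$1
    hook="$2/.git/hooks/pre-commit"

    # Never clobber a hook the repo already has
    if [ -e "$hook" ]; then
        echo "Hook already exists: $hook" >&2
        return 1
    fi

    mkdir -p "$2/.git/hooks" || return 1
    # -f: fail on HTTP errors instead of saving the error page as the hook
    curl -fsSL "$url" -o "$hook" || return 1
    chmod +x "$hook"
}
```

Template-based installation only affects repos created or cloned afterwards, so a per-repo step like this is what covers repos that already exist.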

If you find this script useful, you might enjoy our more heavy-duty project FastDS, which aims to make it easier to work with versioning in data science projects.

Installation

curl -L https://gist.github.com/guysmoilov/ddb3329e31b001c1e990e08394a08dc4/raw/install.sh | bash

Configuration

The default limit is 5 MB (5,000,000 bytes) per file. The limit is read from the GIT_FILE_SIZE_LIMIT environment variable, in bytes, so if you feel that your commit is a special case, you can override it for a single command:

GIT_FILE_SIZE_LIMIT=42000000 git commit -m "This commit is allowed file sizes up to 42MB"
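
The override works through the shell's default-value expansion, which is how the hook reads the variable. A minimal sketch of that lookup follows. The function name resolve_limit is only for illustration and does not appear in the hook.

```shell
#!/bin/sh
# Illustrative only: mirrors how the hook picks its limit.
# ${VAR:-default} yields the default when VAR is unset or empty.
resolve_limit() {
    echo "${GIT_FILE_SIZE_LIMIT:-5000000}"
}
```

Prefixing a single command with the assignment, as above, changes the limit for that command only. Running export GIT_FILE_SIZE_LIMIT=42000000 would keep it for the rest of the shell session.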


Credits

Adapted from: https://gist.github.com/benmccallum/28e4f216d9d72f5965133e6c43aaff6e

Help from this Stack Overflow question

install.sh

#!/bin/sh
# Installs the pre-commit hook into the global git template dir, so that
# every repo created or cloned afterwards gets a copy of it.
set -e
echo "Starting install script..."
SET_GIT_TEMPLATE_DIR=false
EXISTING_TEMPLATE=$(git config --global init.templateDir || echo "")
if [ -z "$EXISTING_TEMPLATE" ]; then
    echo "Creating a new global git template dir at ~/.git_template"
    # -p: don't abort (because of set -e) if the directory is already there
    mkdir -p ~/.git_template
    EXISTING_TEMPLATE="$(cd ~; pwd -P)/.git_template"
    SET_GIT_TEMPLATE_DIR=true
else
    # eval expands a leading ~ in the configured path before resolving it
    EXISTING_TEMPLATE="$(eval cd $(dirname "$EXISTING_TEMPLATE"); pwd -P)/$(basename "$EXISTING_TEMPLATE")"
    echo "Using existing git template dir: $EXISTING_TEMPLATE"
fi
HOOKS_DIR="$EXISTING_TEMPLATE/hooks"
PRECOMMIT_HOOK="$HOOKS_DIR/pre-commit"
echo "Creating hooks dir if it doesn't already exist: $HOOKS_DIR"
mkdir -p "$HOOKS_DIR"
if [ -f "$PRECOMMIT_HOOK" ]; then
    echo "Cannot install hook as it's already defined: '$PRECOMMIT_HOOK'" >&2
    exit 1
fi
echo "Downloading the hook to $PRECOMMIT_HOOK"
# -f: fail on HTTP errors instead of saving the error page as the hook
# -sS: no progress meter, but still print errors
curl -fsSL https://gist.github.com/guysmoilov/ddb3329e31b001c1e990e08394a08dc4/raw/pre-commit -o "$PRECOMMIT_HOOK"
echo "Making it executable"
chmod +x "$PRECOMMIT_HOOK"
if [ "$SET_GIT_TEMPLATE_DIR" = true ]; then
    echo "Defining ~/.git_template as the global git template dir"
    git config --global init.templateDir '~/.git_template'
fi
# printf instead of echo -e, which is not portable under /bin/sh
printf '\nDone! Any future git repo created in this user profile will contain the hook\n\n'

pre-commit

#!/bin/sh
# This is a pre-commit hook that ensures attempts to commit files that
# are larger than $limit to your _local_ repo fail, with a helpful error message.
# You can override the default limit of 5MB by supplying the environment variable:
# GIT_FILE_SIZE_LIMIT=42000000 git commit -m "This commit is allowed file sizes up to 42MB"

# Maximum file size limit in bytes
limit=${GIT_FILE_SIZE_LIMIT:-5000000} # Default 5MB
# 1000000 is written out because POSIX sh arithmetic has no ** operator
limitInMB=$(( limit / 1000000 ))

# Move to the repo root so git file paths make sense
repo_root=$( git rev-parse --show-toplevel )
cd "$repo_root" || exit 1

# Diff against HEAD, or against the empty tree if this is the first commit
empty_tree=$( git hash-object -t tree /dev/null )
if git rev-parse --verify HEAD > /dev/null 2>&1
then
    against=HEAD
else
    against="$empty_tree"
fi

# Set split so that for loop below can handle spaces in file names by splitting on line breaks
IFS='
'

echo "Checking staged file sizes"
shouldFail=false
for file in $( git diff-index --cached --name-only "$against" ); do
    # Deleted files no longer exist on disk, so count them as 0 bytes
    file_size=$( ([ ! -f "$file" ] && echo 0) || (ls -l "$file" | awk '{ print $5 }') )
    if [ "$file_size" -gt "$limit" ]; then
        echo "File $file is $(( file_size / 1000000 )) MB, which is larger than our configured limit of $limitInMB MB"
        shouldFail=true
    fi
done

if $shouldFail
then
    echo "If you really need to commit this file, you can override the size limit by setting the GIT_FILE_SIZE_LIMIT environment variable, e.g. GIT_FILE_SIZE_LIMIT=42000000 for 42MB. Or, commit with the --no-verify switch to skip the check entirely, you naughty boy!"
    echo "Commit aborted"
    exit 1
fi

@verhovsky

Or use http://pre-commit.com/ and then you just need to add this to your .pre-commit-config.yaml:

-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: <release number here>
    hooks:
    -   id: check-added-large-files

@benmccallum

@verhovsky, fair call. I did investigate pre-commit when I was looking into all this but in my case, we were already using Husky, which IMO is an easier dependency to manage w/ source control than Python, which pre-commit requires. Good share though as some folks will go that way too, esp. those dev'ing in Python.

@verhovsky

verhovsky commented Jan 6, 2020

@benmccallum pre-commit at least lets you specify the version of Python it should use:

default_language_version:
    python: python3.8

You're still responsible for making sure that that version is installed though.

@myTselection

I'm getting the error below on commit:
.git/hooks/pre-commit: 11: arithmetic expression: expecting primary: " 95000000 / 10**6 "
(I needed a max of 95 MB, so I only changed that part.)
When replacing 10**6 with 10*6 it seems to work, but I'm not sure whether the ** has a special meaning, as it appears twice like this in your script.

@guysmoilov
Author

@myTselection indeed, 10*6 == 60 and 10**6 == 1000000 i.e. 1MB = 1 million bytes.
You can just replace 10**6 with 1000000 in your script.
Which shell are you using BTW?

@myTselection

thx for quick feedback. Makes sense. Using Ubuntu 20.04.1 LTS with GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)

@danmackinlay

Hmm - if you commit a file deletion this currently throws an error like

ls: environment.yaml: No such file or directory
.git/hooks/pre-commit: line 34: [: : integer expression expected

Any shell wizardry ideas to prevent this?

@guysmoilov
Author

@danmackinlay you're right!
A quick fix is to change the line to check whether the file exists:

file_size=$(([ ! -f $file ] && echo 0) || (ls -la $file | awk '{ print $5 }'))

I'll fix the gist, thanks for reporting this!

@moschlegel

I'd suggest getting the file size directly with 'stat', that would avoid the awk part:

file_size=$(([ ! -f $file ] && echo 0) || (stat -c %s $file ))

@guysmoilov
Author

Thanks for the suggestion! It does seem more straightforward.
However, the behavior of stat is inconsistent between macOS and Linux: on my Mac it would have to be stat -f %z $file.
So to avoid complicating things, I'll leave it as is.
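
One way to sidestep the stat difference without parsing ls output is wc -c, which POSIX specifies. The sketch below is an alternative under that assumption, not what the gist uses. The helper name file_size_bytes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): print a file's size in bytes,
# or 0 if the path is not a regular file (e.g. a staged deletion).
file_size_bytes() {
    if [ -f "$1" ]; then
        # The arithmetic expansion strips the leading padding that some
        # wc implementations (e.g. on BSD/macOS) put before the number.
        echo "$(( $(wc -c < "$1") ))"
    else
        echo 0
    fi
}
```

In the hook, the file_size line could then read file_size=$(file_size_bytes "$file"). The trade-off is that wc reads the whole file, whereas ls and stat only read its metadata.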

@expelledboy

In 2022, for anyone that is interested in a git-native solution: I don't care about any particular file, I just limit the size of the entire commit itself, which should at least make a developer think twice before they perhaps make the decision to git commit --no-verify
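
The idea of capping the whole commit rather than each file could be sketched as follows. This is an illustrative assumption about how such a check might be written, not the commenter's own code. The function name staged_total_bytes is hypothetical, and it measures the staged blobs with git cat-file -s rather than the files on disk.

```shell
#!/bin/sh
# Hypothetical helper: print the combined size, in bytes, of all files
# staged as added, copied or modified in the current repo.
staged_total_bytes() {
    # --diff-filter=ACM skips deletions, which have no staged blob to measure
    git diff --cached --name-only --diff-filter=ACM |
    {
        total=0
        while IFS= read -r path; do
            # ":path" names the staged version of the file (path from repo root);
            # entries without a readable blob (e.g. submodules) count as 0
            size=$(git cat-file -s ":$path" 2>/dev/null || echo 0)
            total=$(( total + size ))
        done
        echo "$total"
    }
}
```

A hook could compare this total against a limit such as GIT_FILE_SIZE_LIMIT and exit 1 when it is exceeded. File names that git quotes in its output (for example, names containing newlines) are not handled by this sketch.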

@aloisklink

i'm getting below error on commit:
.git/hooks/pre-commit: 11: arithmetic expression: expecting primary: " 95000000 / 10**6 "

Which shell are you using BTW?

I had the same issue on Ubuntu 22.04. On Debian-based distros such as Ubuntu, /bin/sh points to dash (and on BusyBox-based ones such as Alpine, to ash), which deliberately supports little beyond POSIX shell features, so exponentiation with ** doesn't work. Replacing the 10**6 with 1000000 works fine.

https://www.shellcheck.net/ also prints the same warning: (warning): In POSIX sh, exponentials are undefined.
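
A portable version of the conversion only needs the power of ten written out. A minimal sketch, with bytes_to_mb as a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert bytes to whole megabytes using only POSIX
# arithmetic. $(( )) supports + - * / % but not **, so 10**6 is spelled
# out as 1000000.
bytes_to_mb() {
    echo "$(( $1 / 1000000 ))"
}
```

With this, bytes_to_mb 95000000 prints 95 under dash as well as bash. The division is integer division, so sizes are rounded down to whole megabytes.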
