@minrk
Last active June 6, 2023 06:23
git pre-commit hook for stripping output from IPython notebooks
#!/usr/bin/env python
"""strip outputs from an IPython Notebook
Opens a notebook, strips its output, and writes the outputless version to the original file.
Useful mainly as a git filter or pre-commit hook for users who don't want to track output in VCS.
This does mostly the same thing as the `Clear All Output` command in the notebook UI.
LICENSE: Public Domain
"""
import io
import sys

try:
    # Jupyter >= 4
    from nbformat import read, write, NO_CONVERT
except ImportError:
    # IPython 3
    try:
        from IPython.nbformat import read, write, NO_CONVERT
    except ImportError:
        # IPython < 3
        from IPython.nbformat import current

        # The old API has no NO_CONVERT; define a placeholder so the
        # read() call in __main__ works (as_version is ignored here).
        NO_CONVERT = None

        def read(f, as_version):
            return current.read(f, 'json')

        def write(nb, f):
            return current.write(nb, f, 'json')


def _cells(nb):
    """Yield all cells in an nbformat-insensitive manner"""
    if nb.nbformat < 4:
        for ws in nb.worksheets:
            for cell in ws.cells:
                yield cell
    else:
        for cell in nb.cells:
            yield cell


def strip_output(nb):
    """strip the outputs from a notebook object"""
    nb.metadata.pop('signature', None)
    for cell in _cells(nb):
        if 'outputs' in cell:
            cell['outputs'] = []
        # the prompt counter is `prompt_number` in nbformat < 4
        # and `execution_count` in nbformat 4
        for key in ('prompt_number', 'execution_count'):
            if key in cell:
                cell[key] = None
    return nb


if __name__ == '__main__':
    filename = sys.argv[1]
    with io.open(filename, 'r', encoding='utf8') as f:
        nb = read(f, as_version=NO_CONVERT)
    nb = strip_output(nb)
    with io.open(filename, 'w', encoding='utf8') as f:
        write(nb, f)
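
To see what the stripping step does without installing `nbformat`, here is a minimal standard-library sketch that applies the same idea to a notebook loaded as a plain dict. It is an illustration, not the script itself: the names `iter_cells`, `strip_output_dict`, and the `example` notebook are hypothetical, and it assumes the on-disk JSON layout (`worksheets` in nbformat < 4, top-level `cells` in nbformat 4). It also clears `execution_count`, the nbformat 4 name for the prompt counter.

```python
import copy
import json


def iter_cells(nb):
    """Yield cells from a notebook dict: v3 uses worksheets, v4 uses top-level cells."""
    if nb.get("nbformat", 4) < 4:
        for ws in nb.get("worksheets", []):
            for cell in ws.get("cells", []):
                yield cell
    else:
        for cell in nb.get("cells", []):
            yield cell


def strip_output_dict(nb):
    """Return a copy of the notebook dict with outputs and prompt counters cleared."""
    nb = copy.deepcopy(nb)  # leave the caller's notebook untouched
    nb.get("metadata", {}).pop("signature", None)
    for cell in iter_cells(nb):
        if "outputs" in cell:
            cell["outputs"] = []
        for key in ("prompt_number", "execution_count"):
            if key in cell:
                cell[key] = None
    return nb


# Hypothetical two-cell nbformat 4 notebook, for illustration only.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"signature": "sha256:abc"},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": "1 + 1",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": "2"},
                    "execution_count": 3,
                    "metadata": {},
                }
            ],
        },
    ],
}

stripped = strip_output_dict(example)
print(json.dumps(stripped["cells"][1], sort_keys=True))
```

Working on a copy keeps the example side-effect free; the script above instead rewrites the file in place, which is what a pre-commit hook needs.
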
#!/bin/sh
#
# strip output of IPython Notebooks
# add this as `.git/hooks/pre-commit`
# to run every time you commit a notebook
#
# requires `nbstripout` to be available on your PATH
#
# LICENSE: Public Domain

if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi

# Find notebooks to be committed
(
    IFS='
'
    NBS=`git diff-index -z --cached $against --name-only | grep '.ipynb$' | uniq`
    for NB in $NBS ; do
        echo "Removing outputs from $NB"
        nbstripout "$NB"
        git add "$NB"
    done
)

exec git diff-index --check --cached $against --

@dietmarw

I've created a version that removes the whole cell, although I have to admit that the way I track the index is far from optimal, and there may be better ways that make proper use of the API. Feedback welcome:
https://gist.github.com/dietmarw/dc0cf089d8d6211136d5

@kynan

kynan commented Sep 12, 2015

I have added documentation and an nbstripout install command, which installs the filter in the current Git repository, and turned the script into a module with a setuptools script entry point: https://github.com/kynan/nbstripout

How do you feel about publishing that on PyPI @minrk?

@jond3k

jond3k commented Sep 14, 2015

I've adapted cfriedline's repo to make it easy to install into any repo as a filter: https://github.com/jond3k/ipynb_stripout

@kynan

kynan commented Sep 26, 2015

@jond3k Have a look at my repo linked above: it works with v3 and v4 and has an install command to automate the installation in any git repo.

@minrk
Author

minrk commented Dec 21, 2015

@kynan feel free to put it on PyPI. No need to wait for me.

@kynan

kynan commented Jan 21, 2016

@minrk OK, will do, thanks!

@kynan

kynan commented Jan 21, 2016

@minrk Turns out @mforbes beat me to it. We need to decide on a license. Are you happy with MIT?

@klieret

klieret commented Oct 31, 2018

Great snippet, thanks a lot for sharing!

Two suggestions:

  1. Small fix: I guess it should be grep '\.ipynb$' with the . escaped; otherwise the . matches any character, so names that merely end in "ipynb" also match
  2. Also add | tr -d '\000' | before grep: NBS=`git diff-index -z --cached $against --name-only | tr -d '\000' | grep '\.ipynb$' | uniq`

The second point is needed because in some cases grep considers the input binary (https://unix.stackexchange.com/questions/19907/what-makes-grep-consider-a-file-to-be-binary). This happens to me when using zsh: grep prints "Binary file (standard input) matches" instead of the matching parts.
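
Both points can be sidestepped by parsing the `-z` output directly. The following is a rough Python sketch, not part of the gist: `parse_nul_separated`, `staged_notebooks`, and the `sample` bytes are hypothetical names and data, and the git invocation simply mirrors the flags used in the hook. It splits on the NUL separators that `-z` emits, so adjacent file names stay separate (deleting the NULs outright would run them together), and it matches a literal `.ipynb` suffix with the dot escaped.

```python
import re
import subprocess

# Escaped dot: match a literal ".ipynb" suffix, not "any character + ipynb".
NOTEBOOK_RE = re.compile(r"\.ipynb$")


def parse_nul_separated(raw):
    """Split NUL-separated `git diff-index -z` output and keep unique notebook paths."""
    names = [n for n in raw.decode("utf-8", "surrogateescape").split("\0") if n]
    unique = []
    for name in names:
        if NOTEBOOK_RE.search(name) and name not in unique:
            unique.append(name)
    return unique


def staged_notebooks(against="HEAD"):
    """Ask git for staged paths; assumes it runs inside a repository with git on PATH."""
    raw = subprocess.check_output(
        ["git", "diff-index", "-z", "--cached", against, "--name-only"]
    )
    return parse_nul_separated(raw)


# Hypothetical output: four paths, one of which only *looks* like a notebook.
sample = b"analysis.ipynb\0notes/myipynb\0data/run 1.ipynb\0README.md\0"
print(parse_nul_separated(sample))  # ['analysis.ipynb', 'data/run 1.ipynb']
```

With the unescaped pattern `.ipynb$`, `notes/myipynb` would also match, because `.` accepts the `y`.
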
