Keeping IPython notebooks under Git version control

This gist lets you keep IPython notebooks in git repositories. It tells git to ignore prompt numbers and program outputs when checking whether a file has changed.

To use the script, follow the instructions given in the script's docstring.

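For further details, read this blog post: http://pascalbugnion.net/blog/ipython-notebooks-and-git.html.

The procedure outlined here is inspired by this answer on Stack Overflow: http://stackoverflow.com/a/20844506/827862.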

#!/usr/bin/env python
"""
Suppress output and prompt numbers in git version control.

This script will tell git to ignore prompt numbers and cell output
when looking at ipynb files if their metadata contains:

    "git" : { "suppress_outputs" : true }

The notebooks themselves are not changed.

See also this blog post: http://pascalbugnion.net/blog/ipython-notebooks-and-git.html.

Usage instructions
==================

1. Put this script in a directory that is on the system's path.
   For future reference, I will assume you saved it in
   `~/scripts/ipynb_drop_output` (so `~/scripts` must be on your `PATH`).
2. Make sure it is executable by typing the command
   `chmod +x ~/scripts/ipynb_drop_output`.
3. Register a filter for IPython notebooks by
   putting the following line in `~/.config/git/attributes`:
   `*.ipynb filter=clean_ipynb`
4. Connect this script to the filter by running the following
   git commands:

   git config --global filter.clean_ipynb.clean ipynb_drop_output
   git config --global filter.clean_ipynb.smudge cat

To tell git to ignore the output and prompts for a notebook,
open the notebook's metadata (Edit > Edit Notebook Metadata). A
panel should open containing the lines:

    {
        "name" : "",
        "signature" : "some very long hash"
    }

Add an extra line so that the metadata now looks like:

    {
        "name" : "",
        "signature" : "don't change the hash, but add a comma at the end of the line",
        "git" : { "suppress_outputs" : true }
    }

You may need to "touch" the notebooks for git to actually register a change, if
your notebooks are already under version control.

Notes
=====

This script is inspired by http://stackoverflow.com/a/20844506/827862, but
lets the user specify whether the output of a notebook should be suppressed
in the notebook's metadata, and works for IPython v3.0.
"""
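import sys
import json

nb = sys.stdin.read()
json_in = json.loads(nb)

# Only strip notebooks that opt in through their metadata.
nb_metadata = json_in["metadata"]
suppress_output = False
if "git" in nb_metadata:
    if "suppress_outputs" in nb_metadata["git"] and nb_metadata["git"]["suppress_outputs"]:
        suppress_output = True
if not suppress_output:
    # Not opted in: hand the notebook to git unchanged.
    sys.stdout.write(nb)
    sys.exit()

# nbformat is 1 more than the IPython version (nbformat 3 <-> IPython 2, nbformat 4 <-> IPython 3).
ipy_version = int(json_in["nbformat"]) - 1


def strip_output_from_cell(cell):
    if "outputs" in cell:
        cell["outputs"] = []
    # nbformat 3 stores the prompt number as "prompt_number".
    if "prompt_number" in cell:
        del cell["prompt_number"]
    # nbformat 4 renamed it "execution_count"; Jupyter expects the key,
    # so set it to None (null) instead of deleting it.
    if "execution_count" in cell:
        cell["execution_count"] = None


if ipy_version == 2:
    # nbformat 3 nests cells inside worksheets.
    for sheet in json_in["worksheets"]:
        for cell in sheet["cells"]:
            strip_output_from_cell(cell)
else:
    # nbformat 4 keeps cells at the top level.
    for cell in json_in["cells"]:
        strip_output_from_cell(cell)

json.dump(json_in, sys.stdout, sort_keys=True, indent=1, separators=(",", ": "))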
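Editing the metadata through the notebook UI, as the docstring describes, is one way to opt in. A minimal sketch of doing the same thing directly on the JSON file, assuming the notebook is not open in a running notebook server at the time (the server could overwrite the change on its next save). `enable_suppress_outputs` is a hypothetical helper, not part of the script; its output formatting follows the `json.dump` call at the end of the script.

```python
import json
import sys


def enable_suppress_outputs(path):
    """Add "git": {"suppress_outputs": true} to a notebook's metadata, in place.

    Hypothetical helper: any existing keys under "git" are preserved.
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    # Create metadata / metadata["git"] if missing, then switch the flag on.
    nb.setdefault("metadata", {}).setdefault("git", {})["suppress_outputs"] = True
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, sort_keys=True, indent=1, separators=(",", ": "),
                  ensure_ascii=False)
        f.write("\n")


if __name__ == "__main__":
    # Usage: python enable_flag.py first.ipynb second.ipynb ...
    for notebook_path in sys.argv[1:]:
        enable_suppress_outputs(notebook_path)
```

Because the file is rewritten, its modification time changes, which should have the same effect as the "touch" step mentioned in the docstring.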
@LonghornRach

What would be a reason for you not to run the filter on a notebook? (Like what would an example use case be?)

@jodybrookover

If you wanted to keep the data/output.
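Because suppression is opt-in per notebook, one repository can mix notebooks whose outputs are kept with ones whose outputs are stripped. A minimal sketch for listing which notebooks under a directory have opted in, using the same `"git": {"suppress_outputs": true}` check as the script; the function name `notebooks_suppressing_output` is hypothetical.

```python
import json
from pathlib import Path


def notebooks_suppressing_output(root):
    """Return the .ipynb files under `root` whose metadata opts in to stripping.

    Same test as the filter: metadata["git"]["suppress_outputs"] must be truthy.
    """
    matches = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in path.parts:
            continue
        with open(path, encoding="utf-8") as f:
            nb = json.load(f)
        git_meta = nb.get("metadata", {}).get("git", {})
        if git_meta.get("suppress_outputs"):
            matches.append(path)
    return matches


if __name__ == "__main__":
    for path in notebooks_suppressing_output("."):
        print(path)
```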

@EvdH0
EvdH0 commented Sep 25, 2015

According to https://ipython.org/ipython-doc/dev/notebook/nbformat.html#code-cells, in v4.0 prompt_number was renamed to execution_count, so lines 81-82 of the script should be:

if "execution_count" in cell:
    del cell["execution_count"]

@kkumer
kkumer commented Oct 9, 2015

Actually, if you delete the execution_count key, Jupyter will complain. What works for me is:

if "execution_count" in cell:
    cell["execution_count"] = None
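The two suggestions above differ because the notebook formats differ: the original script drops `prompt_number` from nbformat 3 cells, while in nbformat 4 `execution_count` is expected on code cells, so setting it to `None` (written as `null`) keeps the file loadable. A sketch of a cleaning helper that handles both, assuming the format is read from the top-level `nbformat` field as in the original script; `clear_cell` and `clear_notebook` are hypothetical names.

```python
import json


def clear_cell(cell, nbformat):
    """Empty a cell's outputs and clear its execution number (in place)."""
    if "outputs" in cell:
        cell["outputs"] = []
    if nbformat >= 4:
        # nbformat 4: keep the key but null it, since Jupyter expects it on code cells.
        if "execution_count" in cell:
            cell["execution_count"] = None
    else:
        # nbformat 3: drop the prompt number, as the original script does.
        cell.pop("prompt_number", None)


def clear_notebook(nb):
    """Clear every cell of a parsed notebook (nbformat 3 or 4) and return it."""
    version = int(nb["nbformat"])
    if version >= 4:
        cells = nb["cells"]
    else:
        # nbformat 3 nests cells inside worksheets.
        cells = [cell for sheet in nb.get("worksheets", []) for cell in sheet["cells"]]
    for cell in cells:
        clear_cell(cell, version)
    return nb


if __name__ == "__main__":
    v4 = {
        "nbformat": 4,
        "cells": [{"cell_type": "code", "execution_count": 7,
                   "outputs": [{"output_type": "stream"}]}],
    }
    print(json.dumps(clear_notebook(v4)["cells"][0], sort_keys=True))
    # prints {"cell_type": "code", "execution_count": null, "outputs": []}
```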
@rschutjens

Is it correct that this does not restore the output if you check out an earlier version of the notebook? That would be what I want, only the latest commit would have full ipynb in git (and after I push it to github).

I am new to notebooks, git, and GitHub, so sorry if the answer is obvious.

@jibe-b
jibe-b commented Mar 21, 2016

Here is an update for the ipynb format, version 4:

import sys
import json

nb = sys.stdin.read()

json_in = json.loads(nb)

nb_metadata = json_in["metadata"]
suppress_output = False
if "git" in nb_metadata:
    if "suppress_outputs" in nb_metadata["git"] and nb_metadata["git"]["suppress_outputs"]:
        suppress_output = True
if not suppress_output:
    sys.stdout.write(nb)
    sys.exit()

def strip_output_from_cell(cell):
    if "outputs" in cell:
        cell["outputs"] = []
    if "execution_count" in cell:
        del cell["execution_count"]

for cell in json_in["cells"]:
    strip_output_from_cell(cell)

json.dump(json_in, sys.stdout, sort_keys=True, indent=1, separators=(",",": "))
@jibe-b
jibe-b commented Mar 21, 2016

only the latest commit would have full ipynb in git (and after I push it to github)

@rschutjens I am not sure I understand what you mean, but it seems to work a bit differently from what you expect:

  • the script creates a temporary version of the notebook without cell outputs, while the original file is left unchanged
  • this temporary version is what gets staged, so `git add`, `git commit` and `git push` record the version without outputs.

So when you check out, you never get the outputs back.

Welcome to notebooks and git, hope you enjoy it!
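The behaviour described in the reply above can be imitated in plain Python without a repository. This is a simplified model, not git's actual machinery: `clean` stands in for `ipynb_drop_output` (which git runs when content is staged) and `smudge` for `cat` (which git runs on checkout), and only nbformat 4 notebooks are handled.

```python
import json


def clean(text):
    """Model of the clean filter for nbformat 4: what git stores."""
    nb = json.loads(text)
    if not nb.get("metadata", {}).get("git", {}).get("suppress_outputs"):
        return text  # not opted in: stored unchanged
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
    return json.dumps(nb, sort_keys=True, indent=1)


def smudge(text):
    """Model of `cat`: checkout writes the stored content back unchanged."""
    return text


working_copy = json.dumps({
    "cells": [{"cell_type": "code", "execution_count": 1,
               "outputs": [{"text": "42"}]}],
    "metadata": {"git": {"suppress_outputs": True}},
    "nbformat": 4,
})

stored = clean(working_copy)   # roughly what `git add` records
checked_out = smudge(stored)   # what a later checkout writes to disk

print(json.loads(working_copy)["cells"][0]["outputs"])  # [{'text': '42'}]
print(json.loads(checked_out)["cells"][0]["outputs"])   # []
```

The working file keeps its outputs, but the stored copy, and therefore anything checked out from history, does not.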

@Garoe
Garoe commented Nov 9, 2016

The latest version complained about "execution_count" not being defined. Changing @jibe-b's version from
del cell["execution_count"]
to
cell["execution_count"] = None
fixed the issue.

@miketrumpis

Thanks for the tool! I don't know if this is a JSON version quirk, but I needed to write a newline at the end of stdout to keep git happy.

json.dump(json_in, sys.stdout, sort_keys=True, indent=1, separators=(",",": "))
sys.stdout.write('\n')

@tonysherbondy

Also, if your notebooks contain Unicode (non-ASCII) characters, you probably want json.dump to write them as-is instead of escaping them:

json.dump(json_in, sys.stdout, sort_keys=True, indent=1, separators=(",",": "), ensure_ascii=False)
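The two suggestions above (a trailing newline and `ensure_ascii=False`) both concern how the cleaned notebook is serialized. A sketch of a single `dump_notebook` helper (hypothetical name) combining them. It assumes the output should always be UTF-8, so it encodes explicitly and writes bytes to a binary stream (by default `sys.stdout.buffer`), rather than relying on the locale's default encoding, which may not be able to represent every character once `ensure_ascii=False` is set.

```python
import json
import sys


def dump_notebook(nb, stream=None):
    """Serialize a notebook dict as UTF-8 JSON with a trailing newline.

    `stream` is any binary file object; it defaults to stdout's byte buffer.
    """
    if stream is None:
        stream = sys.stdout.buffer
    text = json.dumps(nb, sort_keys=True, indent=1, separators=(",", ": "),
                      ensure_ascii=False)
    stream.write((text + "\n").encode("utf-8"))
```

In the filter script, `dump_notebook(json_in)` would take the place of the final `json.dump(...)` line.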
@korakotlee

On Linux I got this error:
error: cannot run ipynb_drop_output: No such file or directory

so I went into .gitconfig and added the .py suffix.
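That error means git could not start a program named `ipynb_drop_output`; likely causes are that the file was saved under another name (for example with a `.py` suffix, which would explain why adding the suffix in `.gitconfig` helped) or that its directory is not on `PATH`. A small sketch for checking whether a configured filter command resolves, using `shutil.which`; the helper names `filter_command_path` and `configured_clean_command` are hypothetical, and the filter name `clean_ipynb` comes from the script's docstring.

```python
import os
import shlex
import shutil
import subprocess


def filter_command_path(command):
    """Return the full path of the program a filter command would start, or None."""
    if not command:
        return None
    parts = shlex.split(command)
    if not parts:
        return None
    # Expand a leading "~" in case the config value uses one.
    return shutil.which(os.path.expanduser(parts[0]))


def configured_clean_command(name="clean_ipynb"):
    """Read filter.<name>.clean from git config; None if it is not set.

    Requires the `git` executable itself to be on PATH.
    """
    result = subprocess.run(
        ["git", "config", "--get", f"filter.{name}.clean"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or None
```

For example, `filter_command_path(configured_clean_command())` returning `None` corresponds to the situation in the error above. The check uses the `PATH` of the Python process, which is usually, but not necessarily, the one git sees.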
