Keeping IPython notebooks under Git version control

This gist lets you keep IPython notebooks in git repositories. It tells git to ignore prompt numbers and program outputs when checking whether a file has changed.

To use the script, follow the instructions given in the script's docstring.

For further details, read this blogpost.

The procedure outlined here is inspired by this answer on Stack Overflow.

#!/usr/bin/env python
"""
Suppress output and prompt numbers in git version control.
This script will tell git to ignore prompt numbers and cell output
when looking at ipynb files if their metadata contains:
"git" : { "suppress_output" : true }
The notebooks themselves are not changed.
See also this blogpost: http://pascalbugnion.net/blog/ipython-notebooks-and-git.html.

Usage instructions
==================

1. Put this script in a directory that is on the system's path.
   For future reference, I will assume you saved it in
   `~/scripts/ipynb_drop_output`.
2. Make sure it is executable by typing the command
   `chmod +x ~/scripts/ipynb_drop_output`.
3. Register a filter for ipython notebooks by
   putting the following line in `~/.config/git/attributes`:
   `*.ipynb filter=clean_ipynb`
4. Connect this script to the filter by running the following
   git commands:

   git config --global filter.clean_ipynb.clean ipynb_drop_output
   git config --global filter.clean_ipynb.smudge cat

To tell git to ignore the output and prompts for a notebook,
open the notebook's metadata (Edit > Edit Notebook Metadata). A
panel should open containing the lines:

    {
        "name" : "",
        "signature" : "some very long hash"
    }

Add an extra line so that the metadata now looks like:

    {
        "name" : "",
        "signature" : "don't change the hash, but add a comma at the end of the line",
        "git" : { "suppress_outputs" : true }
    }

You may need to "touch" the notebooks for git to actually register a change, if
your notebooks are already under version control.

Notes
=====

This script is inspired by http://stackoverflow.com/a/20844506/827862, but
lets the user specify whether the output of a notebook should be suppressed
in the notebook's metadata, and works for IPython v3.0.
"""
import sys
import json

nb = sys.stdin.read()

json_in = json.loads(nb)
nb_metadata = json_in["metadata"]
suppress_output = False
if "git" in nb_metadata:
    if "suppress_outputs" in nb_metadata["git"] and nb_metadata["git"]["suppress_outputs"]:
        suppress_output = True
if not suppress_output:
    sys.stdout.write(nb)
    exit()

ipy_version = int(json_in["nbformat"])-1 # nbformat is 1 more than actual version.

def strip_output_from_cell(cell):
    if "outputs" in cell:
        cell["outputs"] = []
    if "prompt_number" in cell:
        del cell["prompt_number"]

if ipy_version == 2:
    for sheet in json_in["worksheets"]:
        for cell in sheet["cells"]:
            strip_output_from_cell(cell)
else:
    for cell in json_in["cells"]:
        strip_output_from_cell(cell)

json.dump(json_in, sys.stdout, sort_keys=True, indent=1, separators=(",",": "))

LonghornRach commented Jul 15, 2015

What would be a reason for you not to run the filter on a notebook? (Like what would an example use case be?)

jodybrookover commented Jul 20, 2015

If you wanted to keep the data/output.

EvdH0 commented Sep 25, 2015

According to https://ipython.org/ipython-doc/dev/notebook/nbformat.html#code-cells, in v4.0
prompt_number was renamed to execution_count,
so lines 81-82 should be

    if "execution_count" in cell:
        del cell["execution_count"]

kkumer commented Oct 9, 2015

Actually, if you delete the execution_count key from a cell, jupyter will complain. What works for me is

if "execution_count" in cell:
    cell["execution_count"] = None

rschutjens commented Mar 13, 2016

Is it correct that this does not restore the output if you check out an earlier version of the notebook? That would be what I want: only the latest commit would have the full ipynb in git (and after I push it to github).

I am new to notebooks, git, and github so sorry if this might be an obvious answer.

jibe-b commented Mar 21, 2016

Here is an update for the ipynb format, version 4 :

import sys
import json

nb = sys.stdin.read()

json_in = json.loads(nb)

nb_metadata = json_in["metadata"]
suppress_output = False
if "git" in nb_metadata:
    if "suppress_outputs" in nb_metadata["git"] and nb_metadata["git"]["suppress_outputs"]:
        suppress_output = True
if not suppress_output:
    sys.stdout.write(nb)
    exit()

def strip_output_from_cell(cell):
    if "outputs" in cell:
        cell["outputs"] = []
    if "execution_count" in cell:
        del cell["execution_count"]

for cell in json_in["cells"]:
    strip_output_from_cell(cell)

json.dump(json_in, sys.stdout, sort_keys=True, indent=1, separators=(",",": "))

jibe-b commented Mar 21, 2016

only the latest commit would have full ipynb in git (and after I push it to github)

@rschutjens I am not sure I understand what you mean, but it seems to work a bit differently from what you expect:

  • the script creates a temporary version of the notebook without cell outputs, while the original file is left unchanged
  • this temporary version is what gets added to the stage, and then add... commit... push makes a commit with the version without outputs.

So when you checkout, you never get the outputs.
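
To make the two steps above concrete, here is a minimal sketch of the round trip. The functions `clean` and `smudge` are stand-ins invented for this illustration (git itself runs the configured commands), and the sketch ignores the `suppress_outputs` metadata switch for brevity.

```python
import json


def clean(text):
    """Stand-in for the clean filter: produces what git stores on `git add`."""
    notebook = json.loads(text)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(notebook, sort_keys=True, indent=1)


def smudge(text):
    """Stand-in for `smudge = cat`: checkout writes the stored text as-is."""
    return text


# A tiny hypothetical version 4 notebook with one executed code cell.
working_copy = json.dumps({
    "cells": [{
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        "source": ["print('hi')"],
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
})

stored = clean(working_copy)   # what ends up in the commit
checked_out = smudge(stored)   # what a later checkout writes to disk

# The working copy still has its outputs; the checked-out text does not.
print(len(json.loads(working_copy)["cells"][0]["outputs"]))
print(len(json.loads(checked_out)["cells"][0]["outputs"]))
```

Because the smudge step is the identity, nothing can bring the outputs back once the cleaned text has been committed.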

Welcome to notebooks and git, hope you enjoy it!

Garoe commented Nov 9, 2016

The latest version complained about "execution_count" not being defined. Changing @jibe-b's version from
del cell["execution_count"]
to
cell["execution_count"] = None
fixed the issue.

miketrumpis commented Dec 5, 2016

Thanks for the tool! I don't know if this is a json version quirk, but I needed to add a newline to the end of stdout to keep git happy.

json.dump(json_in, sys.stdout, sort_keys=True, indent=1, separators=(",",": "))
sys.stdout.write('\n')

tonysherbondy commented Jan 5, 2017

Also, if you use unicode in your notebooks, you probably want to let json.dump keep those with this:

json.dump(json_in, sys.stdout, sort_keys=True, indent=1, separators=(",",": "), ensure_ascii=False)
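
Pulling together the suggestions made so far in this thread (setting `execution_count` to `None` rather than deleting it, ending the output with a newline, and passing `ensure_ascii=False`), a filter for version 4 notebooks could look roughly like the sketch below. The names `clean_notebook` and `main` are made up for this illustration, and the code assumes Python 3 and the version 4 layout with a top-level `cells` list.

```python
import json
import sys


def clean_notebook(text):
    """Return notebook JSON text with outputs and execution counts blanked.

    The text is returned unchanged unless the notebook metadata contains
    "git": {"suppress_outputs": true}.
    """
    notebook = json.loads(text)
    git_metadata = notebook.get("metadata", {}).get("git", {})
    if not git_metadata.get("suppress_outputs"):
        return text

    for cell in notebook["cells"]:
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            # Set to None (JSON null) instead of deleting the key.
            cell["execution_count"] = None

    cleaned = json.dumps(notebook, sort_keys=True, indent=1,
                         separators=(",", ": "), ensure_ascii=False)
    # End with a newline, as suggested earlier in the thread.
    return cleaned + "\n"


def main():
    """Act as the git clean filter: read stdin, write the result to stdout."""
    sys.stdout.write(clean_notebook(sys.stdin.read()))
```

To use this as the filter script, call `main()` at the bottom of the file.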

korakotlee commented Mar 19, 2017

On Linux I got this error:
error: cannot run ipynb_drop_output: No such file or directory

so I went into .gitconfig and added the .py suffix to the filter command.
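
An error like this usually suggests that no executable with exactly the configured name could be found on the path. As a quick check, the sketch below looks the command up the way a shell would. The helper name `filter_on_path` is invented for this example, `shutil.which` needs Python 3.3 or later, and the path git sees may differ from the one in your interactive shell.

```python
import shutil


def filter_on_path(command="ipynb_drop_output"):
    """Return the full path found for the filter command, or None."""
    return shutil.which(command)


location = filter_on_path()
if location is None:
    print("ipynb_drop_output not found; check the file name, PATH and chmod +x")
else:
    print("filter command found at", location)
```

If the script was saved as `ipynb_drop_output.py`, the lookup only succeeds when the configured command includes the `.py` suffix.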

larister commented Aug 30, 2017

Thanks @korakotlee, that saved me a bit of head scratching

bw-matthew commented Aug 30, 2017

With jq you can implement this entirely inside the git configuration, which then makes it easy to turn it on for a specific repo.

Add the following to .git/config:

[filter "clean_ipynb"]
    clean = jq '{ cells: [.cells[] | . + { metadata: {} } + if .cell_type == \"code\" then { outputs: [], execution_count: null } else {} end ] } + delpaths([[\"cells\"]])'
    smudge = cat

Create the file .git/info/attributes with the content from above:

*.ipynb  filter=clean_ipynb

This does not conditionally prevent the formatting of the notebooks using the metadata. I'm sure it would be possible to check for this by wrapping the overall statement in an if.

matthewfranglen commented Aug 30, 2017

An improvement to the jq approach is this:

[filter "clean_ipynb"]
    clean = jq --indent 1 --monochrome-output '. + if .metadata.git.suppress_outputs | not then { cells: [.cells[] | . + if .cell_type == \"code\" then { outputs: [], execution_count: null } else {} end ] } else {} end'
    smudge = cat

It checks .metadata.git.suppress_outputs, but note that the sense is the reverse of the python scripts: the notebook is not altered if the value is present and truthy, and is stripped otherwise. It also matches the indentation I have observed.
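
For readers who do not use jq, the logic of the filter above can be sketched in Python. This is an approximation written for illustration: the function name `jq_like_clean` is invented, and it works on an already-parsed notebook dict instead of text. As far as I can tell from the expression, cells are stripped unless the flag is set to something other than `false` or `null`.

```python
def jq_like_clean(notebook):
    """Mirror the jq expression on a parsed notebook dict."""
    git_metadata = (notebook.get("metadata") or {}).get("git") or {}
    flag = git_metadata.get("suppress_outputs")
    # jq treats only false and null as falsy, so test for exactly those.
    if not (flag is None or flag is False):
        # The `else {} end` branch: adding {} leaves the notebook as it is.
        return notebook

    cleaned = dict(notebook)
    cleaned["cells"] = [
        # `. + { outputs: [], execution_count: null }` for code cells only.
        dict(cell, outputs=[], execution_count=None)
        if cell.get("cell_type") == "code" else cell
        for cell in notebook["cells"]
    ]
    return cleaned
```

The input dict is not modified; a shallow copy with new cell dicts is returned when cells are stripped.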

mazzma12 commented Sep 6, 2017

3. Register a filter for ipython notebooks by
   putting the following line in `~/.config/git/attributes`:
   `*.ipynb  filter=clean_ipynb`

Use these commands to create a global gitattributes file if you do not have one:

touch ~/.gitattributes
git config --global core.attributesfile ~/.gitattributes
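
Once the attributes file exists, the filter line from step 3 still has to go into it. A small helper along these lines could add it without creating duplicates; the name `ensure_attributes_line` and its arguments are assumptions made for this sketch, not part of the gist.

```python
import os


def ensure_attributes_line(path, line="*.ipynb filter=clean_ipynb"):
    """Append `line` to the attributes file unless an equivalent line exists.

    Returns True if the file was changed, False otherwise.
    """
    path = os.path.expanduser(path)
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()

    # Compare whitespace-insensitively, since spacing between fields varies.
    if any(entry.split() == line.split() for entry in content.splitlines()):
        return False

    with open(path, "a") as handle:
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True


# Example call (commented out so that nothing is written by accident):
# ensure_attributes_line("~/.gitattributes")
```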

rgbkrk commented Jan 11, 2018

Something you may want to try out now is the official jupyter tool, nbdime -- it does similar git integration and quite a bit more.

rightx2 commented Jan 26, 2018

Works well on Mac OS X but not on Ubuntu 16.04. The only difference is the existence of the ~/.config/git folder: this directory exists on Mac OS X, but not on Ubuntu 16.04. So I created the directory manually and added the attributes file too, but it still doesn't work.

chetgray commented Jun 8, 2018

In the initial overview part of the script's docstring, the example metadata should read

"git" : { "suppress_outputs" : true }

"suppress_output", without the final s, is currently the key documented there (elsewhere it's correctly documented "suppress_outputs").
