@takluyver
Created September 6, 2014 21:44
Flatten notebooks for git diff

Copy nbflatten.py to somewhere on $PATH and make it executable. Then, in the root of a git repository, run these commands:

echo "*.ipynb diff=ipynb" >> .gitattributes 
git config diff.ipynb.textconv nbflatten.py

When you change a notebook and run git diff, you'll see the diff of flattened, simplified notebooks, rather than the full JSON. This does lose some information (metadata, non-text output), but it makes it easier to see simple changes in the notebook.

This doesn't help with merging conflicting changes in notebooks. For that, see nbdiff.org.

#!/usr/bin/python3
import sys
from IPython.nbformat.current import read
from IPython.utils.text import strip_ansi

fname = sys.argv[1]
with open(fname, encoding='utf-8') as f:
    nb = read(f, 'ipynb')

banners = {
    'heading': 'Heading %d ------------------',
    'markdown': 'Markdown cell ---------------',
    'code': 'Code cell -------------------',
    'raw': 'Raw cell --------------------',
    'output': 'Output ----------------------',
}

for cell in nb.worksheets[0].cells:
    if cell.cell_type == 'heading':
        print(banners['heading'] % cell.level)
    else:
        print(banners[cell.cell_type])

    if cell.cell_type == 'code':
        source = cell.input
    else:
        source = cell.source

    print(source)
    if not source.endswith('\n'):
        print()

    if cell.cell_type == 'code':
        if cell.outputs:
            print(banners['output'])
            for output in cell.outputs:
                if 'text' in output:
                    print(strip_ansi(output.text))
                elif 'traceback' in output:
                    print(strip_ansi('\n'.join(output.traceback)))
                else:
                    print("(Non-plaintext output)")

        print()

@jakirkham

@jfeist, using jq is fantastic. This really saved me a lot of time!

@jakirkham

A modification of @jfeist's snippet for use in .gitconfig. More details here (jqlang/jq#921). Also, in .gitconfig the double quotes must be backslash-escaped, even inside single quotes (http://stackoverflow.com/a/25535431).

[filter "clean_nb"]
        clean = jq '(.cells[] | select(has(\"outputs\")) | .outputs) = [] | (.cells[] | select(has(\"execution_count\")) | .execution_count) = null'
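
For readers without jq, the same cleaning step can be sketched in Python using only the standard library. This is a minimal sketch, assuming the same notebook layout the jq filter targets (a top-level `cells` list); the names `clean_notebook` and `run_filter` are hypothetical and not part of any library.

```python
import json


def clean_notebook(nb):
    """Blank outputs and execution counts in place, mirroring the jq filter.

    Assumes a top-level "cells" list; the function name is illustrative.
    """
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None  # serialised as JSON null
    return nb


def run_filter(infile, outfile):
    """Read notebook JSON from infile and write the cleaned JSON to outfile."""
    notebook = json.load(infile)
    json.dump(clean_notebook(notebook), outfile, indent=1, ensure_ascii=False)
    outfile.write("\n")
```

A git clean filter receives the file on standard input and writes the result to standard output, so a script built on this sketch would end with a call such as `run_filter(sys.stdin, sys.stdout)` after importing `sys`.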

@edgimar

edgimar commented Sep 21, 2015

In case anyone cares, newer versions of ipython have a "nbconvert" function built into them, so you can do something like ipython nbconvert myfile.ipynb --to markdown --stdout and get a similar effect to this script. Otherwise you will need to mess around with the nbflatten script in order to get it to work with recent versions of ipython.
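
As a rough illustration of what adapting the script might involve, here is a minimal sketch of the same flattening idea using only the standard library. It assumes the nbformat 4 JSON layout (cells in a top-level `cells` list, `source` stored either as a string or as a list of lines, outputs tagged with `output_type`, no separate heading cells). The names `flatten` and `_join`, the fallback banner, and the regex-based `strip_ansi` are hypothetical stand-ins, not the original script or the IPython helper.

```python
import json
import re
import sys

# Assumption: a simple pattern for ANSI CSI escape sequences (colours etc.),
# standing in for IPython.utils.text.strip_ansi.
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

BANNERS = {
    "markdown": "Markdown cell ---------------",
    "code": "Code cell -------------------",
    "raw": "Raw cell --------------------",
    "output": "Output ----------------------",
}


def _join(value):
    """Multi-line text may be stored as one string or as a list of lines."""
    return value if isinstance(value, str) else "".join(value)


def strip_ansi(text):
    """Remove ANSI escape sequences matched by ANSI_RE."""
    return ANSI_RE.sub("", text)


def flatten(nb):
    """Return the flattened text for a notebook dict in nbformat 4 layout."""
    lines = []
    for cell in nb.get("cells", []):
        cell_type = cell.get("cell_type")
        lines.append(BANNERS.get(cell_type, "Unknown cell ----------------"))
        source = _join(cell.get("source", ""))
        # Make sure the source is always followed by a blank line.
        lines.append(source if source.endswith("\n") else source + "\n")
        if cell_type != "code":
            continue
        outputs = cell.get("outputs", [])
        if outputs:
            lines.append(BANNERS["output"])
        for output in outputs:
            kind = output.get("output_type")
            if kind == "stream":
                lines.append(strip_ansi(_join(output.get("text", ""))))
            elif kind == "error":
                lines.append(strip_ansi("\n".join(output.get("traceback", []))))
            elif "text/plain" in output.get("data", {}):
                lines.append(strip_ansi(_join(output["data"]["text/plain"])))
            else:
                lines.append("(Non-plaintext output)")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # git passes the file to convert as the first argument to a textconv program.
    with open(sys.argv[1], encoding="utf-8") as f:
        print(flatten(json.load(f)))
```

Unlike the original script, this sketch prints the plain-text form of rich results when one is present and falls back to the "(Non-plaintext output)" marker only when there is none.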

@jankatins

Has anyone here used jq as an nbflatten replacement (with [diff "ipynb"], not [filter ...]!) on Windows? I tried, and jq crashes even on jq "." whatever.ipynb

@nicowilliams

Re: jq, @JanSchulz filed jqlang/jq#1072, and it's a fun one.

@jankatins

JFYI: with a recent build of jq, the jq version of nbflatten and the filter now work on Windows.

@vmuriart

vmuriart commented Apr 6, 2016

For anyone looking to download the version @JanSchulz was referring to, it's on AppVeyor

Thanks @JanSchulz for the heads up
