@takluyver
Created September 6, 2014 21:44
Flatten notebooks for git diff

Copy nbflatten.py to somewhere on $PATH and make it executable (chmod +x nbflatten.py). Then, in the root of a git repository, run these commands:

echo "*.ipynb diff=ipynb" >> .gitattributes 
git config diff.ipynb.textconv nbflatten.py

When you change a notebook and run git diff, you'll see the diff of flattened, simplified notebooks, rather than the full JSON. This does lose some information (metadata, non-text output), but it makes it easier to see simple changes in the notebook.
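When a path's diff driver has textconv set, git runs that command on each version of the file (passing the name of a file holding that version as the only argument) and diffs the printed text instead of the raw JSON. The sketch below imitates that step in pure Python with difflib. `flatten_sources` and `textconv_diff` are hypothetical helpers, and the flattening is deliberately cruder than nbflatten.py: it handles sources only and assumes nbformat v4-style dicts.

```python
import difflib


def flatten_sources(nb):
    """Hypothetical, simplified flattener: a banner line plus the source of each cell.

    Assumes an nbformat v4-style dict whose cells have 'cell_type' and 'source'
    (either a string or a list of strings).
    """
    lines = []
    for cell in nb["cells"]:
        source = cell["source"]
        if isinstance(source, list):
            source = "".join(source)
        lines.append(f"{cell['cell_type'].capitalize()} cell ----")
        lines.extend(source.splitlines())
        lines.append("")
    return lines


def textconv_diff(old_nb, new_nb):
    """Diff the flattened text of two notebooks, as git does after textconv."""
    return list(difflib.unified_diff(
        flatten_sources(old_nb), flatten_sources(new_nb),
        fromfile="a/notebook.ipynb", tofile="b/notebook.ipynb", lineterm=""))


old = {"cells": [{"cell_type": "code", "source": ["x = 1\n", "print(x)"]}]}
new = {"cells": [{"cell_type": "code", "source": ["x = 2\n", "print(x)"]}]}
for line in textconv_diff(old, new):
    print(line)
```

Run as-is, this prints a single hunk whose only changed lines are `-x = 1` and `+x = 2`.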

This doesn't help with merging conflicting changes in notebooks. For that, see nbdiff.org.

#!/usr/bin/python3
# Flatten an IPython notebook (v3 format, IPython 2.x API) to plain text for
# use as a git textconv filter: git passes the notebook's filename as argv[1].
import sys
from IPython.nbformat.current import read
from IPython.utils.text import strip_ansi

fname = sys.argv[1]
with open(fname, encoding='utf-8') as f:
    nb = read(f, 'ipynb')

# Separator lines printed before each cell and before its output.
banners = {
    'heading': 'Heading %d ------------------',
    'markdown': 'Markdown cell ---------------',
    'code': 'Code cell -------------------',
    'raw': 'Raw cell --------------------',
    'output': 'Output ----------------------',
}

for cell in nb.worksheets[0].cells:
    if cell.cell_type == 'heading':
        print(banners['heading'] % cell.level)
    else:
        print(banners[cell.cell_type])

    # In the v3 format, code cells keep their source in .input.
    if cell.cell_type == 'code':
        source = cell.input
    else:
        source = cell.source
    print(source)
    if not source.endswith('\n'):
        print()

    if cell.cell_type == 'code':
        if cell.outputs:
            print(banners['output'])
            for output in cell.outputs:
                # Plain-text outputs and tracebacks are shown with ANSI colour
                # codes stripped; anything else gets a placeholder line.
                if 'text' in output:
                    print(strip_ansi(output.text))
                elif 'traceback' in output:
                    print(strip_ansi('\n'.join(output.traceback)))
                else:
                    print("(Non-plaintext output)")
    print()

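The script above relies on the IPython 2.x `IPython.nbformat.current` API and the v3 notebook layout (`worksheets`, `heading` cells, `cell.input`), which later releases deprecated. A minimal sketch of the same flattening for nbformat v4 files might look like the following. It assumes it is fine to read the JSON directly with the standard library rather than through IPython/nbformat, and it is not the author's code. The ANSI regex is a simplified stand-in for `strip_ansi`, and because v4 has no heading cells, headings simply arrive as markdown cells.

```python
#!/usr/bin/env python3
"""Sketch: flatten an nbformat v4 notebook to plain text for git diff."""
import json
import re
import sys

# Matches ANSI escape sequences such as the colour codes in tracebacks;
# a simplified stand-in for IPython.utils.text.strip_ansi.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(text):
    return ANSI_RE.sub("", text)


def as_text(value):
    # nbformat v4 stores multi-line strings either as a string or a list of lines.
    return "".join(value) if isinstance(value, list) else value


def banner(title):
    # "Code cell " padded with dashes to 29 characters, like the original banners.
    return f"{title} ".ljust(29, "-")


def flatten(nb):
    out = []
    for cell in nb["cells"]:
        out.append(banner(f"{cell['cell_type'].capitalize()} cell"))
        source = as_text(cell["source"])
        out.append(source if source.endswith("\n") else source + "\n")
        if cell["cell_type"] == "code" and cell.get("outputs"):
            out.append(banner("Output"))
            for output in cell["outputs"]:
                kind = output.get("output_type")
                if kind == "stream":
                    out.append(strip_ansi(as_text(output["text"])))
                elif kind == "error":
                    out.append(strip_ansi("\n".join(output["traceback"])))
                elif "text/plain" in output.get("data", {}):
                    out.append(as_text(output["data"]["text/plain"]))
                else:
                    out.append("(Non-plaintext output)")
        out.append("")
    return "\n".join(out)


# git invokes a textconv command with one argument: the file to convert.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        print(flatten(json.load(f)))
```

Reading the JSON directly keeps the filter free of IPython imports, which also makes it start faster. The trade-off is that it does no format validation or upgrading of older notebooks.
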
@ethanwhite

Oh, and now that this works, it is awesome!

@takluyver (author)

Oh, I wasn't getting pinged by comments on here for some reason. I'm glad it's helping people - if anyone is still having trouble with it, let me know.

@jfeist

jfeist commented Mar 19, 2015

Just in case it might be useful to someone:

I have been a big fan of nbflatten.py since I discovered it, and have been using it extensively as a diff filter for git. However, I find it to be a bit slow, especially for repositories with many (large) notebooks. So I spent a bit of time writing a filter for jq which does the same thing, but is orders of magnitude faster.

The relevant section of my .gitconfig now looks like this:

[diff "ipynb"]
    textconv = "jq -r 'def banner: \"\\(.) \"+(28-(.|length))*\"-\"; (\"Non-cell info\"|banner),del(.cells),\"\", (.cells[] | (\"\\(.cell_type) cell\"|banner), \"\\(.source|add)\\n\")'"

I am typically not interested in the outputs when diffing notebooks, so this textconv filter does not show them. However, I find it convenient to also see the notebook's metadata: everything not in "cells" is printed first, under the header "Non-cell info". To drop that part, remove (\"Non-cell info\"|banner),del(.cells),\"\", from the filter.

I have also written a more "complete" script which shows the outputs in pretty much the same way as nbflatten.py. That can be found at https://gist.github.com/jfeist/cd00aa3b681092e1d5dc. If you download it and put it somewhere in your path, you can use textconv = nbflatten.jq instead.
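For readers less familiar with jq, here is a rough Python equivalent of the textconv filter above. It is a sketch, not @jfeist's code: `banner` pads "<title> " with 28 minus len(title) dashes, as the jq `def banner` does; everything except `cells` is printed as indented JSON (the `del(.cells)` part); and each cell's `source` list is concatenated (the `.source|add` part). Unlike the jq version, it also accepts a `source` stored as a single string.

```python
import json
import sys


def banner(title):
    # jq: "\(.) " + (28 - (.|length)) * "-"
    return f"{title} " + "-" * (28 - len(title))


def flatten_like_jq(nb):
    lines = [banner("Non-cell info")]
    # jq: del(.cells) -- everything except the cells, pretty-printed as JSON
    rest = {k: v for k, v in nb.items() if k != "cells"}
    lines.append(json.dumps(rest, indent=2, ensure_ascii=False))
    lines.append("")
    for cell in nb.get("cells", []):
        lines.append(banner(f"{cell['cell_type']} cell"))
        source = cell["source"]
        # jq: "\(.source|add)\n" -- join the source lines and add a newline
        text = "".join(source) if isinstance(source, list) else source
        lines.append(text + "\n")
    return "\n".join(lines)


# As a textconv command, the script would receive the file name as its argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        print(flatten_like_jq(json.load(f)))
```

This is mainly useful for reading the jq one-liner. As jfeist notes, the point of the jq version is speed, and a Python script pays the interpreter start-up cost on every file.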

@jfeist

jfeist commented Mar 19, 2015

PS: jq is also very useful (and fast!) for making a filter to remove the output of notebooks when adding them to git. The relevant part of .gitconfig is

[filter "clean_nb"]
    clean = "jq '(.cells[] | select(.outputs) ) |= [] | (.cells[] | select(.execution_count)) |= null'"

And in .gitattributes, you then need *.ipynb filter=clean_nb diff=ipynb.
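A git clean filter reads the file's content on stdin and writes the cleaned version to stdout. A minimal Python sketch of the same idea, assuming nbformat v4 JSON, empties every `outputs` list and resets every `execution_count` to null. Only cells that actually have those keys are touched, so markdown and raw cells are left as they are. The function name `strip_outputs` is hypothetical; a script like this would be referenced as the `clean =` command under `[filter "clean_nb"]`.

```python
import json
import sys


def strip_outputs(nb):
    """Empty outputs and clear execution counts (modifies nb in place, returns it)."""
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
    return nb


def main():
    data = sys.stdin.read()
    if not data.strip():
        # Nothing to clean (e.g. an empty file): pass the input through unchanged.
        sys.stdout.write(data)
        return
    # indent=1 keeps one JSON value per line, so the stored file diffs line by line.
    json.dump(strip_outputs(json.loads(data)), sys.stdout, indent=1, ensure_ascii=False)
    sys.stdout.write("\n")


# git pipes the file in; skip when the script is started from an interactive terminal.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```
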

@jakirkham

@jfeist, using jq is fantastic. This really saved me a lot of time!

@jakirkham

Here is a modification of @jfeist's .gitconfig snippet; more details in jqlang/jq#921. Also, in .gitconfig, double quotes must be escaped even inside single quotes (http://stackoverflow.com/a/25535431).

[filter "clean_nb"]
        clean = jq '(.cells[] | select(has(\"outputs\")) | .outputs) = [] | (.cells[] | select(has(\"execution_count\")) | .execution_count) = null'

@edgimar

edgimar commented Sep 21, 2015

In case anyone cares, newer versions of IPython have an nbconvert command built in, so you can run something like ipython nbconvert myfile.ipynb --to markdown --stdout and get a similar effect to this script. Otherwise, you will need to adapt the nbflatten script to get it working with recent versions of IPython.

@jankatins

Has anyone here used jq as an nbflatten replacement (with [diff "ipynb"], not [filter ...]!) on Windows? I tried, and jq crashes even on jq "." whatever.ipynb

@nicowilliams

Re: jq, @JanSchulz filed jqlang/jq#1072, and it's a fun one.

@jankatins

JFYI: with a recent build of jq, both the jq version of nbflatten and the filter now work on Windows.

@vmuriart

vmuriart commented Apr 6, 2016

For anyone looking to download the version @JanSchulz was referring to: it's on AppVeyor.

Thanks @JanSchulz for the heads up
