Flatten notebooks for git diff

Copy nbflatten.py to somewhere on $PATH and make sure it is executable. Then, in the root of a git repository, run these commands:

echo "*.ipynb diff=ipynb" >> .gitattributes 
git config diff.ipynb.textconv nbflatten.py

When you change a notebook and run git diff, you'll see the diff of flattened, simplified notebooks, rather than the full JSON. This does lose some information (metadata, non-text output), but it makes it easier to see simple changes in the notebook.

This doesn't help with merging conflicting changes in notebooks. For that, see nbdiff.org.

#!/usr/bin/python3
import sys
from IPython.nbformat.current import read
from IPython.utils.text import strip_ansi

fname = sys.argv[1]
with open(fname, encoding='utf-8') as f:
    nb = read(f, 'ipynb')

banners = {
    'heading': 'Heading %d ------------------',
    'markdown': 'Markdown cell ---------------',
    'code': 'Code cell -------------------',
    'raw': 'Raw cell --------------------',
    'output': 'Output ----------------------',
}

for cell in nb.worksheets[0].cells:
    if cell.cell_type == 'heading':
        print(banners['heading'] % cell.level)
    else:
        print(banners[cell.cell_type])

    if cell.cell_type == 'code':
        source = cell.input
    else:
        source = cell.source

    print(source)
    if not source.endswith('\n'):
        print()

    if cell.cell_type == 'code':
        if cell.outputs:
            print(banners['output'])
            for output in cell.outputs:
                if 'text' in output:
                    print(strip_ansi(output.text))
                elif 'traceback' in output:
                    print(strip_ansi('\n'.join(output.traceback)))
                else:
                    print("(Non-plaintext output)")
        print()

michaelaye Sep 8, 2014

I get an error.
Here's the end of git config -l to show I have the right line in there:

branch.new_offsets.merge=refs/heads/new_offsets
branch.feature/output_formatter.remote=origin
branch.feature/output_formatter.merge=refs/heads/feature/output_formatter
diff.ipynb.textconv=nbflatten.py

Here's the content of .gitattributes:

maye@lunatic|~/Dropbox/src/diviner on develop!
± cat .gitattributes
*.ipynb diff=ipynb
maye@lunatic|~/Dropbox/src/diviner on develop!

Here's my try to use it:

± git diff notebooks/analyses/Physics.ipynb
error: cannot run nbflatten.py: No such file or directory
fatal: unable to read files to diff
-> [128]
maye@lunatic|~/Dropbox/src/diviner on develop!

Here's the content of my $HOME/bin which is on the PATH:

± ll ~/bin
total 40
-rwxr--r--  1 maye  staff   1.2K Sep  8 15:24 nbflatten.py
-rwx------  1 maye  staff   116B Oct  5  2012 printpath
-rwxr-xr-x  1 maye  staff   1.2K Sep 28  2012 ssh-copy-id
lrwxr-xr-x  1 maye  staff    62B Apr 21 18:19 subl -> /Applications/Sublime Text.app/Contents/SharedSupport/bin/subl
lrwxr-xr-x  1 maye  staff    37B Nov 20  2012 vcprompt -> /Users/maye/src/vcprompt/bin/vcprompt

michaelaye Sep 8, 2014

Very mysterious: Adding the full path results in git lying to me:

diff.ipynb.textconv=/Users/maye/bin/nbflatten.py
maye@lunatic|~/Dropbox/src/diviner on develop!
± git diff notebooks/analyses/Physics.ipynb
error: cannot run /Users/maye/bin/nbflatten.py: No such file or directory
fatal: unable to read files to diff
-> [128]
maye@lunatic|~/Dropbox/src/diviner on develop!
± ll /Users/maye/bin/nbflatten.py
-rwxr--r--  1 maye  staff   1.2K Sep  8 15:24 /Users/maye/bin/nbflatten.py
maye@lunatic|~/Dropbox/src/diviner on develop!

holdenweb Sep 9, 2014

Make it executable? Oh, sorry, it is. Do you have a /usr/bin/python3? I believe you'll see "No such file or directory" if the kernel can't find the executable named in the shebang line. Sometimes this happens with DOS-style files, when the carriage return is taken as part of the filename.
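
The two causes suggested here (a missing interpreter, or a carriage return at the end of the shebang line) can be checked mechanically. The following is only a sketch: check_shebang is a hypothetical helper, not part of nbflatten.py or git, and it assumes the interpreter is given as an absolute path, as in #!/usr/bin/python3.

```python
import os
import sys


def check_shebang(script_path):
    """Return a list of likely problems with the script's shebang line.

    Hypothetical diagnostic helper; it only inspects the first line.
    """
    with open(script_path, "rb") as f:
        first_line = f.readline()

    problems = []
    if not first_line.startswith(b"#!"):
        problems.append("no shebang line")
        return problems

    # A DOS line ending leaves "\r" attached to the interpreter name.
    if first_line.endswith(b"\r\n"):
        problems.append("DOS line ending on the shebang line")

    # The first word after "#!" is the interpreter the kernel will execute.
    parts = first_line[2:].strip().split()
    interpreter = parts[0].decode() if parts else ""
    if not interpreter or not os.path.exists(interpreter):
        problems.append("interpreter not found: %r" % interpreter)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for problem in check_shebang(sys.argv[1]):
        print(problem)
```

Run against the script's full path, an empty result would suggest the shebang line is not the culprit.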

gforsyth Sep 11, 2014

@takluyver

This is awesome. I backported it for Python 2.7 (that sounds much grander than changing 2 lines) and it's already saving me a number of headaches.

ethanwhite Sep 14, 2014

Getting the following error using IPython 2.2.0 on Ubuntu 14.04:

ethan@oryx:~/ProgBio/repo (gh-pages *)$ git diff ipynbs/functions-writing.ipynb
Traceback (most recent call last):
  File "/usr/local/bin/nbflatten.py", line 4, in <module>
    from IPython.utils.text import strip_ansi
ImportError: cannot import name 'strip_ansi'
fatal: unable to read files to diff

ethanwhite Sep 17, 2014

Looks like in the current release (2.2.0), line 4 should still be: from IPython.nbconvert.filters.ansi import strip_ansi. The change is in ipython/ipython@d2acc30.
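
One way to make the script tolerate both locations is to try each import in turn. This is a sketch rather than the gist author's code: the two import paths are the ones named in this thread, while the final regex fallback is an assumption added so the snippet still works when neither import is available.

```python
import re

try:
    # Location the script imports from.
    from IPython.utils.text import strip_ansi
except ImportError:
    try:
        # Location reported in this thread to work on IPython 2.2.0.
        from IPython.nbconvert.filters.ansi import strip_ansi
    except ImportError:
        # Assumed fallback, not part of IPython: remove ANSI CSI escape
        # sequences such as the colour codes found in tracebacks.
        _ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

        def strip_ansi(text):
            return _ANSI_CSI.sub("", text)
```

Replacing line 4 of nbflatten.py with this block should leave the rest of the script unchanged, since it only uses the name strip_ansi.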

ethanwhite Sep 17, 2014

Oh, and now that this works, it is awesome!

takluyver (Owner) Oct 30, 2014

Oh, I wasn't getting pinged by comments on here for some reason. I'm glad it's helping people - if anyone is still having trouble with it, let me know.

jfeist Mar 19, 2015

Just in case it might be useful to someone:

I have been a big fan of nbflatten.py since I discovered it, and have been using it extensively as a diff filter for git. However, I find it to be a bit slow, especially for repositories with many (large) notebooks. So I spent a bit of time writing a filter for jq which does the same thing, but is orders of magnitude faster.

The relevant section of my .gitconfig now looks like this:

[diff "ipynb"]
    textconv = "jq -r 'def banner: \"\\(.) \"+(28-(.|length))*\"-\"; (\"Non-cell info\"|banner),del(.cells),\"\", (.cells[] | (\"\\(.cell_type) cell\"|banner), \"\\(.source|add)\\n\")'"

I am typically not interested in the outputs for diffing notebooks, so the textconv filter here does not show them. However, I did find it to be more convenient for me to show the metadata of the notebook as well in the output, and everything not in "cells" is shown first under the header "Non-cell info". This disappears by removing the part (\"Non-cell info\"|banner),del(.cells),\"\",.

I have also written a more "complete" script which shows the outputs in pretty much the same way as nbflatten.py. That can be found at https://gist.github.com/jfeist/cd00aa3b681092e1d5dc. If you download it and put it somewhere in your path, you can use textconv = nbflatten.jq instead.
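
For readers who don't know jq, the cell part of the filter above can be approximated in Python with only the standard library. This is a sketch under assumptions: banner and flatten_cells are hypothetical names, the "Non-cell info" section is left out, and the notebook is assumed to have a top-level "cells" list whose "source" is a list of strings (or a single string), which is the layout the jq filter relies on.

```python
import json
import sys


def banner(title, width=28):
    # Mirrors the jq "banner" definition: the title, a space, then dashes.
    return title + " " + "-" * (width - len(title))


def flatten_cells(nb):
    """Return the cell sources of a notebook dict as banner-separated text."""
    lines = []
    for cell in nb.get("cells", []):
        lines.append(banner(cell["cell_type"] + " cell"))
        source = cell.get("source", "")
        if isinstance(source, list):
            # The notebook stores source as a list of lines; join them.
            source = "".join(source)
        lines.append(source)
        lines.append("")  # blank line between cells
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        print(flatten_cells(json.load(f)))
```

It will not match the speed of jq, but it shows what the filter extracts: the cell type and the source, with outputs ignored.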

jfeist Mar 19, 2015

PS: jq is also very useful (and fast!) for making a filter to remove the output of notebooks when adding them to git. The relevant part of .gitconfig is

[filter "clean_nb"]
    clean = "jq '(.cells[] | select(.outputs) ) |= [] | (.cells[] | select(.execution_count)) |= null'"

And in .gitattributes, you then need *.ipynb filter=clean_nb diff=ipynb.

jakirkham Jun 12, 2015

@jfeist, using jq is fantastic. This really saved me a lot of time!

jakirkham Aug 21, 2015

Modification to @jfeist's snippet for .gitconfig. More details here (stedolan/jq#921). Also, double quotes must be escaped inside single quotes in .gitconfig (http://stackoverflow.com/a/25535431).

[filter "clean_nb"]
        clean = jq '(.cells[] | select(has(\"outputs\")) | .outputs) = [] | (.cells[] | select(has(\"execution_count\")) | .execution_count) = null'
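
The effect of this filter can also be written out in Python, which may help when checking what it does to a notebook. The sketch below is an illustration, not a drop-in replacement for the jq command: strip_outputs and main are hypothetical names, and the notebook is assumed to have a top-level "cells" list.

```python
import json
import sys


def strip_outputs(nb):
    """Empty "outputs" and reset "execution_count" on every cell that has them."""
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
    return nb


def main():
    # A git clean filter reads the file on stdin and writes the result to stdout.
    json.dump(strip_outputs(json.load(sys.stdin)), sys.stdout, indent=1)
    sys.stdout.write("\n")
```

Only the named keys are touched, so cells without outputs (markdown, raw) pass through unchanged. A script that calls main() could be named in clean = instead of jq, though the result will not be byte-identical to jq's output.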

edgimar Sep 21, 2015

In case anyone cares, newer versions of IPython have an nbconvert subcommand built in, so you can run something like ipython nbconvert myfile.ipynb --to markdown --stdout and get a similar effect to this script. Otherwise, you will need to modify the nbflatten script to get it to work with recent versions of IPython.

jankatins Jan 14, 2016

Has anyone here used jq as an nbflatten replacement (with [diff "ipynb"], not [filter ...]!) on Windows? I tried, and jq crashes even on jq "." whatever.ipynb

nicowilliams Jan 14, 2016

Re: jq, @janschulz filed stedolan/jq#1072, and it's a fun one.

jankatins Jan 26, 2016

JFYI: with a recent build of jq, the jq version of nbflatten and the filter now work on Windows.

vmuriart Apr 6, 2016

For anyone looking to download the version @janschulz was referring to, it's on AppVeyor.

Thanks @janschulz for the heads up
