How to commit Jupyter notebooks without output to git while keeping the notebook outputs intact locally
  1. Add a filter to git config by running the following command in bash inside the repo:
git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'  
  2. Create a .gitattributes file inside the directory with the notebooks

  3. Add the following to that file:

*.ipynb filter=strip-notebook-output  

After that, commit to git as usual. The notebook output will be stripped out in git commits, but it will remain unchanged locally.
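
One way to check that the filter is registered and attached to notebooks is sketched below. It repeats the three steps in a throwaway repo created with mktemp and then asks git which filter applies to a notebook path. The file name example.ipynb is a placeholder of mine, not part of the gist, and the file does not need to exist for the check to answer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo so the check does not touch a real project.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Steps 1-3 from the gist.
git config filter.strip-notebook-output.clean \
  'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
echo '*.ipynb filter=strip-notebook-output' > .gitattributes

# Ask git which filter applies to a (hypothetical) notebook path.
attr="$(git check-attr filter -- example.ipynb)"
echo "$attr"
```

If the attribute line is picked up, the last command should print `example.ipynb: filter: strip-notebook-output`. In your own repo, only the final `git check-attr` line is needed.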

This gist is based on @dirkjot's answer to this StackOverflow question.

@spinicist

Hello - thanks for this, I really needed it.

On my system (Mac, Python 3.9.9 installed via brew) I needed jupyter-nbconvert instead of jupyter nbconvert.

@33eyes
Author

33eyes commented Jan 24, 2022

👍 Interesting, must be a brew thing. My python install was by conda, and everything else's the same.

@derailed-dash

derailed-dash commented Mar 4, 2022

Wow, this worked a treat!

I tend to run Jupyter in a Docker container. So, if you want to run the Notebook output cleaning preprocessor using a Jupyter container instead, all you need to do is tweak the git config like this:

git config --global filter.strip-notebook-output.clean 'docker run --rm -i -v \"${pwd}:/home/jovyan\" jupyter/datascience-notebook jupyter nbconvert --clear-output --to=notebook --stdin --stdout'

I've set it as a global config parameter, so that it applies to all repos.

@Talismanic

Mine was also a conda install, with Python 3.6. It worked like a charm. Thanks a lot.

@jazzlw

jazzlw commented Feb 16, 2023

wow, thanks! this is the best!

@konradmb

Based on this StackOverflow answer, I've added the required field (git will fail if the clean filter errors out) and moved the attributes from .gitattributes to the repo's internal config.

  1. Add to .git/config:
    [filter "strip-notebook-output"]
    clean = "jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR"
    smudge = "cat"
    required
  2. Create .git/info/attributes and add:
    *.ipynb filter=strip-notebook-output
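
The same settings can also be written with git config commands rather than by editing .git/config by hand. The sketch below does that in a throwaway repo and reads the values back. It assumes that a bare `required` key in the config file is equivalent to setting it to true, and example.ipynb is a placeholder file name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repo; in practice run the git config lines inside your own repo.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Same settings as the [filter "strip-notebook-output"] section above.
git config filter.strip-notebook-output.clean \
  'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
git config filter.strip-notebook-output.smudge cat
git config filter.strip-notebook-output.required true

# Repo-local attributes file (never committed, unlike .gitattributes).
mkdir -p .git/info
echo '*.ipynb filter=strip-notebook-output' >> .git/info/attributes

# Read the settings back to confirm they were stored.
required="$(git config --bool --get filter.strip-notebook-output.required)"
smudge="$(git config --get filter.strip-notebook-output.smudge)"
attr="$(git check-attr filter -- example.ipynb)"
echo "required=$required smudge=$smudge"
echo "$attr"
```

Because both files live under .git, nothing here is shared through commits, so the commands have to be repeated on every clone.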

@air-kyi

air-kyi commented Aug 15, 2023

I am having trouble getting the filter to actually activate. Does anyone know what to do in this case? I can only get it working on one device, but not the same repository on a different device, for two different repositories. I am adding the .gitattributes and .git/config on both different devices.

Working on: Python 3.11.4, conda 23.5.2, git 23.9.1.windows.1
Not working on: Python 3.10.11, conda 23.3.1, git 2.41.0.windows.3

Troubleshooting
It’s possible that despite the above changes the “old” data still shows and the filter is not applied. Here are a couple of things I’ve tried:

First, double-check that all steps, names, and paths are correct.

Second, try deleting and restoring the file(s) affected by the filter. I would delete the file directly (e.g. go into the Finder and Trash the file), then use git (e.g. git reset) to restore the file via a git mechanism. This should trigger git’s hook to apply filters.

If there are still problems, or you want to learn more nitty-gritty about git attribute keyword expansion support (what “git smudge and clean” is all about), you can check the official documentation: “Customizing Git Attributes: Keyword Expansion”.

Another source says I have to do git add --renormalize:

Whenever you change the clean filter, you have to renormalise your repository:

$ git add --renormalize .
After these changes, git will run automatically the command to clear output cells on every Jupyter notebook file added to the staging area.

But this didn't work for me: git bash didn't recognize the jupyter command ('unrecognized command'), since git and jupyter are not in the same environment.

@konradmb

@air-kyi Check whether you can run the jupyter nbconvert command manually. Also remember that filters are not part of the repository. You need to add them manually on each device.

@air-kyi

air-kyi commented Aug 15, 2023

I did add them manually on each device. I tried running jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR manually in conda, and it either did nothing or hung forever without an error message. The same command in git bash gave 'unrecognized command' (I wouldn't really expect it to run in git, since jupyter is not associated with git?).

@konradmb

It won't do anything because it waits for stdin input.
I don't have experience with conda, but if it works like venv, then that's where the problem is.

You'd need to install jupyter globally (easier) or modify filter to activate environment before running (harder). Try to run pip install nbconvert or python3 -m pip install nbconvert in git bash window. If it doesn't work: install python from python.org and remember to check "add python to PATH" or something similar. Maybe you can do it with conda, but I have zero experience with that.

@33eyes
Author

33eyes commented Aug 16, 2023

Hey @air-kyi , sorry you are having trouble with this. Conda includes jupyter, so you should be all set with regard to jupyter installs. You don't need git bash for running non-git related commands (or even for git, for that matter), it just makes using git a bit easier, especially on Windows. For testing out nbconvert from command line, the command you ran will not work because you have a --stdin flag in it which tells it to wait for a notebook file from stdin input, as @konradmb pointed out. You can test running nbconvert from command line with this command instead: jupyter nbconvert --to html notebook.ipynb. It should output an HTML version of your notebook. If it works, then your nbconvert is probably fine.

I wrote the gist above as a little code snippet to use as a starting point that people can modify as needed for their own setup. I wrote it while working on a Mac, and judging from your git version it looks like you are on Windows. If that's the case, then you may need additional steps or some other configuration that is specific to Windows. You are welcome to fork this gist and create your own version for Windows. I would highly recommend posting a question about the errors you are getting on stackoverflow.com, and you'll get a bigger audience of people trying to help you troubleshoot your issues there. Git Gists are primarily for sharing code snippets and not so much for troubleshooting them in-depth. I think someone on StackOverflow will be able to point you in the right direction. Good luck!

@konradmb

konradmb commented Aug 16, 2023

the command you ran will not work

@33eyes It will work, but it won't output anything; it will just freeze. That means the jupyter command is available in PATH there. But if running it from git bash gives 'unrecognized command', it's not in PATH in that shell, so I suspect it's missing from the global PATH env var. That global PATH is what git sees when it runs the filter, whether git is started from git bash, cmd.exe, Windows Terminal, VS Code, etc.
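
A quick way to separate the two failure modes discussed here (jupyter missing from PATH versus nbconvert waiting on stdin) is sketched below. Run it in the same shell git uses, for example Git Bash on Windows. The tiny notebook is a hypothetical stand-in written only for this check, and the `status` variable is just a label for the result.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: is jupyter visible to this shell, and does the
# clean command work when it is actually given a notebook on stdin?

if command -v jupyter >/dev/null 2>&1; then
  status="found"
  echo "jupyter found at: $(command -v jupyter)"

  # Minimal hypothetical notebook with one output cell.
  nb="$(mktemp)"
  cat > "$nb" <<'EOF'
{"cells":[{"cell_type":"code","execution_count":1,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["hi\n"]}],"source":["print('hi')"]}],"metadata":{},"nbformat":4,"nbformat_minor":4}
EOF

  # Feed the notebook on stdin, as git does; with no input the command just waits.
  jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook \
    --stdin --stdout --log-level=ERROR < "$nb" \
    || echo "nbconvert failed; check that nbconvert is installed in this environment"
  rm -f "$nb"
else
  status="missing"
  echo "jupyter is not on PATH in this shell, so the git filter cannot run it"
fi
```

If jupyter is found, the printed notebook JSON should no longer contain the `hi` output. If it is reported missing, the fix is on the PATH side rather than in the filter definition.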

@air-kyi

air-kyi commented Aug 30, 2023

Hi everyone, I solved the issue. It was because I was in a virtual environment and needed to add that specific installation of Python to PATH.

  1. In Anaconda Prompt, type where jupyter, or otherwise find where jupyter.exe is stored in your venv:
(orf) C:\Users\kyi>where jupyter
C:\Users\kyi\AppData\Local\anaconda3\envs\orf\Scripts\jupyter.exe
  2. Add this path (without the jupyter.exe part) as a new entry in PATH.

@NickCrews

NickCrews commented Sep 26, 2023

@33eyes can you add the 4th step of

  1. Run git add --renormalize . to go through all of your existing notebook files and scrub the outputs. Otherwise, you could get heinous merge conflicts later.
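
To see what renormalizing does without needing jupyter installed, here is a small sketch in a throwaway repo. It uses a stand-in filter of my own (drop-output, which deletes lines containing OUTPUT from .txt files via sed) in place of the nbconvert filter. The file names and the filter name are hypothetical; only `git add --renormalize .` is the command from the suggestion above.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name "Example"
git config user.email "example@example.com"

# Commit a file BEFORE any filter exists, so its "output" is in history.
printf 'keep\nOUTPUT drop me\n' > a.txt
git add a.txt
git commit --quiet -m "add file with output"

# Stand-in filter (hypothetical): delete lines containing OUTPUT.
git config filter.drop-output.clean 'sed /OUTPUT/d'
echo '*.txt filter=drop-output' > .gitattributes
git add .gitattributes

# Re-run the clean filter over tracked files and restage them.
git add --renormalize .

staged="$(git show :a.txt)"      # what the next commit will contain
echo "staged:  $staged"
echo "on disk: $(tr '\n' ' ' < a.txt)"
```

The staged copy should contain only `keep`, while the file on disk still has both lines. With the real notebook filter, the same step scrubs outputs from notebooks that were committed before the filter was set up.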

PS to people having PATH issues: if you are using VS Code, your issue may be caused by VS Code running all of its git commands in an environment that differs from the one you have on the command line.
