How to commit Jupyter notebooks to git without their outputs while keeping the outputs intact locally

Commit Jupyter notebook code to git and keep output locally

  1. Add a filter to git config by running the following command in bash inside the repo:
git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
  2. Create a .gitattributes file inside the directory with the notebooks.

  3. Add the following to that file:

*.ipynb filter=strip-notebook-output

After that, commit to git as usual. The notebook output will be stripped out in git commits, but it will remain unchanged locally.

Source: StackOverflow
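
The three steps above can be sketched as one script. This is a minimal sketch, not part of the original gist: it works in a throwaway repository created with mktemp, and analysis.ipynb is a hypothetical file name used only to ask git which filter it would apply. It checks the configuration without running Jupyter.

```shell
#!/usr/bin/env bash
set -eu

# Scratch repository so the sketch does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Step 1: register the clean filter in this repo's local git config.
git config filter.strip-notebook-output.clean \
  'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'

# Steps 2 and 3: create .gitattributes and route every notebook through the filter.
printf '*.ipynb filter=strip-notebook-output\n' > .gitattributes

# Ask git what it will do. analysis.ipynb is a hypothetical name; the file need not exist.
configured="$(git config --get filter.strip-notebook-output.clean)"
attr="$(git check-attr filter -- analysis.ipynb)"
echo "clean command: $configured"
echo "$attr"
```

If the last line does not mention strip-notebook-output, the .gitattributes file is probably in a directory that does not contain the notebook.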

How to override the above for a specific notebook

This is useful if you sometimes want to add specific notebooks to git with their cell outputs intact, while keeping the default behavior of clearing cell outputs for everything else.

  1. When adding to git a notebook whose cell outputs you want to keep, instead of the usual git add <path to your notebook> command, use this: git -c filter.strip-notebook-output.clean= add <path to your notebook>

Source: StackOverflow
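
The override can be tried out safely in a scratch repository. The sketch below is an illustration under assumptions: report.ipynb and its contents are hypothetical, and the repository is a temporary one. Because the clean command is set to an empty string for that single invocation, git skips the filter, so the staged copy should be identical to the local file.

```shell
#!/usr/bin/env bash
set -eu

# Scratch repository with the filter from the gist already configured.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config filter.strip-notebook-output.clean \
  'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
printf '*.ipynb filter=strip-notebook-output\n' > .gitattributes

# Hypothetical notebook with one stored cell output.
cat > report.ipynb <<'EOF'
{"cells": [{"cell_type": "code", "execution_count": 1, "metadata": {},
            "outputs": [{"name": "stdout", "output_type": "stream", "text": ["hi\n"]}],
            "source": ["print('hi')"]}],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
EOF

# Empty clean command for this one invocation only: the filter is not run.
git -c filter.strip-notebook-output.clean= add report.ipynb

# Compare the staged blob with the working copy.
if git show :report.ipynb | cmp -s - report.ipynb; then
  staged="outputs kept"
else
  staged="outputs changed"
fi
echo "$staged"
```

The override applies only to that one command. A later plain git add of the same file would run the filter again, so the override likely needs to be repeated each time the notebook is re-added.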

@33eyes
Author

33eyes commented Aug 16, 2023

Hey @air-kyi, sorry you are having trouble with this. Conda includes Jupyter, so you should be all set with regard to Jupyter installs. You don't need Git Bash to run non-git commands (or even git, for that matter); it just makes using git a bit easier, especially on Windows. For testing nbconvert from the command line, the command you ran will not work because it has a --stdin flag, which tells nbconvert to wait for a notebook on stdin, as @konradmb pointed out. You can test nbconvert from the command line with this command instead: jupyter nbconvert --to html notebook.ipynb. It should output an HTML version of your notebook. If it works, then your nbconvert is probably fine.

I wrote the gist above as a little code snippet to use as a starting point that people can modify as needed for their own setup. I wrote it while working on a Mac, and judging from your git version it looks like you are on Windows. If that's the case, then you may need additional steps or some other configuration that is specific to Windows. You are welcome to fork this gist and create your own version for Windows. I would highly recommend posting a question about the errors you are getting on stackoverflow.com, where you'll reach a bigger audience of people who can help you troubleshoot. GitHub Gists are primarily for sharing code snippets and not so much for troubleshooting them in depth. I think someone on StackOverflow will be able to point you in the right direction. Good luck!

@konradmb

konradmb commented Aug 16, 2023

the command you ran will not work

@33eyes It will work, but it won't output anything; it will just freeze. That means the jupyter command is available in PATH. But if running it from Git Bash gives 'unrecognized command', it is not in PATH there, so I suspect it's not in the global PATH environment variable. That is the PATH git uses when it runs the filter, whether git is started from Git Bash, cmd.exe, Windows Terminal, VS Code, etc.
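
The two checks discussed in this exchange (is jupyter on PATH, and does the filter command work when it actually receives a notebook on stdin) can be sketched as below. This is a hedged illustration, not something posted in the thread: sample.ipynb is a hypothetical notebook written to a temporary directory, and the script only reports on the shell it runs in, which may differ from the environment git is launched from.

```shell
#!/usr/bin/env bash

# Scratch directory and a tiny hypothetical notebook with one stored output.
tmpdir="$(mktemp -d)"
cat > "$tmpdir/sample.ipynb" <<'EOF'
{"cells": [{"cell_type": "code", "execution_count": 1, "metadata": {},
            "outputs": [{"name": "stdout", "output_type": "stream", "text": ["hi\n"]}],
            "source": ["print('hi')"]}],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
EOF

# Check 1: is jupyter on the PATH of this shell?
if jupyter_path="$(command -v jupyter)"; then
  # Check 2: feed the notebook on stdin so --stdin has something to read
  # instead of waiting for input.
  jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook \
    --stdin --stdout --log-level=ERROR \
    < "$tmpdir/sample.ipynb" > "$tmpdir/stripped.ipynb"
  filter_status="exit code $?"
else
  jupyter_path="not found"
  filter_status="not run"
fi

echo "jupyter: $jupyter_path"
echo "filter:  $filter_status"
```

If jupyter is reported as not found, the filter will presumably fail the same way when git runs it from that environment.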

@air-kyi

air-kyi commented Aug 30, 2023

Hi everyone, I solved the issue. It happened because I was in a virtual environment and needed to add that specific installation of Python to PATH.

  1. In Anaconda Prompt, type where jupyter, or otherwise find where jupyter.exe is stored in your environment:
(orf) C:\Users\kyi>where jupyter
C:\Users\kyi\AppData\Local\anaconda3\envs\orf\Scripts\jupyter.exe
  2. Add this path (without the jupyter.exe part) as a new entry in the PATH variable.

@NickCrews

NickCrews commented Sep 26, 2023

@33eyes can you add a 4th step:

  4. Run git add --renormalize . to go through all of your existing notebook files and scrub the outputs. Otherwise, you could get heinous merge conflicts later.

PS to people having PATH issues: if you are using VS Code, your issue may be caused by VS Code running git commands in an environment that differs from the one you have on the command line.
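
The effect of the suggested renormalize step can be seen in a scratch repository. In the sketch below, a stand-in sed filter replaces the nbconvert command so that the example runs without Jupyter installed; that substitution, the file name old.ipynb, and its two-line contents are all assumptions made for illustration. In a real repo the clean command would be the jupyter nbconvert one from the gist.

```shell
#!/usr/bin/env bash
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# A file committed before any filter existed, so the stored copy still has "output".
printf 'code\nOUTPUT\n' > old.ipynb
git add old.ipynb
git -c user.name=demo -c user.email=demo@example.com \
  commit --quiet -m 'notebook committed with output'

# Stand-in filter (assumption, NOT the gist's command): drop lines containing OUTPUT.
git config filter.strip-notebook-output.clean 'sed /OUTPUT/d'
printf '*.ipynb filter=strip-notebook-output\n' > .gitattributes
git add .gitattributes

# Re-run the clean filter over every tracked file and stage the result.
git add --renormalize .

staged="$(git show :old.ipynb)"   # what the next commit will contain
local_copy="$(cat old.ipynb)"     # the working copy is left untouched
echo "staged:"
echo "$staged"
echo "local:"
echo "$local_copy"
```

Note that --renormalize only touches files git already tracks, which is why .gitattributes is added separately here.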

@jfoclpf

jfoclpf commented May 14, 2024

It works, you're great!

@pompetardo

It works, but it breaks merging: a merge adds "<<<<<<<", "=======", and ">>>>>>>" conflict markers throughout the conflicting files. If a conflicting file is a *.ipynb, jupyter nbconvert no longer recognizes it as valid JSON, and it fails when executing git mergetool or git diff <*.ipynb file>.
I guess the solution is to avoid running the filter for those commands, but I'm not sure how to do that.

@pompetardo

If I remove the required entry from .git/config, the mergetool loads, but the error still appears, so it's not a clean solution.

@miguel9554

@konradmb your solution is fantastic. Is there a way to push it so that everyone who clones the repo has the config? As I understand it this is a local solution. Thanks!

@konradmb

konradmb commented Jun 13, 2024

@miguel9554

  • I don't think that would be possible, as that feature would be deemed unsafe. For example, an attacker could add a filter that runs rm on all files.
  • Yes, this solution works only locally.
  • There's a workaround at https://stackoverflow.com/a/18330114 but it still requires every user to adjust local settings (they discuss the security concerns too).
  • An idea: maybe you can add a pre-merge filter step to GitHub Actions that would run on every pull request?
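
One common form of the "every user adjusts local settings" workaround is to commit the filter definition in a tracked file and have each collaborator include it once per clone. The sketch below is an assumption about how that could look, not a quote from the linked answer: the file name .gitconfig at the repo root is a convention chosen here, and the include step still has to be run manually by each user for the security reasons mentioned above.

```shell
#!/usr/bin/env bash
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Maintainer, once: store the filter in a tracked file instead of .git/config.
# The name .gitconfig is an assumption; any tracked path would do.
git config --file .gitconfig filter.strip-notebook-output.clean \
  'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
printf '*.ipynb filter=strip-notebook-output\n' > .gitattributes
git add .gitconfig .gitattributes

# Each collaborator, once per clone: opt in to the shared settings.
# The path is resolved relative to .git/config, hence the leading "../".
git config --local include.path ../.gitconfig

# The filter is now visible through the normal config lookup.
effective="$(git config --get filter.strip-notebook-output.clean)"
echo "$effective"
```

Anyone who includes the file trusts whatever commands it contains, so changes to it probably deserve the same review as changes to code.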
