@kendallroth
Last active July 20, 2022 15:56
Reduce Jupyter Notebook Git conflicts

Jupyter Git Conflicts

Jupyter Notebooks tend to cause Git conflicts when several people work on the same notebook. This is primarily due to cell metadata such as execution counts, which change every time a cell is run but carry no meaningful content. This metadata can be cleared with a pre-save hook in the Jupyter config. The hook runs for every notebook that is saved, but it only scrubs notebooks that opt in through a custom metadata key (reduce_git_conflicts). If the key is missing or set to false, the notebook saves as normal.
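To make the problem concrete, here is a minimal sketch using a hypothetical, hand-written code cell in nbformat 4 JSON layout. Two collaborators run the same cell a different number of times, so their saved JSON differs only in execution_count. Clearing the counts, as the hook in the config below does, makes the two saves identical. The clear_counts helper is illustrative only.

```python
import json

# Hypothetical code cell as two collaborators might save it (nbformat 4 layout).
# Source and output are identical; only the execution counts differ.
cell_alice = {
    "cell_type": "code",
    "execution_count": 3,
    "metadata": {},
    "outputs": [
        {"output_type": "execute_result", "execution_count": 3,
         "data": {"text/plain": ["2"]}, "metadata": {}}
    ],
    "source": ["1 + 1"],
}
cell_bob = json.loads(json.dumps(cell_alice))  # deep copy via JSON round-trip
cell_bob["execution_count"] = 7
cell_bob["outputs"][0]["execution_count"] = 7


def clear_counts(cell):
    """Set cell- and output-level execution counts to None, where present."""
    cell["execution_count"] = None
    for output in cell.get("outputs", []):
        if "execution_count" in output:
            output["execution_count"] = None
    return cell


def dump(cell):
    # Stable serialization, so string equality approximates "no Git diff".
    return json.dumps(cell, indent=1, sort_keys=True)


print(dump(cell_alice) == dump(cell_bob))  # False: the counts conflict
clear_counts(cell_alice)
clear_counts(cell_bob)
print(dump(cell_alice) == dump(cell_bob))  # True: nothing left to conflict on
```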

  1. Add a pre-save hook to the Jupyter Notebook config
  2. Enable Git conflict reduction in the notebook metadata

NOTE: This approach modifies the saved notebook file itself, so anything the hook clears is lost from the file. This could cause issues if a field that leads to conflicts must nevertheless remain in the file.

NOTE: This approach was developed with Jupyter Notebook in mind. While JupyterLab is quite similar, there are several differences (for example, generating the config and editing notebook metadata).

Jupyter Config

Either create or update the Jupyter Notebook config file (typically ~/.jupyter/jupyter_notebook_config.py).

Source: jupyter-notebook.readthedocs.io/en/stable/extending/savehooks.html
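Jupyter reads this file from its config directory, which defaults to ~/.jupyter and can be overridden with the JUPYTER_CONFIG_DIR environment variable; running jupyter --config-dir prints the directory actually in use. The sketch below approximates that lookup with the standard library only. It is an assumption-laden simplification that covers just the common case and ignores the other search paths Jupyter also checks; notebook_config_path is a hypothetical helper.

```python
import os
from pathlib import Path


def notebook_config_path():
    """Approximate location of the user-level Jupyter Notebook config file.

    Common case only: JUPYTER_CONFIG_DIR if set, otherwise ~/.jupyter.
    """
    config_dir = os.environ.get("JUPYTER_CONFIG_DIR") or os.path.join("~", ".jupyter")
    return Path(config_dir).expanduser() / "jupyter_notebook_config.py"


path = notebook_config_path()
print(path, "exists" if path.exists() else "does not exist yet")
```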

# Create config (if one does not exist already)
jupyter notebook --generate-config

# ~/.jupyter/jupyter_notebook_config.py
c = get_config()  # noqa: F821 (injected by Jupyter when it loads this file)


def scrub_output_pre_save(model, **kwargs):
    """Scrub potentially conflicting output before saving notebooks."""
    # Only handle notebooks in nbformat version 4.
    if model["type"] != "notebook":
        return
    if model["content"]["nbformat"] != 4:
        return

    # Only scrub notebooks that opt in via custom notebook metadata.
    if model["content"]["metadata"].get("reduce_git_conflicts") is not True:
        print("Git conflict scrubbing not enabled for notebook")
        return

    print("Scrubbing potential Git conflicts from '{0}'".format(kwargs["path"]))

    for cell in model["content"]["cells"]:
        # Only modify 'code' cells (other types do not have conflicting metadata).
        if cell["cell_type"] != "code":
            continue

        # Execution counts (on both the cell and its outputs) create unnecessary Git conflicts.
        cell["execution_count"] = None

        for output in cell.get("outputs", []):
            # Only some output types (e.g. execute_result) have an execution count;
            # adding an empty one to every output can cause validation errors!
            if "execution_count" in output:
                output["execution_count"] = None

## Python callable or importstring thereof
#  See also: ContentsManager.pre_save_hook
c.FileContentsManager.pre_save_hook = scrub_output_pre_save
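Because the hook only edits the saved JSON, its effect can be checked from outside Jupyter after restarting the server (the config is read at startup) and saving an opted-in notebook. Below is a minimal standard-library sketch; leftover_execution_counts is a hypothetical helper, and the demo writes a throwaway notebook instead of reading a real one.

```python
import json
import tempfile
from pathlib import Path


def leftover_execution_counts(notebook_path):
    """Return (cell_index, count) pairs for code cells or outputs that still carry a count."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    leftovers = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("execution_count") is not None:
            leftovers.append((index, cell["execution_count"]))
        for output in cell.get("outputs", []):
            if output.get("execution_count") is not None:
                leftovers.append((index, output["execution_count"]))
    return leftovers


# Demo: a throwaway notebook with one scrubbed and one unscrubbed code cell.
demo = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {"reduce_git_conflicts": True},
    "cells": [
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["x = 1"]},
        {"cell_type": "code", "execution_count": 4, "metadata": {},
         "outputs": [], "source": ["x + 1"]},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "demo.ipynb"
    nb_path.write_text(json.dumps(demo), encoding="utf-8")
    result = leftover_execution_counts(nb_path)
    print(result)  # [(1, 4)]
```

An empty list means the notebook on disk has no remaining execution counts.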

Notebook Metadata

Add the following key/value pair to the top-level metadata section of the notebook file. This can be done manually or through the Edit > Edit Notebook Metadata menu. From the next save onward, the pre-save hook will scrub the notebook to reduce Git conflicts (Jupyter reads its config at startup, so restart it after editing the config).

{
  "metadata": {
    ...
    "reduce_git_conflicts": true
  }
}
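The key can also be added from a script, which may help when opting in many notebooks at once. A minimal standard-library sketch follows; enable_git_conflict_scrubbing is a hypothetical helper, and it is best run on notebooks that are not open in Jupyter, so the browser's copy does not overwrite the change on its next save.

```python
import json
import tempfile
from pathlib import Path


def enable_git_conflict_scrubbing(notebook_path):
    """Set the custom opt-in key in a notebook's top-level metadata (hypothetical helper)."""
    path = Path(notebook_path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    nb.setdefault("metadata", {})["reduce_git_conflicts"] = True
    # One-space indentation with sorted keys roughly matches how Jupyter writes
    # .ipynb files, which keeps the resulting Git diff small.
    path.write_text(
        json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


# Demo on a throwaway, minimal notebook file.
with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "demo.ipynb"
    nb_path.write_text(
        json.dumps({"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": []}),
        encoding="utf-8",
    )
    enable_git_conflict_scrubbing(nb_path)
    metadata = json.loads(nb_path.read_text(encoding="utf-8"))["metadata"]
    print(metadata)  # {'reduce_git_conflicts': True}
```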

Per-project Config

Jupyter Notebook allows specifying a custom config file with jupyter notebook --config=./path. This can be used to commit the above config file to the project repository and remove the dependency on each user's own config file. Additionally, since the config is specific to a project, the metadata check can be removed entirely if desired.
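For a project-specific config where every notebook in the repository should be scrubbed, the hook can drop the metadata opt-in check. Below is a minimal sketch of that variant, assuming the hypothetical name scrub_project_notebooks and exercising it with a hand-built model dict instead of a running server. In the project's config file it would be registered the same way as the original hook, with c.FileContentsManager.pre_save_hook = scrub_project_notebooks.

```python
def scrub_project_notebooks(model, **kwargs):
    """Pre-save hook variant for a per-project config: scrubs every notebook, no opt-in."""
    if model["type"] != "notebook" or model["content"]["nbformat"] != 4:
        return
    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            # Only some output types (e.g. execute_result) have this field;
            # adding it to others would fail notebook validation.
            if "execution_count" in output:
                output["execution_count"] = None


# Hand-built stand-in for the model Jupyter passes to the hook.
model = {
    "type": "notebook",
    "content": {
        "nbformat": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
            {"cell_type": "code", "execution_count": 2, "metadata": {},
             "outputs": [
                 {"output_type": "stream", "name": "stdout", "text": ["hi\n"]},
                 {"output_type": "execute_result", "execution_count": 2,
                  "data": {"text/plain": ["42"]}, "metadata": {}},
             ],
             "source": ["print('hi')\n", "42"]},
        ],
    },
}
scrub_project_notebooks(model, path="demo.ipynb")
code_cell = model["content"]["cells"][1]
print(code_cell["execution_count"], code_cell["outputs"][1]["execution_count"])  # None None
```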