@bollwyvl
Last active December 18, 2022 20:34
Jupyter Rich Input

Rich Input for Jupyter Cells

Elevator Pitch

Extend the Jupyter Notebook Format to offer an optional input field for all Cell types. This is an object which mirrors rich display outputs, providing data and associated metadata, both dictionaries keyed by MIME types.

Jupyter Clients that create and edit Notebooks MAY create this field, but MUST continue to emit the source field as the most-portable source-of-truth.

Jupyter Clients that show Jupyter Notebooks MAY display input information directly, either in text or visual form, but MUST allow for viewing the source in its raw text, and potentially rendered, forms.
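
As a rough illustration of the pitch above, a Markdown cell carrying the proposed field might look like the following, written here as a Python dict. The MIME type `application/x-example+json`, its payload, and the `editor_version` key are placeholders invented for this sketch, not part of the proposal.

```python
import json

# A hypothetical Markdown cell carrying the proposed optional `input` field.
# The MIME type, payload, and metadata key below are made up for illustration.
EXAMPLE_MIME = "application/x-example+json"

cell = {
    "cell_type": "markdown",
    "metadata": {},
    # `source` remains the most-portable source of truth.
    "source": ["# Hello\n", "\n", "Some *rich* text."],
    # `input` mirrors a rich display output: `data` and `metadata`,
    # both dictionaries keyed by MIME type.
    "input": {
        "data": {EXAMPLE_MIME: {"blocks": ["heading", "paragraph"]}},
        "metadata": {EXAMPLE_MIME: {"editor_version": "0.0.0"}},
    },
}

print(json.dumps(cell["input"], indent=2))
```

A client that does not understand `input` can ignore it and keep working from `source` alone.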

Problem

Jupyter Cells suffer from the inflexibility of the text (or list of text) source field, without well-defined ways to describe the intent of the source.

A number of techniques are used today to work around this:

  • using intentionally underspecified cell metadata
  • inline, and usually non-portable comments
  • inline, and usually non-portable grammar extensions e.g.
    • %magic arguments
    • new markdown constructs

Code Cells

A Code cell, i.e. any cell with the "cell_type": "code", provides a string or list of strings for its source. Any knowledge of its type must be inferred from a deeply-nested, but schema-constrained, key in the parent notebook metadata.

A cell copied from one notebook will only work in another notebook of roughly the same kernel specification, but carries no knowledge of its content.
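
To make the lookup concrete, here is a minimal sketch that reads "type" as the programming language of the cell, which is one interpretation of the text above. The helper name `cell_language` is hypothetical; `language_info.name` and `kernelspec.language` are notebook-level metadata keys commonly written by kernels and clients, and either may be absent.

```python
def cell_language(notebook: dict, default: str = "unknown") -> str:
    """Best-effort lookup of the language used by a notebook's code cells."""
    metadata = notebook.get("metadata", {})
    # Prefer the kernel-reported language, then fall back to the kernelspec.
    language = metadata.get("language_info", {}).get("name")
    if language is None:
        language = metadata.get("kernelspec", {}).get("language")
    return language or default


notebook = {
    "metadata": {"language_info": {"name": "python"}},
    "cells": [{"cell_type": "code", "metadata": {}, "source": "print(1)"}],
}

# The cell itself says nothing about its language...
copied_cell = notebook["cells"][0]
# ...so the answer only exists while the cell stays in its parent notebook.
print(cell_language(notebook))
print(cell_language({"cells": [copied_cell]}))
```

Once the cell is pasted into a notebook without matching metadata, the lookup falls back to the default, which is the portability gap described above.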

Further, the types of UI which can be created to generate these cells are limited to the (basically) 1d array of a line-based editor.

Markdown

A Markdown cell, i.e. any cell with the "cell_type": "markdown", also includes a source field as a string or list of strings.
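
Because source can take either shape, any tool reading it has to normalize first. A minimal sketch, with hypothetical helper names, assuming the usual convention that items of the list form keep their own line endings:

```python
from typing import List, Union

Source = Union[str, List[str]]


def source_text(source: Source) -> str:
    """Return a cell's source as one string, whichever form it was stored in."""
    # List items are assumed to carry their own newlines, so a plain join suffices.
    return source if isinstance(source, str) else "".join(source)


def source_lines(source: Source) -> List[str]:
    """Return a cell's source as a list of lines, keeping line endings."""
    return source_text(source).splitlines(keepends=True)


as_list = ["# Title\n", "\n", "Body text."]
as_text = "# Title\n\nBody text."
print(source_text(as_list) == as_text)
print(source_lines(as_text) == as_list)
```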

Historically, this has meant whatever marked and mistune support, plus more GFM and math, with the behavior of the jQuery notebook client and nbconvert being the "reference" implementations, thus far not quite matched by JupyterLab.

As these reference implementations are not trivially extensible, a number of downstream tools rework markdown cells in interesting, but ultimately incompatible, ways, leading to the inability of clients to faithfully and accessibly reproduce the intent of the Notebook author.

Raw Cells

A Raw cell, i.e. any cell with the "cell_type": "raw", also includes a source field as a string or list of strings.

Raw cells are rarely used. They share the limitations described above, and their effective use cases are even less well defined.

Use Cases

Visual Code Editing

As a User, I'd like to visually edit a program.

jupyterlab-outsource demonstrates visually building programs in several languages, both in Code Cells and text documents.

This capability is powered by Blockly, which provides a discoverable, internationalized UI for writing programs in a large number of human and machine languages.

Today, that information is stored in a comment in the source cell.

As proposed, the raw XML-based definition of the program would be stored in #/cells/0/input/data/application\/blockly+xml. When changed, the client would update the #/cells/0/source field to reflect the current in-browser transpiled code.

Neighboring metadata in the same MIME type would allow for capturing tool-specific information.
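
The update flow described above could be sketched as follows. Everything here is illustrative: `set_rich_input` and `fake_transpile` are hypothetical names, and the real transpilation would happen in the browser via Blockly's code generators rather than in Python.

```python
from typing import Callable

BLOCKLY_MIME = "application/blockly+xml"


def set_rich_input(
    cell: dict, mime: str, payload: str, to_source: Callable[[str], str]
) -> dict:
    """Store `payload` under input.data[mime], then regenerate `source` from it."""
    rich = cell.setdefault("input", {"data": {}, "metadata": {}})
    rich.setdefault("data", {})[mime] = payload
    # Reserve the neighboring metadata slot for tool-specific information.
    rich.setdefault("metadata", {}).setdefault(mime, {})
    # `source` is always rewritten so it stays the portable source of truth.
    cell["source"] = to_source(payload)
    return cell


def fake_transpile(xml: str) -> str:
    # Stand-in for the real Blockly code generator (an assumption of this sketch).
    return "print('hello')\n"


cell = {"cell_type": "code", "metadata": {}, "source": "",
        "outputs": [], "execution_count": None}
set_rich_input(cell, BLOCKLY_MIME, "<xml><block type='text_print'/></xml>", fake_transpile)
print(cell["source"])
```

Keeping the write to `input` and the rewrite of `source` in one function is a design choice of this sketch; it makes it harder for the two fields to drift apart.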

WYSIWYG Editing

As a Jupyter Client User, I'd like to write Markdown cells without learning Markdown syntax.

jupyterlab-outsource demonstrates visually editing Markdown Cells and text documents, with further extensions by jupyterlab-richtext-mode.

This capability is powered by ProseMirror, which natively supports an extensible JSON-compatible schema for defining rich text documents, as well as round-trip compilation to CommonMark.

As proposed, the raw JSON-based definition of the cell document would be stored in #/cells/0/input/data/application\/prosemirror+json. When changed, the client would update the #/cells/0/source field to reflect the current in-browser transpiled markdown, which may include raw HTML elements.

Neighboring metadata in the same MIME type would allow for capturing tool-specific information.

Diagrams

As a Jupyter Client User, I'd like to use visual tools to create diagrams.

ipydrawio demonstrates authoring vector-based, layered, multi-page documents as standalone text/binary documents, as well as XML embedded inside a Notebook.

This capability is powered by drawio.

As proposed, the raw XML-based definition of the diagram would be stored in #/cells/0/input/data/application\/drawio+xml. When changed, the client would update the #/cells/0/source field to reflect the current in-browser transpiled diagram as an embedded img tag.

Neighboring metadata in the same MIME type would allow for capturing tool-specific information.
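
One way the embedded img tag might be produced is sketched below. It assumes the diagram tool can export SVG text and that the image is inlined as a base64 data URI; neither detail is specified by this proposal, and `img_tag_from_svg` is a hypothetical helper.

```python
import base64
import html

DRAWIO_MIME = "application/drawio+xml"


def img_tag_from_svg(svg: str, alt: str = "diagram") -> str:
    """Wrap exported SVG text in an <img> tag using a base64 data URI."""
    encoded = base64.b64encode(svg.encode("utf-8")).decode("ascii")
    safe_alt = html.escape(alt, quote=True)
    return f'<img alt="{safe_alt}" src="data:image/svg+xml;base64,{encoded}"/>'


# Placeholder payloads: a real client would get both from the diagram editor.
diagram_xml = "<mxfile></mxfile>"
exported_svg = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'

cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": img_tag_from_svg(exported_svg, alt="architecture sketch"),
    "input": {
        "data": {DRAWIO_MIME: diagram_xml},
        "metadata": {DRAWIO_MIME: {}},
    },
}
print(cell["source"][:40])
```

A client without the diagram editor would still render the image from `source`, while one with it could reopen the editable definition from `input`.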

Extensible Markdown

As a Jupyter Client User, I'd like to use more advanced features in my markdown.

jupyterlab-markup demonstrates a completely extensible replacement for JupyterLab's markdown renderer (using markdown-it instead of marked), while mostly maintaining support for legacy, but non-core Markdown features in Jupyter such as $-delimited math (and all the problems that come with it).

As such, it introduces new Markdown features entirely incompatible with any other renderer.

As proposed, the raw markdown would be stored in #/cells/0/input/data/text\/jupyterlab+markdown, and the fully-rendered HTML would be injected in the source field.

Neighboring metadata in the same MIME type would allow for capturing the additional plugins that were required to achieve this rendering.

Design

Each of code_cell, markdown_cell and raw_cell would gain the following optional field:

"input": {
  "$ref": "#/definitions/display_data"
}
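
As a rough idea of what a client might check before trusting the field, here is a small structural test in plain Python. It only covers the `data` and `metadata` dictionaries described in the pitch and is not a substitute for validating against the notebook JSON Schema; `check_rich_input` is a hypothetical helper.

```python
from typing import List


def check_rich_input(cell: dict) -> List[str]:
    """Return a list of problems with a cell's optional `input` field."""
    problems = []
    if "source" not in cell:
        problems.append("cell must still provide `source`")
    rich = cell.get("input")
    if rich is None:
        return problems  # the field is optional
    if not isinstance(rich, dict):
        return problems + ["`input` must be an object"]
    for key in ("data", "metadata"):
        if not isinstance(rich.get(key), dict):
            problems.append(f"`input.{key}` must be an object keyed by MIME type")
    return problems


plain_cell = {"cell_type": "raw", "metadata": {}, "source": "text"}
broken_cell = {"cell_type": "raw", "metadata": {}, "source": "text",
               "input": {"data": {}}}
print(check_rich_input(plain_cell))
print(check_rich_input(broken_cell))
```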

Further Ideas

A source_input_hash field MAY allow for capturing whether a source is currently up-to-date with its input. However, introducing large hashes of "canonical" (e.g. sorted keys, indented by 2 spaces) JSON strings would create even more churn inside Notebook source, which is likely undesirable.
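
A minimal sketch of how such a hash could be computed and checked. The choice of SHA-256, the placement of `source_input_hash` directly on the cell, and the helper names are all assumptions of this example; the canonical form follows the "sorted keys, indented by 2 spaces" description above.

```python
import hashlib
import json


def input_hash(rich_input: dict) -> str:
    """Hash a canonical JSON rendering (sorted keys, 2-space indent) of `input`."""
    canonical = json.dumps(rich_input, sort_keys=True, indent=2, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_source_current(cell: dict) -> bool:
    """True if the stored hash still matches the cell's current `input`."""
    return cell.get("source_input_hash") == input_hash(cell.get("input", {}))


cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": "# Hello",
    "input": {"data": {"text/x-example": "Hello"}, "metadata": {}},
}
cell["source_input_hash"] = input_hash(cell["input"])
print(is_source_current(cell))

# Editing `input` without regenerating `source` makes the stored hash stale.
cell["input"]["data"]["text/x-example"] = "Goodbye"
print(is_source_current(cell))
```

With SHA-256, every edit to input rewrites a 64-character hex string in the cell, which gives a sense of the churn mentioned above.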
