bollwyvl/rich-input.md

Rich Input for Jupyter Cells

Elevator Pitch

Extend the Jupyter Notebook Format to offer an optional input field for all
Cell types. This is an object which mirrors rich display outputs, providing data and
associated metadata, both dictionaries keyed by MIME types.
Jupyter Clients that create and edit Notebooks MAY create this field, but MUST continue
to emit the source field as the most-portable source-of-truth.
Jupyter Clients that show Jupyter Notebooks MAY display input information directly,
either in text or visual form, but MUST allow for viewing the source in its raw text,
and potentially rendered, forms.
Problem

Jupyter Cells suffer from the inflexibility of the source field, which is a string
(or list of strings) with no well-defined way to describe the intent of its content.
Existing techniques for working around this include:

- using intentionally underspecified cell metadata
- inline, and usually non-portable, comments
- inline, and usually non-portable, grammar extensions, e.g.
  - %magic arguments
  - new markdown constructs


Code Cells

A Code cell, i.e. any cell with the "cell_type": "code", provides a string
or list of strings for its source. Any knowledge of its type must be inferred
from a deeply-nested, but schema-constrained, key in the parent notebook metadata.
A cell copied from one notebook will only work in another notebook of roughly the
same kernel specification, but carries no knowledge of its content.
Further, the types of UI which can be created to generate these cells are limited
to the (basically) 1d array of a line-based editor.
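
The inference described above can be sketched as follows. This is a minimal illustration, assuming the commonly used nbformat 4 notebook metadata keys `language_info.name` and `kernelspec.language`; `infer_code_language` is a hypothetical helper, not part of any Jupyter library.

```python
from typing import Optional


def infer_code_language(notebook: dict) -> Optional[str]:
    """Guess the language of every Code cell from notebook-level metadata."""
    metadata = notebook.get("metadata", {})
    # language_info is normally written once a kernel has been attached
    name = metadata.get("language_info", {}).get("name")
    if name:
        return name
    # Fall back to the kernelspec entry, where "language" may be absent
    return metadata.get("kernelspec", {}).get("language")


notebook = {
    "metadata": {"language_info": {"name": "python"}},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": "print(1)",
        }
    ],
}

# A cell copied out of the notebook leaves the language information behind
copied_cell = notebook["cells"][0]
```

Nothing in `copied_cell` records that its source is Python, which is the portability gap this proposal targets.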
Markdown

A Markdown cell, i.e. any cell with the "cell_type": "markdown", also includes a
source field as a string or list of strings.
Historically, this has meant whatever marked and mistune support, plus more GFM and math,
with the behavior of the jQuery notebook client and nbconvert being the "reference"
implementations, thus far not quite matched by JupyterLab.
As these reference implementations are not trivially extensible, a number of
downstream tools rework markdown cells in interesting, but ultimately incompatible,
ways, leading to the inability of clients to faithfully and accessibly reproduce
the intent of the Notebook author.
Raw Cells

A Raw cell, i.e. any cell with the "cell_type": "raw", also includes a
source field as a string or list of strings.
Raw cells are rarely used. They share the limitations described above, and have
even fewer well-defined use cases.
Use Cases

Visual Code Editing


As a User, I'd like to visually edit a program.

jupyterlab-outsource demonstrates visually building programs in several languages,
both in Code Cells and text documents.
This capability is powered by Blockly, which provides a discoverable, internationalized
UI for writing programs in a large number of human and machine languages.
Today, that information is stored in a comment in the source cell.

As proposed, the raw XML-based definition of the program would be stored in
#/cells/0/input/data/application\/blockly+xml. When changed, the client
would update the #/cells/0/source field to reflect the current in-browser
transpiled code.
Neighboring metadata in the same MIME type would allow for capturing
tool-specific information.
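
The update flow above might look like the following sketch. `sync_source_from_input` and `transpile_blockly` are hypothetical names: the latter is a hard-coded stand-in for the code generation a real client would delegate to Blockly in the browser.

```python
from typing import Callable

BLOCKLY_MIME = "application/blockly+xml"


def sync_source_from_input(
    cell: dict, mime_type: str, transpile: Callable[[str], str]
) -> dict:
    """Regenerate cell["source"] from cell["input"]["data"][mime_type], if present."""
    data = cell.get("input", {}).get("data", {})
    if mime_type in data:
        # source stays the portable source-of-truth for other clients
        cell["source"] = transpile(data[mime_type])
    return cell


def transpile_blockly(xml: str) -> str:
    # Hypothetical placeholder: a real client would ask Blockly to generate code
    return "print('hello')" if "text_print" in xml else ""


cell = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {},
    "outputs": [],
    "source": "",
    "input": {
        "data": {BLOCKLY_MIME: '<xml><block type="text_print"></block></xml>'},
        # tool-specific information lives next to the data, under the same MIME type
        "metadata": {BLOCKLY_MIME: {}},
    },
}

sync_source_from_input(cell, BLOCKLY_MIME, transpile_blockly)
```

A client that does not understand the Blockly MIME type can ignore `input` entirely and still run the cell from `source`.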

WYSIWYG Editing


As a Jupyter Client User, I'd like to write Markdown cells without learning Markdown syntax.

jupyterlab-outsource demonstrates visually editing Markdown Cells and text documents,
with further extensions by jupyterlab-richtext-mode.
This capability is powered by ProseMirror, which natively supports an extensible
JSON-compatible schema for defining rich text documents, as well as round-trip
compilation to CommonMark.

As proposed, the raw JSON-based definition of the cell document would be stored in
#/cells/0/input/data/application\/prosemirror+json. When changed, the client
would update the #/cells/0/source field to reflect the current in-browser
transpiled markdown, which may include raw HTML elements.
Neighboring metadata in the same MIME type would allow for capturing
tool-specific information.
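
To illustrate the transpilation step, here is a deliberately tiny sketch that turns a ProseMirror-style JSON document into Markdown. It assumes the node and mark names of ProseMirror's basic schema (`doc`, `heading`, `paragraph`, `text`, `strong`, `em`) and handles only those; `to_markdown` is a hypothetical helper, and a real client would more likely rely on ProseMirror's own Markdown serializer.

```python
def render_inline(node: dict) -> str:
    """Render one text node, wrapping it in the Markdown for each mark."""
    text = node.get("text", "")
    for mark in node.get("marks", []):
        if mark["type"] == "strong":
            text = f"**{text}**"
        elif mark["type"] == "em":
            text = f"*{text}*"
    return text


def to_markdown(doc: dict) -> str:
    """Render top-level heading and paragraph nodes; other nodes are skipped."""
    blocks = []
    for node in doc.get("content", []):
        inline = "".join(render_inline(child) for child in node.get("content", []))
        if node["type"] == "heading":
            level = node.get("attrs", {}).get("level", 1)
            blocks.append("#" * level + " " + inline)
        elif node["type"] == "paragraph":
            blocks.append(inline)
    return "\n\n".join(blocks)


doc = {
    "type": "doc",
    "content": [
        {
            "type": "heading",
            "attrs": {"level": 2},
            "content": [{"type": "text", "text": "Results"}],
        },
        {
            "type": "paragraph",
            "content": [
                {"type": "text", "text": "A "},
                {"type": "text", "text": "bold", "marks": [{"type": "strong"}]},
                {"type": "text", "text": " claim."},
            ],
        },
    ],
}

markdown = to_markdown(doc)
```

Under this proposal, `doc` would be stored in the cell's `input` data and `markdown` written to its `source`.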

Diagrams


As a Jupyter Client User, I'd like to use visual tools to create diagrams.

ipydrawio demonstrates authoring vector-based, layered, multi-page documents
as standalone text/binary documents, as well as XML embedded inside a Notebook.
This capability is powered by drawio.

As proposed, the raw XML-based definition of the diagram would be stored in
#/cells/0/input/data/application\/drawio+xml. When changed, the client
would update the #/cells/0/source field to reflect the current in-browser
transpiled diagram as an embedded img tag.
Neighboring metadata in the same MIME type would allow for capturing
tool-specific information.
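
One way to produce the embedded img tag is a base64 data URI, sketched below. This assumes the client has already exported the diagram to SVG text (the export itself is out of scope here), and `svg_to_img_tag` is a hypothetical helper rather than anything ipydrawio provides.

```python
import base64
from html import escape

DRAWIO_MIME = "application/drawio+xml"


def svg_to_img_tag(svg: str, alt: str = "diagram") -> str:
    """Wrap exported SVG text in an <img> tag using a base64 data URI."""
    encoded = base64.b64encode(svg.encode("utf-8")).decode("ascii")
    return f'<img alt="{escape(alt, quote=True)}" src="data:image/svg+xml;base64,{encoded}"/>'


# Assumed to come from the diagram editor's own SVG export
exported_svg = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>'

cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": "",
    "input": {
        "data": {DRAWIO_MIME: "<mxfile></mxfile>"},  # placeholder diagram XML
        "metadata": {DRAWIO_MIME: {}},
    },
}

# Any Markdown renderer that allows raw HTML can now show the diagram
cell["source"] = svg_to_img_tag(exported_svg, alt="architecture diagram")
```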

Extensible Markdown


As a Jupyter Client User, I'd like to use more advanced features in my markdown.

jupyterlab-markup demonstrates a completely extensible replacement for JupyterLab's
markdown renderer (using markdown-it instead of marked), while mostly maintaining
support for legacy, but non-core Markdown features in Jupyter such as $-delimited math
(and all the problems that come with it).
As such, it introduces new Markdown features entirely incompatible with any other
renderer.

As proposed, the raw markdown would be stored in #/cells/0/input/data/text\/jupyterlab+markdown,
and the fully-rendered HTML would be injected in the source field.
Neighboring metadata in the same MIME type would allow for capturing
the additional plugins that were required to achieve this rendering.

Design

Each of code_cell, markdown_cell and raw_cell would gain the following
optional field:
"input": {
  "$ref": "#/definitions/display_data"
}
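
As a rough illustration of what a client could check before trusting the new field, the sketch below verifies only the two members named in the Elevator Pitch, `data` and `metadata`. The referenced `display_data` definition may impose further constraints that are not modelled here, and `check_input_field` is a hypothetical helper, not part of nbformat.

```python
def check_input_field(cell: dict) -> list:
    """Return a list of problems found with a cell's optional "input" field."""
    problems = []
    # Clients MUST keep emitting source, whether or not input is present
    if "source" not in cell:
        problems.append("source is missing")
    if "input" not in cell:
        return problems  # input is optional
    rich_input = cell["input"]
    if not isinstance(rich_input, dict):
        return problems + ["input must be an object"]
    for key in ("data", "metadata"):
        if not isinstance(rich_input.get(key), dict):
            problems.append(f"input.{key} must be an object keyed by MIME type")
    return problems


valid_cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": "**hello**",
    "input": {
        "data": {"application/prosemirror+json": {"type": "doc", "content": []}},
        "metadata": {"application/prosemirror+json": {}},
    },
}

invalid_cell = {"cell_type": "markdown", "metadata": {}, "input": {"data": []}}
```

Calling `check_input_field(valid_cell)` returns an empty list, while `invalid_cell` is reported for its missing `source` and malformed `data` and `metadata`.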
Further Ideas

A source_input_hash field MAY allow for capturing whether a source is currently
up-to-date with its input. However, introducing large hashes of "canonical"
(e.g. sorted keys, indented by 2 spaces) JSON strings would create even more churn
inside Notebook source, which is likely undesirable.
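
A minimal sketch of such a check follows, assuming the hash is taken over the canonical JSON of the whole `input` object and recorded on the cell when `source` is regenerated. The choice of SHA-256 and the helper names are assumptions made for illustration only.

```python
import hashlib
import json


def canonical_json(value) -> str:
    """Serialize with sorted keys and a 2-space indent, as suggested above."""
    return json.dumps(value, sort_keys=True, indent=2, ensure_ascii=False)


def input_hash(cell: dict) -> str:
    """Hash the canonical form of the cell's input (SHA-256 is an assumption)."""
    return hashlib.sha256(canonical_json(cell["input"]).encode("utf-8")).hexdigest()


def source_is_current(cell: dict) -> bool:
    """True if the recorded hash still matches the cell's current input."""
    return cell.get("source_input_hash") == input_hash(cell)


cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": "hello",
    "input": {
        "data": {"text/jupyterlab+markdown": "hello"},
        "metadata": {"text/jupyterlab+markdown": {}},
    },
}

# Record the hash at the moment source is regenerated from input
cell["source_input_hash"] = input_hash(cell)
```

Editing `input` without regenerating `source` makes `source_is_current` return `False`. Even a short hex digest like this changes completely on every edit, which is the churn the paragraph above warns about.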