@tonyfast
Last active February 21, 2023 22:35
extensible notebook schemas

an extensible schema for computational notebooks and cells

currently the notebook schema mixes intrinsic notebook properties with application specific definitions. it would be good to separate the pure schema information from the application specific schema into something extensible. this document explores a core set of definitions that can be extended to either a traditional notebook container or a streaming lines format.

a notebook is a collection of cells that can be contained in different ways.

the defs schema is reused to demonstrate different top level schemas. its purpose is to make the cell schema and the cell metadata schema extensible.

"#/$defs/metadata"

  • notebook level metadata needed for either nested or streaming containers.

"#/$defs/cells"

  • allows different combinations of cells to be used.
  • these specific definitions can define their own cell metadata schema

"#/$defs/cell-metadata"

  • constraints for metadata that apply across any cell, like execution time or slide type.

defs=\
"$defs":
    cells:
        "$comment": new cells can define their own metadata constraints
        oneOf:
            - "$ref": "code-cell.json"
            - "$ref": "md-cell.json"
            - "$ref": "raw-cell.json"
            - "$ref": "sql-cell.json"
    metadata:
        "$comment": notebook metadata
        type: object
    cell-metadata:
        "$comment": it is possible to add metadata that applies across all cells
        allOf:
            -   "$comment": slide metadata would most likely be part of the core schema because it already exists
                "$ref": "cell-metadata-slides.json"
            -   "$comment": cell execution information
                "$ref": "cell-metadata-execution.json"

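to make the extension point concrete, here is a minimal python sketch of adding a new cell type to the `oneOf` list. it assumes each definition is a single schema object written as a plain dict. `with_cell_type` and `plot-cell.json` are hypothetical names used only for illustration, not part of any existing schema.

```python
import copy

# the defs from above as a plain dict (comments omitted).
DEFS = {
    "cells": {
        "oneOf": [
            {"$ref": "code-cell.json"},
            {"$ref": "md-cell.json"},
            {"$ref": "raw-cell.json"},
            {"$ref": "sql-cell.json"},
        ]
    },
    "metadata": {"type": "object"},
    "cell-metadata": {
        "allOf": [
            {"$ref": "cell-metadata-slides.json"},
            {"$ref": "cell-metadata-execution.json"},
        ]
    },
}


def with_cell_type(defs, ref):
    """Return a copy of defs that also accepts the cell schema at ref."""
    extended = copy.deepcopy(defs)
    extended["cells"]["oneOf"].append({"$ref": ref})
    return extended


# "plot-cell.json" is a hypothetical schema file, named here only as an example.
extended = with_cell_type(DEFS, "plot-cell.json")
```

the original defs are left untouched, so an application can layer its own cell types on top of a shared core without editing it.
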
an extensible schema for the notebook as a container

this schema uses the defs above to recapture the current container format for the notebook.

nb=\
required: [metadata, cells]
properties:
    metadata: 
        "$ref": "#/$defs/metadata"
    cells: 
        items:
            allOf:
                -   "$comment": default metadata properties can be defined 
                    "$ref": "cell.json" 
                - "$ref": "#/$defs/cells"
                - "$ref": "#/$defs/cell-metadata"
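
the container schema only works once the defs are attached to the same document, since `#/$defs/...` is a local json pointer. the sketch below shows one way that lookup could work. `resolve_local_ref` is a hypothetical helper written for this example, and the `$defs` are trimmed to keep it short.

```python
def resolve_local_ref(document, ref):
    """Follow a local JSON pointer such as "#/$defs/cells" inside document."""
    if not ref.startswith("#/"):
        raise ValueError(f"not a local reference: {ref}")
    node = document
    for token in ref[2:].split("/"):
        # undo JSON Pointer escaping (RFC 6901): "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


# the nb properties plus a trimmed copy of the shared $defs.
NOTEBOOK_SCHEMA = {
    "required": ["metadata", "cells"],
    "properties": {
        "metadata": {"$ref": "#/$defs/metadata"},
        "cells": {
            "items": {
                "allOf": [
                    {"$ref": "cell.json"},
                    {"$ref": "#/$defs/cells"},
                    {"$ref": "#/$defs/cell-metadata"},
                ]
            }
        },
    },
    "$defs": {
        "metadata": {"type": "object"},
        "cells": {"oneOf": [{"$ref": "code-cell.json"}, {"$ref": "md-cell.json"}]},
        "cell-metadata": {"allOf": [{"$ref": "cell-metadata-execution.json"}]},
    },
}

metadata_schema = resolve_local_ref(NOTEBOOK_SCHEMA, "#/$defs/metadata")
cells_schema = resolve_local_ref(NOTEBOOK_SCHEMA, "#/$defs/cells")
```

references to external files such as `cell.json` are not handled here. a real validator would resolve those against a base uri.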

an extensible schema for the notebook as a stream

another representation of the container is json lines. in this schema, the first line captures top level notebook information, like the metadata with the kernel spec information and the notebook format. every line thereafter is a cell defined in the #/items schema.

lines=\
prefixItems:
    -   "$comment": |
            let the first line contain the notebook level metadata.
            using a readline approach we can peek at the state of a notebook if it has a bunch of information in the metadata
        "$ref": "#/$defs/metadata"
items:        
    allOf:
        -   "$comment": default metadata properties can be defined 
            "$ref": "cell.json" 
        - "$ref": "#/$defs/cells"
        - "$ref": "#/$defs/cell-metadata"
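
a minimal python sketch of the streaming idea, assuming the first line holds only the notebook metadata object and every later line holds one cell. `to_lines`, `peek_metadata`, and `from_lines` are hypothetical helpers, and the sample notebook is made up for illustration.

```python
import io
import json


def to_lines(notebook):
    """Serialize a container-style notebook dict as JSON lines text."""
    rows = [notebook["metadata"], *notebook["cells"]]
    return "\n".join(json.dumps(row) for row in rows) + "\n"


def peek_metadata(stream):
    """Read only the first line of a stream, which holds the notebook metadata."""
    return json.loads(stream.readline())


def from_lines(stream):
    """Rebuild the container form: first line is metadata, the rest are cells."""
    metadata = json.loads(stream.readline())
    cells = [json.loads(line) for line in stream if line.strip()]
    return {"metadata": metadata, "cells": cells}


# a made-up notebook used only for illustration.
notebook = {
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "source": "# title", "metadata": {}},
        {"cell_type": "code", "source": "1 + 1", "metadata": {}},
    ],
}

text = to_lines(notebook)
first = peek_metadata(io.StringIO(text))
round_trip = from_lines(io.StringIO(text))
```

`peek_metadata` never parses a cell, which is the appeal of the lines format for large notebooks. `from_lines` shows that the stream and the container carry the same information under these assumptions.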