@bollwyvl
Last active August 29, 2015 14:09
IPEP Linked Data as Default
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> _Editor's Note: Follow the [instructions](https://github.com/ipython/ipython/wiki/IPEPs:-IPython-Enhancement-Proposals)._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table>\n",
"<tr><td> Status </td><td> Active </td></tr>\n",
"<tr><td> Author </td><td> Nicholas Bollweg &lt;nick.bollweg@gmail.com&gt;</td></tr>\n",
"<tr><td> Created </td><td> November 11, 2014</td></tr>\n",
"<tr><td> Updated </td><td> January 30, 2014</td></tr>\n",
"<tr><td> Discussion </td><td> [link to the issue where the IPEP is being discussed](#tbd) </td></tr>\n",
"<tr><td> Implementation </td><td> [link to the PR](#tbd) </td></tr>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Abstract\n",
"JSON, and JSON-compatible data (e.g. ∅MQ), are already the _de facto_ standard across the Ecosystem (IPython/Jupyter). With [Notebook Format 4][], some components of the broader system will only accept a _de jure_ data representation, irrespective of whether it would have _worked_. The [v4 schema][] represents a step forward in Ecosystem data: parts of the schema _could_ be reused, explicitly stating that two parts of the system share some structure. This leaves the data consumable, but not implicitly understandable.\n",
"\n",
"Additionally, HTML is a common output of Ecosystem tools, and is generally seen as the \"end of the line\" for the life of some data. This needn't be the case, as HTML is capable of portably and recoverably storing information about its provenance, assumptions and annotations.\n",
"\n",
"This IPEP suggests, where possible, the reuse of data _meanings_ throughout the Ecosystem, or:\n",
"> Use Linked Data as a representation of first resort.\n",
"\n",
"[Notebook Format 4]: https://github.com/ipython/ipython/wiki/IPEP-17%3A-Notebook-Format-4\n",
"[v4 schema]: https://github.com/ipython/ipython/blob/master/IPython/nbformat/v4/nbformat.v4.schema.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Background"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### JSON Islands\n",
"Each of these uses of JSON-compatible data in the Ecosystem must be understood individually.\n",
"- [nbformat](https://github.com/ipython/ipython/tree/master/IPython/nbformat)\n",
"- [IPEP 20: Informal Structure of Cell Metadata](https://github.com/ipython/ipython/wiki/IPEP-20%3A-Informal-structure-of-cell-metadata)\n",
"- [∅MQ messages](https://github.com/ipython/ipython/tree/master/IPython/kernel/zmq)\n",
"- [RESTful services](https://github.com/ipython/ipython/blob/master/IPython/html/notebookapp.py)\n",
"- [ipython-components `bower.json`](https://github.com/ipython/ipython-components/blob/master/bower.json)\n",
"- [IPEP 27: Contents Service](./IPEP-27%3A-Contents-Service)\n",
"- [kernelspec](https://github.com/ipython/ipython/blob/fd78fe8c64f17296b2bfe6b778450f238010473d/IPython/html/services/kernelspecs/tests/test_kernelspecs_api.py)\n",
"- [WidgetModel](https://github.com/ipython/ipython/blob/master/IPython/html/static/widgets/js/widget.js#L11)\n",
"\n",
"The list will continue to grow as more things are moved from code to data: whether it is static widget data-at-rest, additional services related to multi-user content, etc."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Motivation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Discoverability\n",
"As publicly-available notebooks contain an increasing amount of important data, code and findings, it follows that finding, organizing, and referencing them will become increasingly meaningful. This task can be significantly improved by leveraging existing linked data concepts, specifically those provided by widely-adopted vocabularies like [foaf](http://www.foaf-project.org/) and [schema.org](http://schema.org).\n",
"\n",
"Identifying those core Ecosystem concepts that already fit within such categories creates immediate value, whether in the data-at-rest (.ipynb), on the wire (message format) or in transformed formats (such as nbviewer output)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cross-kernel implementation\n",
"As more and better features for advanced user interfaces are implemented in different kernels, the amount of reproduced code will grow. Rich meaning at the data level will help give structure to cross-language constructs.\n",
"\n",
"For example, adopting a rich meaning for fields within `WidgetModel` subclasses would allow for better reuse of domain concepts. Consider a `date` field:\n",
"- at its simplest, it could be said to have a specific format, `xsd:date`, i.e. ISO-8601\n",
"- a stronger concept could even specify that it is a [`schema:startDate`](http://schema.org/startDate)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### IPEP 17: Not far enough?\n",
"As noted above, while Notebook Format 4 represents a significant move forward, some issues remain: `.ipynb` files do not contain a reference to their schema, whether:\n",
" - explicitly, e.g. a `$schema` key\n",
" - implicitly, e.g. served with an [HTTP header](http://json-schema.org/latest/json-schema-core.html#anchor33) (e.g. `Content-Type` or `Link`)\n",
"\n",
"As such, if found in the wild, a manual step of dereferencing `nbformat` to the source repository would be required to find documentation of its content. Community users of `master` have already discovered this."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Linked Data Formats\n",
"What is a notebook? Depending on whom you ask, it is at least:\n",
"- to a desktop user, a .ipynb file on their hard drive\n",
"- to the contents manager, a JSON-compatible object\n",
"- to nbconvert, a static HTML document\n",
"- to nbviewer, a URL a user can click on and share\n",
"\n",
"Of these formats, some lend themselves to being enhanced with linked data more than others. Of particular note are:\n",
"- HTML: [RDFa](http://www.w3.org/TR/rdfa-syntax/) is a W3C standard for inline linked data annotation of an HTML tree\n",
"- JSON: [JSON-LD](http://www.w3.org/TR/json-ld/) is a W3C standard for inline or out-of-band annotation of a JSON document\n",
"\n",
"Adopting these formats is the most direct way to make the body of knowledge created by the Community more broadly discoverable. Adoption can be gradual, with each successive step providing additional features and content."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Linked Data Concepts\n",
"Meaning, in this proposal, means that, given the data alone or a suitable reference to it, the data can be interpreted unambiguously: `nbformat: 4` is not `execution_count: 4`. The core elements of meaning addressed in this proposal are:\n",
"- reusable [context](#Context)\n",
"- object [identity](#Identity)\n",
"- object [type](#Type)\n",
"- text [natural language](#Natural Language) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Context\n",
"The proposed means of reusing meaning is [JSON-LD contexts](http://www.w3.org/TR/json-ld/#the-context). Contexts provide a modular, opt-in, potentially-out-of-band means for capturing the intent of a piece of data.\n",
"\n",
"To claim that some JSON can be interpreted with _meaning_:\n",
"- a JSON document i.e. `#/` can\n",
" - include a `@context`, i.e. `#/@context`\n",
" - be served or embedded with a `Link` header:\n",
"\n",
"```http\n",
"Link: <http://ipython.org/contexts/notebook.jsonld>; rel=\"http://www.w3.org/ns/json-ld#context\"; type=\"application/ld+json\"\n",
"```\n",
"\n",
"- any JSON object, i.e. `#/metadata` can\n",
" - include a `@context`, i.e. `#/metadata/@context`\n",
" - this will override any parent `@context`, however it was defined"
]
},
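{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the in-band options above, not a settled design: the root `@context` refers to the context URL from the `Link` example, while `#/metadata` carries its own `@context`, whose term definitions take precedence within that object. The `title` key and its mapping to Dublin Core are assumptions made for illustration, and the v4 schema may need to be relaxed before it accepts the extra keys.\n",
"\n",
"```application/ld+json\n",
"{\n",
"  \"@context\": \"http://ipython.org/contexts/notebook.jsonld\",\n",
"  \"nbformat\": 4,\n",
"  \"nbformat_minor\": 0,\n",
"  \"metadata\": {\n",
"    \"@context\": {\n",
"      \"title\": \"http://purl.org/dc/terms/title\"\n",
"    },\n",
"    \"title\": \"An example notebook\"\n",
"  },\n",
"  \"cells\": []\n",
"}\n",
"```"
]
},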
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Identity\n",
"Is some piece of data addressable in an unambiguous way? Concretely, this means that a JSON object, either through its position in a document, or through use of a keyword, has a [Uniform Resource Identifier][uri], or URI. While URIs share many characteristics with their more functional twins, URLs, URIs are not necessarily dereferenceable: this is actually beneficial, as they don't need to be migrated, hosted or otherwise maintained, just agreed upon.\n",
"\n",
"Examples of things in the Ecosystem that could benefit from having explicit identity:\n",
"- Notebook Format 4\n",
" - canonical URI: `http://ipython.org/ns/nbformat/v4`\n",
" - compact URI: `nbf:4`\n",
"\n",
"[uri]: https://tools.ietf.org/html/rfc3986"
]
},
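{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged illustration of both routes to identity, the sketch below names a notebook with the JSON-LD `@id` keyword and names a cell by its position, written as a fragment that a processor would resolve against the document's base URL. The `example.org` address is a placeholder and the fragment convention is an assumption, not an agreed scheme.\n",
"\n",
"```application/ld+json\n",
"{\n",
"  \"@context\": \"http://ipython.org/contexts/notebook.jsonld\",\n",
"  \"@id\": \"http://example.org/notebooks/Index.ipynb\",\n",
"  \"cells\": [\n",
"    {\n",
"      \"@id\": \"#/cells/0\",\n",
"      \"cell_type\": \"markdown\",\n",
"      \"source\": [\n",
"        \"Some English\\n\"\n",
"      ]\n",
"    }\n",
"  ]\n",
"}\n",
"```"
]
},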
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Type\n",
"Does some piece of data share properties with other pieces of data? Type is the convenient bundling of possible properties that help a user make sense of data they find.\n",
"\n",
"Examples of things in the Ecosystem that could benefit from having explicit type:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Natural Language\n",
"While the Community is diverse and multi-lingual, the software, development process and documentation that represents the Ecosystem remains anglophilic and pythonic.\n",
"\n",
"Users of the notebook, however, are already using it to publish content in many languages, both machine and natural. While kernel agnosticism is one of the current challenges, natural language agnosticism will eventually become a feature, as publishing can occur in all manner of languages. Instead of inventing new syntax for capturing this, we can adopt a single, standards-based representation.\n",
"\n",
"[JSON-LD provides](http://www.w3.org/TR/json-ld/#string-internationalization) a means for using [consistent codes](http://tools.ietf.org/html/bcp47) for:\n",
"- specifying the default language of a document\n",
" ```application/ld+json\n",
" {\n",
" \"@context\": {\n",
" \"@language\": \"en\"\n",
" },\n",
" \"cells\": [\n",
" {\n",
" \"cell_type\": \"markdown\",\n",
" \"source\": [\n",
" \"Some English\\n\"\n",
" ]\n",
" }\n",
" ]\n",
" }\n",
" ```\n",
"- much more invasively, storing multiple natural language representations in the same key\n",
" ```application/ld+json\n",
" {\n",
" ...\n",
" {\n",
" \"cell_type\": \"markdown\",\n",
" \"source\": [\n",
" {\n",
" \"@language\": \"en\",\n",
" \"@value\": [\n",
" \"Some English\\n\"\n",
" ]\n",
" },\n",
" {\n",
" \"@language\": \"de\",\n",
" \"@value\": [\n",
" \"Etwas Deutsch\\n\"\n",
" ]\n",
" }\n",
" ]\n",
" },\n",
" ```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implementation Challenges & Opportunities"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upstream Dependency Support\n",
"#### Highlighting\n",
"- [CodeMirror](http://codemirror.net/mode/javascript/json-ld.html) has support as of 3.x at some point\n",
"- [Pygments](http://pygments.org/docs/lexers/#pygments.lexers.data.JsonLdLexer) has support as of 2.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### New Support Libraries\n",
"While it is unnecessary to de-reference a JSON-LD document into a more graph-like form, at times it will be useful. Having one of the known implementations in the Ecosystem languages will be key to useful features being developed that make use of Linked Data.\n",
"\n",
"#### Interpretation\n",
"- [PyLD](https://github.com/digitalbazaar/pyld)\n",
"- [jsonld.js](https://github.com/digitalbazaar/jsonld.js)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Roadmap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A lightweight `@context` for Notebook Format 4\n",
"As part of [this discussion](http://article.gmane.org/gmane.comp.python.ipython.devel/14153), the following lightweight context was proposed:\n",
"\n",
"```application/ld+json\n",
"{\n",
" \"@context\": {\n",
" \"@vocab\": \"http://ipython.org/nbformat/v4/\",\n",
" \"nb4\": \"http://ipython.org/nbformat/v4/\",\n",
" \"xsd\": \"http://www.w3.org/2001/XMLSchema#\",\n",
" \"foaf\": \"http://xmlns.com/foaf/0.1/\", \n",
" \"language\": {\"@type\": \"@id\"},\n",
" \"codemirror_mode\": {\"@type\": \"@id\"},\n",
" \"cell_type\": {\"@id\": \"@type\"},\n",
" \"output_type\": {\"@id\": \"@type\"},\n",
" \"cells\": {\"@container\": \"@list\"},\n",
" \"source\": {\"@container\": \"@list\"},\n",
" \"outputs\": {\"@container\": \"@list\"},\n",
" \"text\": {\"@container\": \"@list\"},\n",
" \"traceback\": {\"@container\": \"@list\"},\n",
" \"tags\": {\"@container\": \"@set\"},\n",
" \"collapsed\": {\"@type\": \"xsd:boolean\"},\n",
" \"execution_count\": {\"@type\": \"xsd:int\"},\n",
" \"nbformat_minor\": {\"@type\": \"xsd:int\"},\n",
" \"nbformat\": {\"@type\": \"xsd:int\"},\n",
" \"signature\": {\"@type\": \"foaf:sha1\"},\n",
" \"image/svg+xml\": {\"@container\": \"@list\"},\n",
" \"image/png\": {\"@container\": \"@list\"},\n",
" \"text/html\": {\"@container\": \"@list\"},\n",
" \"text/plain\": {\"@container\": \"@list\"},\n",
" \"application/javascript\": {\"@container\": \"@list\"}\n",
" }\n",
"}\n",
"```\n",
"Serving this as an out-of-band context for the raw notebook REST service and `nbviewer` download would immediately create some value."
]
},
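{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate what the lightweight context would provide, consider a hypothetical code cell (its content is an assumption made for illustration):\n",
"\n",
"```application/ld+json\n",
"{\n",
"  \"cell_type\": \"code\",\n",
"  \"execution_count\": 1,\n",
"  \"source\": [\n",
"    \"print(1)\\n\"\n",
"  ]\n",
"}\n",
"```\n",
"\n",
"With the `@context` above in scope, a JSON-LD processor such as PyLD would be expected to expand that cell to roughly the following, where `cell_type` has become a `@type`, `execution_count` is a typed value and `source` is an ordered list:\n",
"\n",
"```application/ld+json\n",
"[\n",
"  {\n",
"    \"@type\": [\"http://ipython.org/nbformat/v4/code\"],\n",
"    \"http://ipython.org/nbformat/v4/execution_count\": [\n",
"      {\"@type\": \"http://www.w3.org/2001/XMLSchema#int\", \"@value\": 1}\n",
"    ],\n",
"    \"http://ipython.org/nbformat/v4/source\": [\n",
"      {\"@list\": [{\"@value\": \"print(1)\\n\"}]}\n",
"    ]\n",
"  }\n",
"]\n",
"```"
]
},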
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a `JSONLDExporter` for `nbconvert`\n",
"Being able to export a self-contained and -describing, machine-readable document from a notebook would be a good step in enabling downstream use of the data stored in notebooks.\n",
"\n",
"Beyond the baseline of what could be captured with a lightweight notebook `@context`, a dedicated exporter could provide additional advantages:\n",
"- embed the full `@context`, making the meaning in the documents unambiguous and portable\n",
"- extract links in Markdown cells"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add `--include-rdfa` to HTMLExporter\n",
"Whether as a separate output of the notebook export process or as an option on the existing one, HTML can contain in-line Linked Data attributes, using the RDFa notation. Thus, when a cell or image is exported, metadata could be provided to make the document content more understandable to external agents."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Expose Linked Data in `nbviewer`\n",
"As `nbconvert` drives what [nbviewer](http://nbviewer.ipython.org/) can display, a natural step would be to advertise and provide notebooks from across the web in a linked data format:\n",
"```\n",
"http://nbviewer.ipython.org/as/jsonld/github/ipython/ipython/blob/2.x/examples/Index.ipynb\n",
"http://nbviewer.ipython.org/as/rdfa/github/ipython/ipython/blob/2.x/examples/Index.ipynb\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### FUTURE: Treat notebook `metadata` as a compacted JSON-LD document\n",
"In the canonical Notebook front-end (the JavaScript UI), cell and notebook `metadata` is still the \"wild west\". \"Tamed\" with an explicit context, and treating the output of metadata as a _compacted_ JSON-LD document, `metadata` could become the \n",
"\n",
"#### Example\n",
"Slideshow metadata:\n",
"\n",
"```javascript\n",
"{\n",
" \"@context\": {\n",
" \"slides\": \"http://ipython.org/formats#slides\"\n",
" },\n",
" \"cells\": [\n",
" {\n",
" \"metadata\": {\n",
" \"slides:type\": \"slides:slide\"\n",
" }\n",
" }\n",
" ]\n",
"}\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Related Work\n",
"- [Research Object Bundle](http://www.researchobject.org/)\n",
" > _This specification defines RO Bundle, a ZIP-based file format that\n",
" bundles resources which when aggregated form an identifiable\n",
" conceptual work; say a collection of datasets resulting from a\n",
" scientific experiment, or a gathering of logs and outputs from a\n",
" particular command line execution._"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}