@michael
Created May 31, 2013 09:44
Title: eLife Lens - A novel way of seeing content

Abstract

Introduction

Working with digital documents has long been hard because, for the most part, they come in presentation-centric formats, optimized for print and for showing the same fixed layout on every device. Ultimately, the display of these documents was fixed so that the user could print the exact same document from any device. Content today, however, is rarely printed; instead, it is read on a variety of platforms spanning computers and mobile devices. The limitations imposed by different screen sizes, the lack of tactile feedback that comes from flipping between pages, and the inability to focus purely on the author's arguments are problems present in all disciplines.

The evolution of the web browser has created a unified platform for viewing content. Instead of binding the content to a presentation-focused format, we can treat the content as data, making it readily accessible through a defined data structure. This data structure can then be processed in ways similar to a database, allowing users to query the content and build new tools.

We have decided to focus our initial efforts on applying this data-centric representation of content to the scientific literature. The Open Access scientific community has standardized much of its content into a formally annotated XML format, which makes it easier to gain access to a large library of articles. The content of a scientific article is a combination of text, figures, tables, videos and references, which together form the author's argument. Scientific arguments are also rarely linear in nature, making them difficult to follow when all of the content is not readily viewable on a digital device. The large amount of available data and the non-linear format of the articles provided us with an ideal starting point to test the flexibility of our data-centric approach. With the release of Lens, we would like to promote the data-centric representation not only of scientific content, but of any content optimized for viewing on web clients.

The Lens Document Format

Lens can display any document that conforms to a simple JSON representation. A JSON representation is flexible and can be adapted to any type of content. The top-level JSON object (Figure 1A) provides the framework for organizing all of the content in an article. The id and properties keys provide global access if many JSON articles are to be queried for an initial round of filtering. The nodes key contains all of the content, and the views key defines the order in which the nodes are displayed (Figure 1B). The nodes are then linked together via annotations. An annotation is defined by an explicit reference to a target content node, anchored within the content of its source node (Figure 1C).

Figure 1A

The creation of an article's JSON relies on a converter script that walks the XML tags using the sax-js node.js module. The XML tags define the types of nodes that are supported; Table 1 outlines the conversions.
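As a rough illustration of what such a conversion table encodes, the heart of the converter is a mapping from XML tag names to Lens node types. The sketch below uses hypothetical tag names, not the actual eLife XML vocabulary, and leaves out the sax-js streaming plumbing:

```javascript
// Sketch of the tag-to-node-type mapping a converter might use.
// Tag names here are illustrative only, not the real eLife XML schema.
var TAG_TO_NODE_TYPE = {
  "p": "text",
  "title": "heading",
  "fig": "image",
  "ref": "publication"
};

// Unknown tags yield null and would simply be skipped by the converter.
function nodeTypeFor(tagName) {
  return TAG_TO_NODE_TYPE[tagName] || null;
}
```

In the real converter, a lookup like this would run inside the sax-js onopentag handler to decide which kind of Lens node to open.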

The Presentation of Lens

When designing Lens, we worked to provide a flexible experience that supports a variety of use cases. On the far left, the document map identifies the reader's position in the context of the whole article and outlines each paragraph. The left panel contains all of the textual content of the article. The right panel contains the resources: the table of contents, figures, tables, videos, supplemental data, references and information about the article. By splitting the content into individual panels, we enable the reader to focus on several pieces of content at the same time.

When reading a scientific article, it is rare for a reader to go straight through the text and figures in order. Instead, they may first skim the figures to get a quick overview of the paper. By splitting the figures out from the text, the reader can move through each independently of the other, viewing exactly the content they want to focus on at any given moment.

Two viewing modes are implicitly supported: text-centric and resource-centric viewing. Text-centric viewing creates a microscale reading experience by limiting the viewable content to a single content node. When a figure or publication label is selected within a text node, only the figures or publications referenced within that paragraph are displayed in their respective resource sections (Figure 3). Resource-centric viewing, on the other hand, provides a global overview of the paper. Selecting any figure or publication colors the paragraphs in the document map that reference the selected resource (Figure 4).

The presentation of the article information has also been redesigned so that it is organized in a uniform way. Each publisher places information such as the author's impact statement, article keywords and major dataset links in different positions in the HTML or PDF versions of their articles. Our aim was to distill all of the article information into organized subsections that can be displayed together on individual cards. These information cards create a uniform viewing experience in which the reader can quickly find what they are looking for.

Working with the Lens Document Format

It's fairly easy to use Lens with your own content. This section explains the basic concepts behind our format and should help you get started. We'll add more documentation as we go.

{
  "id": "introducing_lens",
  "nodes": {},
  "properties": {},
  "views": {
    "content": [],
    "figures": [],
    "publications": []
  }
}

First off, all content is stored in content nodes, a flat unordered data structure in which each node is identified by a unique id. Every node is assigned a type and has a number of properties.
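These constraints are easy to check mechanically. The helper below is purely a sketch (it is not part of Lens itself); it verifies that a document has the skeleton shown above and that every id listed in a view actually exists in nodes:

```javascript
// Sketch: sanity-check the document skeleton described above.
// Illustrative helper only; not part of the Lens codebase.
function isValidDoc(doc) {
  if (typeof doc.id !== "string" || typeof doc.nodes !== "object") return false;
  var views = doc.views || {};
  // Every id referenced by a view must resolve to a node.
  return Object.keys(views).every(function (name) {
    return views[name].every(function (id) {
      return doc.nodes.hasOwnProperty(id);
    });
  });
}
```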

Properties

You can define custom properties on your document and store them within the properties object. Let's set a title for our new document.

   ...,
   "properties": {"title": "Lens"},
   ...

Content Nodes

Let's start by adding some text to our document by specifying a text node. It's just a simple piece of JSON.

{
  "type": "text",
  "id": "text:intro",
  "content": "Lens is an alternative way to view a research article (Figure 1)."
}

As mentioned before every content element needs to have a unique identifier, as well as a type which describes what kind of content is stored. This tells Lens how to render that piece of information.

In order to structure our document better we also have support for headings. They look like so:

{
  "type": "heading",
  "id": "heading:intro",
  "level": 1,
  "content": "Introduction"
}

That was easy. We've defined two content nodes, which go into the nodes object using their ids as keys. One last thing to do is update the views object to reflect the order of our newly added content elements.

At this point, our document looks like so:

{
  "id": "hello_world",
  "nodes": {
    "text:intro": {...},
    "heading:intro": {...}
  },
  "views": {
    "content": ["heading:intro", "text:intro"],
    "figures": [],
    "publications": []
  }
}
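To see why the views object matters: a renderer walks each view in order and looks up every id in nodes. A stripped-down sketch of that traversal (not Lens's actual rendering code):

```javascript
// Sketch: resolve the ordered content view into actual node objects.
function contentNodes(doc) {
  return doc.views.content.map(function (id) {
    return doc.nodes[id];
  });
}

var doc = {
  id: "hello_world",
  nodes: {
    "text:intro": { type: "text", id: "text:intro", content: "Lens is ..." },
    "heading:intro": { type: "heading", id: "heading:intro", level: 1, content: "Introduction" }
  },
  views: { content: ["heading:intro", "text:intro"], figures: [], publications: [] }
};

// The heading comes first because views.content says so,
// even though the nodes object itself is unordered.
contentNodes(doc).map(function (n) { return n.type; }); // ["heading", "text"]
```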

Annotations

Text nodes only store plaintext, which is by design. In order to cleanly separate content from presentation, we keep all annotations in separate objects. However, annotations are just regular nodes. Let's add an emphasis annotation for "Lens".

{
  "id": "annotation:1",
  "type": "emphasis",
  "source": "text:intro",
  "pos": [0, 4]
}

Like with any node, we assign a unique id and a type. Annotation nodes also take a source and a pos property, stating which text node is referenced (text:intro) and which characters the annotation covers ([0, 4]: 0 being the start offset and 4 the number of characters).

There are some more basic annotation types, including strong, link, inline-formula, etc. Over the next weeks we'll add complete documentation covering all node types.
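As an illustration of how a renderer might consume such an annotation, here is a sketch assuming pos is [startOffset, length]; this is not Lens's actual rendering code:

```javascript
// Sketch: wrap the characters covered by an emphasis annotation in <em> tags.
// Assumes pos is [startOffset, length].
function applyEmphasis(text, pos) {
  var start = pos[0];
  var end = pos[0] + pos[1];
  return text.slice(0, start) + "<em>" + text.slice(start, end) + "</em>" + text.slice(end);
}

applyEmphasis("Lens is an alternative way to view a research article.", [0, 4]);
// "<em>Lens</em> is an alternative way to view a research article."
```

Because the annotation lives outside the text node, the plaintext stays untouched and any number of annotations can layer over the same content.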

Figures

Now let's add a figure and link it with some text.

{
  "type": "image",
  "id": "image:fig1",
  "label": "Figure 1.",
  "url": "http://Lens.elifesciences.org/Lens.png",
  "large_url": "http://Lens.elifesciences.org/Lens_large.png",
  "doi": "http://dx.doi.org/10.7554/eLife.00336.003",
  "caption": "caption:53"
}

As you may have noticed, the figure also links to a caption node, which adds a title and description to the figure.

    "caption:53": {
      "type": "caption",
      "id": "caption:53",
      "title": "Lens",
      "content": "",
      "source": "image:fig1"
    }

In order to have your newly added figure show up in Lens's Figures tab, you have to create an entry in the figures view.

"views": {
  "figures": ["image:fig1"]
}

You can easily link it with your text by providing a figure_reference annotation that refers to a particular text node and character range.

"annotation:2": {
  "type": "figure_reference",
  "id": "annotation:2",
  "source": "text:intro",
  "target": "image:fig1",
  "key": "content",
  "content": "Figure 1",
  "pos": [55, 8]
},
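Resolving such a reference is just a lookup: the annotation's target names the resource node. A minimal sketch of that lookup (not the actual Lens code):

```javascript
// Sketch: follow a reference annotation from its source text to its target resource.
function resolveTarget(doc, annotationId) {
  var ann = doc.nodes[annotationId];
  return ann && ann.target ? doc.nodes[ann.target] : null;
}

var doc = {
  id: "demo",
  nodes: {
    "image:fig1": { type: "image", id: "image:fig1", label: "Figure 1." },
    "annotation:2": {
      type: "figure_reference",
      id: "annotation:2",
      source: "text:intro",
      target: "image:fig1"
    }
  },
  views: { content: [], figures: ["image:fig1"], publications: [] }
};

resolveTarget(doc, "annotation:2").label; // "Figure 1."
```

The same mechanism is what lets Lens color the document map: collect every annotation whose target is the selected resource and highlight their source paragraphs.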

Publications

The same approach works for publications that you want to reference in your research article. A publication node looks like so:

{
  "type": "publication",
  "id": "publication:bib38",
  "authors": [
    {
      "given-names": "S",
      "last-name": "Piccirillo"
    },
    ...
  ],
  "title": "The Rim101p/PacC pathway and alkaline pH regulate pattern formation in yeast colonies",
  "year": "2010",
  "source": "Genetics",
  "volume": "184",
  "fpage": "707",
  "lpage": "16"
}
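For display, a node like this can be flattened into a conventional citation string. The sketch below picks one plausible format; the exact format Lens renders may differ:

```javascript
// Sketch: turn a publication node into a short citation string.
// The output format is illustrative, not necessarily what Lens renders.
function formatCitation(pub) {
  var names = pub.authors.map(function (a) {
    return a["last-name"] + " " + a["given-names"];
  }).join(", ");
  return names + " (" + pub.year + "). " + pub.title + ". " +
         pub.source + " " + pub.volume + ":" + pub.fpage + "-" + pub.lpage + ".";
}

var pub = {
  type: "publication",
  id: "publication:bib38",
  authors: [{ "given-names": "S", "last-name": "Piccirillo" }],
  title: "The Rim101p/PacC pathway and alkaline pH regulate pattern formation in yeast colonies",
  year: "2010",
  source: "Genetics",
  volume: "184",
  fpage: "707",
  lpage: "16"
};

formatCitation(pub);
// "Piccirillo S (2010). The Rim101p/PacC pathway and alkaline pH regulate
//  pattern formation in yeast colonies. Genetics 184:707-16."
```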

Like figures, publications have to be added to the publications view to be shown in Lens. You can link them with the text by specifying a publication_reference annotation.

{
  "type": "publication_reference",
  "id": "annotation:23",
  "source": "text:22",
  "target": "publication:bib34",
  "key": "content",
  "content": "Murray, 2003",
  "pos": [85, 12]
}