@tomcrane

tomcrane/WDL-IxIF.md

Last active Jul 31, 2020
Wellcome Library IxIF "Interim" implementation

We want to move the Wellcome Library away from the Wellcome Player and onto the IIIF 2.0 Universal Viewer (UV).

This allows us to move all the Wellcome Library's image API endpoints to the prototype DLCS (Digital Library Cloud Services) that we have started building.

We have a problem. We have video, audio and born-digital content, besides image sequence content. We don't want to maintain the Player and the UV together. This non-image content is a tiny fraction of the total, but an important one.

Other institutions share this problem, and everyone agrees that IIIF will need to extend to handle non-image-sequence resources - "IxIF". We want to inherit all that we can from IIIF - the JSON-LD, the Open Annotation model, the manifest wrapper and general approach to metadata ("presentation not semantics"). Shared Canvas may be appropriate for some media but not others.

  • Images are canvas-based media - they occupy a region of a plane. Annotations are on regions of the plane.
  • Audio is a time-based medium - its components have durations in time. Annotations are on intervals in time.
  • Video is both of these combined; annotations would be on regions of a plane for a particular interval in time.
  • Born Digital files are something else again; they don't represent an original something that occupies time or space.

This is a hard problem, with a potential explosion of use cases. We need to meet the needs of sound and video archivists and scholars. We know that plenty of people have these use cases, and we need to collect them. It will be a long time before an IIIF equivalent for other media reaches the same level of utility as IIIF does now. We have some questions to think about. For example, do we need an equivalent of the Image API for time-based media? That would imply resampling, transcoding and other CPU-intensive services.

So what does the Wellcome Library do in the short term? Maintaining two viewers is a blocker to our current work; we need to play audio, video and born-digital content, in the UV, from manifests that share a common approach.

Interim IxIF

In practice, the Wellcome Library's 2015 use cases for audio, video and born digital are not directly comparable to deep zoom image sequence requirements. They are also really rather simple, compared to the potential of IIIF.

  • We have two formats (mp4, webm) available for dissemination for all the video, and one format (mp3) for audio.
  • Born Digital documents are all PDFs.
  • We have no annotation requirements on non-image material, with one exception - providing a PDF transcript of a video or audio file. This is not a "painting" motivation annotation and is much more simply expressed.
  • We may have some other born-digital formats later. All we need to do is describe them and make them available for download.

Our short term solution is to ignore all the great things that IxIF could become, and provide the bare minimum required to meet these requirements (i.e., match the functionality of the current Wellcome Player) using as much IIIF as possible. We deliberately want to avoid trying to solve these hard problems right now, because we don't have the use cases. We just want to play some video!

What follows should not be interpreted as any sort of suggestion of a standard. There are no modifications to the IIIF data model, just some additional vocabulary that we will use to make the UV understand a manifest that asserts non-image-sequence resources. Other viewers will just ignore that content.

Principles

An sc:Sequence is always a sequence of sc:Canvas

If we introduce other types, can sc:Sequence be their parent? No, probably not, because sc:Sequence is interpreted as "coherent sequence of images" not just sequence of any old stuff. We might break other viewers if we add non-IIIF resources to a sc:Sequence.

Provide something for other viewers to see

If the manifest is the medium of interchange and all the Wellcome Library's digitised resources have a manifest, then people are likely to try opening those manifests in other viewers. If the manifest doesn't describe any image sequences, it will appear empty in those viewers. So provide something for them to look at, with a hint to the UV that it should not display this particular resource if it understands the rest of the manifest:

"sequences": [
  {
    "@id": "http://wellcomelibrary.org/iiif/ixif-message/sequence/seq",
    "@type": "sc:Sequence",
    "label": "Unsupported extension. This manifest is being used as a wrapper for non-IIIF content (e.g., audio, video) and is unfortunately incompatible with IIIF viewers.",
    "compatibilityHint": "displayIfContentUnsupported",
    "canvases": [
      {
        //... a placeholder image for other viewers to look at...

Use an additional @context to provide JSON-friendly names

...but don't include that context in a manifest unless it is actually using non-IIIF terms. A JSON-LD document can have more than one context defined, so a second one can define terms such as "compatibilityHint".

  "@context": [
    "http://iiif.io/api/presentation/2/context.json",
    "http://wellcomelibrary.org/ixif/0/context.json"
  ],

However, there's a risk that this might confuse matters, so only include it if it's going to be used.
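The "only include it if it's going to be used" rule could be applied mechanically when a manifest is generated. What follows is a minimal sketch, not our actual manifest generator: the helper names `uses_ixif_terms` and `with_context` are hypothetical, and the set of extension terms is an assumption based on the three terms this note introduces.

```python
IIIF_CONTEXT = "http://iiif.io/api/presentation/2/context.json"
IXIF_CONTEXT = "http://wellcomelibrary.org/ixif/0/context.json"

# Assumption: these are the only terms the extension context defines.
IXIF_TERMS = {"mediaSequences", "elements", "compatibilityHint"}


def uses_ixif_terms(node):
    """Return True if any key anywhere in the JSON tree is an extension term."""
    if isinstance(node, dict):
        return any(key in IXIF_TERMS or uses_ixif_terms(value)
                   for key, value in node.items())
    if isinstance(node, list):
        return any(uses_ixif_terms(item) for item in node)
    return False


def with_context(manifest):
    """Return a copy of the manifest with the narrowest @context that covers it."""
    body = {k: v for k, v in manifest.items() if k != "@context"}
    if uses_ixif_terms(body):
        context = [IIIF_CONTEXT, IXIF_CONTEXT]
    else:
        context = IIIF_CONTEXT
    return {"@context": context, **body}
```

A plain image manifest passed through `with_context` keeps the single IIIF context, so other viewers see nothing unfamiliar. Only manifests that actually carry "mediaSequences" or a "compatibilityHint" get the two-item array.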

Introduce "mediaSequences" to contain the new material

ixif:MediaSequence is a new type.

In theory a mediaSequence could contain items of type sc:Canvas - but other viewers wouldn't find them, so we won't do that. Otherwise the resource is analogous to sc:Sequence. Where it differs is what it contains. Initially I tried to make the model feel like the Presentation API, with an equivalent of "canvases". But sc:Canvas without Shared Canvas does not make sense, and we do not need to annotate our non-image resources the same way as we do a canvas. At some point in the future we will need to annotate video with rectangles and time intervals, but that's not what we need right now. So rather than a layer of indirection our mediaSequences contain the media elements directly.

Here's a video:

"mediaSequences": [
  {
    "@id": "http://wellcomelibrary.org/iiif/b24744669/xsequence/s0",
    "@type": "ixif:MediaSequence",
    "label": "XSequence 0",
    "elements": [
      {
        "@id": "http://wellcomelibrary.org/iiif/b24744669/video-resource-0",
        "@type": "dctypes:MovingImage",
        "label": "The antibiotics and terramycin.",
        "metadata": [
          {
            "label": "length",
            "value": "19mn 40s"
          }
        ],
        "thumbnail": "http://wellcomelibrary.org/posterimages/0055-0000-7653-0000-0-0000-0000-0.jpg",
        "rendering": [
          {
            "format": "video/mp4",
            "@id": "https://s3-eu-west-1.amazonaws.com/wdl-video-open/mp4/309ab4d4-e162-4e79-9e3e-565412c69523.mp4"
          },
          {
            "format": "video/webm",
            "@id": "https://s3-eu-west-1.amazonaws.com/wdl-video-open/webm/309ab4d4-e162-4e79-9e3e-565412c69523.webm"
          }
        ]
      }
    ]
  }
],

The full manifest can be seen here: http://wellcomelibrary.org/iiif/b24744669/manifest

Much has been borrowed from IIIF here, including the 2.1 "rendering" (which is not defined in the referenced context, but we'll gloss over that for now). A viewing application can decide which of the two renderings it should play. This is a real concern for web video: some browsers can play only one of the two formats, leaving no choice in the matter.

By analogy, an audio resource looks like this:

"mediaSequences": [
{
  "@id": "http://wellcomelibrary.org/iiif/b17307922/xsequence/s0",
  "@type": "ixif:MediaSequence",
  "label": "XSequence 0",
  "elements": [
    {
      "@id": "http://wellcomelibrary.org/iiif/b17307922/audio/master.mp3",
      "@type": "dctypes:Sound",
      "label": "Florence Nightingale :",
      "metadata": [
        {
          "label": "length",
          "value": "74.057144"
        }
      ],
      "thumbnail": "http://wellcomelibrary.org/posterimages/0056-0000-4402-0102-0-0000-0000-0.jpg",
      "rendering": {
        "format": "audio/mp3",
        "@id": "http://wellcomelibrary.org/media/b17307922/0/0128dccf-e2b8-4b0d-b41a-2d9edc6952f5.mp3"
      }
    }
  ]
}
],

Or alternatively, as there is no choice of rendering here:

"mediaSequences": [
{
  "@id": "http://wellcomelibrary.org/iiif/b17307922/xsequence/s0",
  "@type": "ixif:MediaSequence",
  "label": "XSequence 0",
  "elements": [
    {
      "@id": "http://wellcomelibrary.org/media/b17307922/0/0128dccf-e2b8-4b0d-b41a-2d9edc6952f5.mp3",
      "@type": "dctypes:Sound",
      "format": "audio/mp3",
      "label": "Florence Nightingale :",
      "metadata": [
        {
          "label": "length",
          "value": "74.057144"
        }
      ],
      "thumbnail": "http://wellcomelibrary.org/posterimages/0056-0000-4402-0102-0-0000-0000-0.jpg"
    }
  ]
}
],

The full manifest is here: http://wellcomelibrary.org/iiif/b17307922/manifest

If renderings are provided, then they must each be accompanied by a "format" so that a viewer can decide which rendering to pick. If no renderings are provided, a format should be provided on the element itself, as in the example above, and the viewer should use the @id of the element. In all cases the viewer knows the format of any media referenced, so can decide whether or not it has the ability to render them. If it doesn't, and the manifest also has a sc:Sequence of canvases with the compatibilityHint "displayIfContentUnsupported", it should show that special "fallback" sequence. If it can play at least one format per element, it should ignore that sequence. Eventually such fallbacks will not be needed, but for now I think we have to handle gracefully the scenario where someone loads a Wellcome video manifest into Mirador (for example).
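The selection rules in the paragraph above can be sketched in code. This is an illustration under stated assumptions rather than the UV's implementation: the function names are hypothetical, `can_play` stands in for whatever format test a viewer has (for example a wrapper around the browser's media support check), and a manifest with no elements at all is treated as unsupported.

```python
def candidate_sources(element):
    """List (format, url) pairs for an element."""
    renderings = element.get("rendering")
    if renderings is None:
        # No renderings: the element's own @id and format are the source.
        return [(element.get("format"), element["@id"])]
    if isinstance(renderings, dict):
        # A single rendering may be given as an object rather than a list.
        renderings = [renderings]
    return [(r.get("format"), r["@id"]) for r in renderings]


def pick_source(element, can_play):
    """Return the first (format, url) the viewer can play, or None."""
    for fmt, url in candidate_sources(element):
        if fmt is not None and can_play(fmt):
            return fmt, url
    return None


def all_elements(manifest):
    """Flatten the elements of every mediaSequence in the manifest."""
    return [element
            for media_sequence in manifest.get("mediaSequences", [])
            for element in media_sequence.get("elements", [])]


def show_fallback_sequence(manifest, can_play):
    """True if the 'displayIfContentUnsupported' sequence should be shown."""
    has_fallback = any(
        seq.get("compatibilityHint") == "displayIfContentUnsupported"
        for seq in manifest.get("sequences", []))
    elements = all_elements(manifest)
    # Assumption: an empty manifest counts as "cannot play the content".
    every_element_playable = bool(elements) and all(
        pick_source(element, can_play) is not None for element in elements)
    return has_fallback and not every_element_playable
```

A viewer that understands none of this never calls these functions. It just renders the placeholder sc:Sequence, which is the behaviour we want from Mirador and friends.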

Here's a born-digital item, in this case a PDF:

"mediaSequences": [
{
  "@id": "http://wellcomelibrary.org/iiif/b17502792/xsequence/s0",
  "@type": "ixif:MediaSequence",
  "label": "XSequence 0",
  "elements": [
    {
      "@id": "/media/b17502792/0/caf18956-8f79-4fe6-8988-af329b036416.pdf",
      "@type": "foaf:Document",
      "format": "application/pdf",
      "label": "Science and the public",
      "metadata": [
        {
          "label": "pages",
          "value": "137"
        }
      ],
      "thumbnail": "http://wellcomelibrary.org/pdfthumbs/b17502792/0/caf18956-8f79-4fe6-8988-af329b036416.jpg"
    }
  ]
}
],

Again, here's the manifest: http://wellcomelibrary.org/iiif/b17502792/manifest

This same element @type can be used for any document. The combination of "@type" and "format" (in this case application/pdf) should let a viewer know whether it can render it. The "legacy" Wellcome Player can render a PDF - see http://wellcomelibrary.org/player/b17502792 - but it can't render a Word document or any kind of spreadsheet, leading to:

Unrecognised types and formats can be presented as simple downloads

In other words, if we had:

"@type": "foaf:Document",
"format": "application/vnd.ms-excel"

...then our viewer would simply make the resource (an Excel spreadsheet) available to download.
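As a rough sketch of that decision, a viewer could keep a table of the "@type" and "format" pairs it knows how to render and treat everything else as a download. The table contents and the name `presentation_for` are assumptions made for illustration. The entries simply mirror the formats listed earlier in this note, and only resources that carry "format" directly are covered.

```python
# Hypothetical capability table: (@type, format) pairs rendered inline.
RENDERABLE = {
    ("dctypes:MovingImage", "video/mp4"),
    ("dctypes:MovingImage", "video/webm"),
    ("dctypes:Sound", "audio/mp3"),
    ("foaf:Document", "application/pdf"),
}


def presentation_for(resource, renderable=RENDERABLE):
    """Return "render" for known type/format pairs, otherwise "download"."""
    key = (resource.get("@type"), resource.get("format"))
    return "render" if key in renderable else "download"


pdf = {"@type": "foaf:Document", "format": "application/pdf"}
spreadsheet = {"@type": "foaf:Document", "format": "application/vnd.ms-excel"}
print(presentation_for(pdf))          # render
print(presentation_for(spreadsheet))  # download
```

Defaulting to "download" means a new born-digital format needs no viewer change to become available. It only needs a change if we later want to render it inline.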

Transcription

One additional requirement we have to duplicate current functionality from the Wellcome Player is the ability to present a PDF transcription of a video. In this case the manifest has a PDF in it, but the PDF is not the item the manifest is about - it's a transcription of the thing the manifest is about. Whereas the Wellcome Player renders a PDF directly if it is a born-digital catalogue item, it just provides a download link to it if it is a transcript of the catalogue item. We need to provide enough information for the UV to make this distinction (rather than have a manifest that happens to have a video and a PDF in it).

Here I may have taken a few liberties with reuse of existing IIIF vocabulary. I have also imported a "transcribing" motivation:

"mediaSequences": [
  {
    "@id": "http://wellcomelibrary.org/iiif/b17236381/xsequence/s0",
    "@type": "ixif:MediaSequence",
    "label": "XSequence 0",
    "elements": [
      {
        "@id": "http://wellcomelibrary.org/iiif/b17236381/element/e0",
        "@type": "dctypes:MovingImage",
        "label": "Clinical oncology.",
        "metadata": [
          {
            "label": "length",
            "value": "58mn 6s"
          }
        ],
        "thumbnail": "http://wellcomelibrary.org/posterimages/0055-0000-3865-0000-0-0000-0000-0.jpg",
        "rendering": [
          {
            "format": "video/mp4",
            "@id": "/media/b17236381/0/1fb4d426-6b8d-41fc-8537-b1f02efa17a2.mp4"
          },
          {
            "format": "video/webm",
            "@id": "/media/b17236381/0/1fb4d426-6b8d-41fc-8537-b1f02efa17a2.webm"
          }
        ],
        "resources": [
          {
            "@id": "http://wellcomelibrary.org/iiif/b17236381/transcript/cab919d2-4cb6-4551-b87d-d19ad30f27c6",
            "@type": "oa:Annotation",
            "motivation": "oad:transcribing",
            "resource": {
              "@id": "/media/b17236381/1/cab919d2-4cb6-4551-b87d-d19ad30f27c6.pdf",
              "@type": "foaf:Document",
              "format": "application/pdf",
              "label": "Clinical oncology",
              "metadata": [
                {
                  "label": "pages",
                  "value": "29"
                }
              ],
              "thumbnail": "http://wellcomelibrary.org/pdfthumbs/b17236381/1/cab919d2-4cb6-4551-b87d-d19ad30f27c6.jpg"
            },
            "on": "http://wellcomelibrary.org/iiif/b17236381/element/e0"
          }
        ]
      }
    ]
  }
],

Full manifest: http://wellcomelibrary.org/iiif/b17236381/manifest

The body of the annotation is an ixif:Element, exactly the same as the PDF in the preceding example, which should make viewer implementation easier.
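To show how a viewer might tell a transcript apart from the item itself, here is a minimal sketch that collects the bodies of transcribing annotations attached to an element. The function name `transcripts_for` is hypothetical, and the check that "on" matches the element's "@id" is an assumption about how strict a viewer would want to be.

```python
def transcripts_for(element):
    """Return the bodies of 'oad:transcribing' annotations targeting this element."""
    found = []
    for annotation in element.get("resources", []):
        if annotation.get("@type") != "oa:Annotation":
            continue
        if annotation.get("motivation") != "oad:transcribing":
            continue
        # Assumption: ignore annotations that target some other resource.
        if annotation.get("on") != element.get("@id"):
            continue
        found.append(annotation["resource"])
    return found


element = {
    "@id": "http://wellcomelibrary.org/iiif/b17236381/element/e0",
    "@type": "dctypes:MovingImage",
    "resources": [
        {
            "@type": "oa:Annotation",
            "motivation": "oad:transcribing",
            "resource": {
                "@id": "/media/b17236381/1/cab919d2-4cb6-4551-b87d-d19ad30f27c6.pdf",
                "@type": "foaf:Document",
                "format": "application/pdf",
            },
            "on": "http://wellcomelibrary.org/iiif/b17236381/element/e0",
        }
    ],
}

for transcript in transcripts_for(element):
    print("Download transcript:", transcript["@id"])
```

Because each body has the same shape as a born-digital element, the viewer can reuse its existing code to label and link the PDF. The only difference is that it offers a download link instead of rendering the document as the main item.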

Possible points of contention are the reuse of "resources", which in the IIIF context aliases "sc:hasAnnotations" and is intended for an annotation list external to the manifest; the use of oad:transcribing, which might not be the best motivation; the use of foaf:Document to indicate something "documenty" and probably some others. For the Wellcome use cases, the manifests are small and it felt appropriate to include transcription document resources in the manifest as a special case, just as the presentation API includes image annotations as a special case.

Perhaps "resources" could be replaced by an additional new term, "transcriptions", analogous to sc:hasImageAnnotations in the Presentation API. There may be other "download" style resources associated with media, so we left it as "resources".

Caveats, next steps

We are now going to try to port the audio, video and born-digital functionality from the Wellcome Player to the UV, using this model. No doubt the model will need further tweaks in response to implementation. The functionality afforded by this model meets our immediate need (playing video, audio and PDFs) without presupposing what "proper" IxIF might look like.

There's a danger that our interim implementation will be seen as an attempt to solve the hard problem of IxIF in general. It isn't. It should be seen as a practical local solution to an immediate problem that takes advantage of IIIF's linked data model and JSON-LD serialisation.

http://iiif.io/api/presentation/2.0/index.html#linked-data-context-and-extensions

After we have had a go at implementation, I will write a more formal description of the model we are using.

Manifests referenced

Context document

{
  "@context": [
    {
      "ixif": "http://wellcomelibrary.org/ixif/0#",
      "oad": "http://filteredpush.org/ontologies/oa/oad#",
      "mediaSequences":    { "@type": "@id", "@id": "ixif:hasMediaSequences", "@container": "@list" },
      "elements":          { "@type": "@id", "@id": "ixif:hasMediaElements", "@container": "@list" },
      "compatibilityHint": { "@id": "ixif:compatibilityHint", "@type": "@id" }
    }
  ]
}