@tomcrane
Last active August 29, 2015 14:05
Wellcome Library Authentication and Authorisation notes

Auth Notes

I appreciate that these aren't technically use cases; the use cases I had were too cluttered by the process of acquiring a cookie, rather than focusing on how the metadata appears in response to that cookie. We haven't had to solve the problem of describing in metadata HOW a consumer might acquire the authentication token our service layer is looking for; our viewer just knows, which obviously doesn't lend itself to interoperability. Nevertheless, I hope these are useful in describing what Wellcome does in the rest of the authentication and authorisation scenarios.

These are the assumptions and use cases that underpin the Wellcome Library's authentication and authorisation requirements.

For clarity, I'll use terms similar to IIIF concepts (e.g., manifest) even though the Wellcome Library doesn't yet use IIIF. Also it's worth pointing out that if IIIF Auth turns out not to be implemented this way, it doesn't mean we can't adapt to it!

Assumption: The manifest doesn't contain any secrets - we don't mind people seeing all of the metadata, even if we want to prevent some of those people seeing the images the metadata describes. For example, bibliographic information or image dimensions are publicly available for everything that has been digitised.

The source from which most of the metadata is derived is METS. This is an implementation detail and should have no bearing on the IIIF implementation, but METS is often going to be the origin of this stuff in other organisations too. Access condition information in the manifest is derived from metadata applied to structural elements in METS. So the METS structural information is being used for two purposes:

  • to convey information about the logical structure of the work, from which a viewer application might generate a table of contents or other navigation devices
  • to convey information about the access condition(s) associated with all or parts of the image sequence

These two purposes don't have to coincide.

Access conditions can be associated with an entire sequence, and optionally with arbitrary ranges, which may correspond to chapters or any other structure within the sequence. In some cases the structure in the METS (and hence the range in the manifest) will only exist for the purpose of attaching more restrictive access control to a few of the canvases. For example, a set of archive letters might include one particular letter containing sensitive information that requires a different level of library membership. A structural element will have been created in the METS in order to attach a more restrictive access condition to part of the sequence (in this case a part with just one image), which ends up as a range in the manifest containing one canvas.

This may be at odds with IIIF because the use of ranges in IIIF at present is to convey presentational structure, typically to generate a table of contents (TOC) for navigation. We're also using our equivalent of ranges to carry access conditions, and our viewer knows that only certain types of ranges are used to generate the TOC, but any could be interrogated for access conditions. So each range would need to convey its purpose or motivation:

  • I'm a piece of structure that you might want to use to make a TOC
  • I'm just here to isolate these two canvases so that they can be more protected than the rest of the sequence

In the Wellcome case, every range can have access conditions, but not every range should be used by a viewer to convey the structure of the work.
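As a rough sketch of the distinction above (not Wellcome's actual data model), a range could carry both an access condition and a declared purpose. The `purpose` field, its values "structure" and "access-only", and the rule that the narrowest covering range decides a canvas's access condition are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Range:
    label: str
    canvases: List[str]      # identifiers of the canvases this range covers
    access_condition: str    # e.g. "Open", "Restricted"
    purpose: str             # hypothetical: "structure" or "access-only"


def table_of_contents(ranges: List[Range]) -> List[str]:
    """Only ranges that describe the structure of the work feed the TOC."""
    return [r.label for r in ranges if r.purpose == "structure"]


def access_condition_for(canvas: str, ranges: List[Range], default: str = "Open") -> str:
    """Any range can be interrogated for access conditions.

    Assumption: when several ranges cover a canvas, the narrowest one wins.
    """
    covering = [r for r in ranges if canvas in r.canvases]
    if not covering:
        return default
    return min(covering, key=lambda r: len(r.canvases)).access_condition


# A set of archive letters in which one letter is more protected than the rest.
ranges = [
    Range("Letters", ["c1", "c2", "c3"], "Requires Registration", "structure"),
    Range("Sensitive letter", ["c2"], "Restricted", "access-only"),
]

print(table_of_contents(ranges))
print(access_condition_for("c2", ranges))
```

With this shape a viewer can build its TOC from one kind of range while still checking every range for access conditions.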

Wellcome Access conditions

This is an implementation detail, for interest. The access condition applied to any range must be one of:

  • Open
  • Requires Registration
  • Clinical Images
  • Library Staff
  • Restricted
  • Closed

Apart from "Open" they are really just arbitrary flags, so this section is here to demonstrate Wellcome's particular auth implementation.

All users of the site including unauthenticated users have permission to see "Open" sections/ranges; most digitised books are "Open". The next step up is "Requires Registration", which implies that the user has at some point at least glanced at some terms and conditions and agreed to them. Most archives are at this level because in the Wellcome Library's case they are usually the personal letters of living (or more usually, recently living) people.

There is no assumption about what "Requires Registration" actually means, and in fact the Wellcome Library has recently changed the behaviour here to encourage casual usage:

http://wellcomelibrary.org/player/b20047459

This used to present an unauthenticated user with a login prompt and option to register (or log in via social media) including a "Ts & Cs" checkbox at some point, but has been reduced to a "click-through" acceptance of terms and conditions. Anything higher than "Requires Registration" still prompts for login via a proper library account. This new click-through still logs you in as a special "guest" user and creates a cookie, which meant we didn't need to change the authentication module that sits in front of the image servers and DDS servers (which provide all the metadata).
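To make the behaviour described above concrete, here is a minimal sketch of how a viewer might choose what to show an unauthenticated user for each access condition. The function name `prompt_for` and the return values "click-through" and "login" are invented for illustration; only the list of access conditions comes from these notes.

```python
from typing import Optional

ACCESS_CONDITIONS = [
    "Open",
    "Requires Registration",
    "Clinical Images",
    "Library Staff",
    "Restricted",
    "Closed",
]


def prompt_for(condition: str) -> Optional[str]:
    """Return the kind of prompt an unauthenticated user would be shown."""
    if condition not in ACCESS_CONDITIONS:
        raise ValueError(f"unknown access condition: {condition}")
    if condition == "Open":
        return None                # no prompt, content is visible to everyone
    if condition == "Requires Registration":
        return "click-through"     # accept terms, become a "guest" user
    return "login"                 # anything higher needs a proper library account


for condition in ACCESS_CONDITIONS:
    print(condition, "->", prompt_for(condition))
```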

Examples

The most common scenario, which applies to most digitised books, is that the manifest contains a single sequence and that sequence has a simple structure. The ranges identify front cover, title page etc, and all of the ranges have the same access condition of "Open".

The next most common scenario applies to archives, where each manifest has one sequence corresponding to a particular archival set of files. These tend not to have any structure, because they are usually a group of personal letters or files, so the structure has one range to which an access condition of "Requires Registration" has been attached. This can be seen in the example above. The table of contents is hidden because there is no structure to display.

Almost all the material falls into one of the above categories. Here are some other quirky scenarios:

http://wellcomelibrary.org/player/b19813508 - The second image is restricted

http://wellcomelibrary.org/player/b11607798 - the root section has the access condition "Clinical Images"

Implementation

It is not enough to just tell the viewing application that it shouldn't request certain images; we have to access control every image request (including thumbnails and tiles). Otherwise someone could always "scrape" an image from its tiles.

The manifest (or our equivalent) conveys the access conditions on each section/range and also conveys the current user's authentication status for each range. This means that

http://wellcomelibrary.org/package/b19813508

(our manifest equivalent) is not the same for all users or for the same user at different times. If requested with the right cookie a section might declare

"accessCondition": "Requires registration",
"authStatus": "Allowed",

...and a tile request on an image in this section would succeed, but at another time the same request without the cookie, or with an expired cookie, would include

"accessCondition": "Requires registration",
"authStatus": "Denied",

...at the same place in the package, and a tile request on an image in that range would return an HTTP 403 response. We return a "bare" 403 response (not a placeholder image) because we assume that the viewer application should never cause such a request to be made; it knows from the manifest whether a request would succeed. In the single restricted image example, the padlock thumbnail is there because the player knows from the manifest that the image request is not going to work - it's using its own UI to convey this.
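A viewer's side of this contract might look something like the following sketch. The field names `accessCondition` and `authStatus` are the ones shown above; the surrounding `sections` array and the function names are assumptions, not the real package format.

```python
import json

# Hypothetical fragment of a package as seen by a user without the cookie.
package = json.loads("""
{
  "sections": [
    {"accessCondition": "Open", "authStatus": "Allowed"},
    {"accessCondition": "Requires registration", "authStatus": "Denied"}
  ]
}
""")


def should_request_tiles(section: dict) -> bool:
    """The viewer only asks for tiles it already knows will succeed."""
    return section.get("authStatus") == "Allowed"


def thumbnail_ui(section: dict) -> str:
    """Use the viewer's own UI (a padlock) instead of provoking a 403."""
    return "image" if should_request_tiles(section) else "padlock"


for section in package["sections"]:
    print(section["accessCondition"], "->", thumbnail_ui(section))
```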

I'm not entirely comfortable with the manifest http://wellcomelibrary.org/package/b19813508 returning different data for requests from different people and/or at different times. I had been considering removing the authStatus from the manifest, so that the manifest just conveys the access conditions (which don't change). A new, separate service would return a much smaller set of metadata that just describes the authStatus on each range for the given auth token (supplied as a cookie value, or in some other way such as a POST body or query string param). So a viewer application could see that the manifest is not all Open and request the authStatus metadata, which will tell it which bits the current user can see, and which bits the user could try to acquire authorisation to see. This much smaller set of metadata would be the only dynamic content, leaving the manifest to be cached as much as we like. I have held off on this now that IIIF auth is on the horizon, but it's been bugging me that authStatus is the only thing that varies between requests in our manifest equivalent.
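The split being considered could be sketched roughly as below: a cacheable manifest that only carries access conditions, plus a small per-user status document. This service does not exist; the `ranges` and `id` keys, the function names, and the idea of representing an auth token as a set of granted conditions are all assumptions for illustration.

```python
from typing import Dict, Set

# Static and cacheable: the same for every user at every time.
STATIC_MANIFEST = {
    "ranges": [
        {"id": "r1", "accessCondition": "Open"},
        {"id": "r2", "accessCondition": "Requires registration"},
    ]
}


def needs_auth_status(manifest: dict) -> bool:
    """The viewer only calls the dynamic service if something is not Open."""
    return any(r["accessCondition"] != "Open" for r in manifest["ranges"])


def auth_status(manifest: dict, granted: Set[str]) -> Dict[str, str]:
    """The small dynamic document: one authStatus per range for this token.

    `granted` stands in for whatever the auth token says the user may see.
    """
    return {
        r["id"]: "Allowed"
        if r["accessCondition"] == "Open" or r["accessCondition"] in granted
        else "Denied"
        for r in manifest["ranges"]
    }


if needs_auth_status(STATIC_MANIFEST):
    print(auth_status(STATIC_MANIFEST, set()))
    print(auth_status(STATIC_MANIFEST, {"Requires registration"}))
```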

We have one exception to the general image access rule: thumbnails of "requires registration" content are visible to anonymous users, even though a tile request would fail. This is easier for us to do at present because thumbnails are a different service from tiles (or other arbitrary regions), but it would be trickier in IIIF, where it would have to be generalised as access conditions relating to size. At present our handling of size is very simple (whether or not the user can download the full image at high resolution, arbitrarily defined). So we don't currently face the problem of describing in the metadata that YOU, the current requestor, are able to request this image (or canvases in this range) up to 150px on a side but will be given a 403 if you request higher.
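If the thumbnail exception were generalised to a size rule, the server-side check might look something like this sketch. It is not something Wellcome implements; the function name and parameters are made up, and the 150px limit is just the figure used in the example above.

```python
def response_status(width: int, height: int, authorised: bool,
                    max_open_side: int = 150) -> int:
    """Return the HTTP status for an image request of the given pixel size.

    Assumption: unauthorised users may fetch anything up to `max_open_side`
    pixels on its longest side, and get a bare 403 for anything larger.
    """
    if authorised:
        return 200
    if max(width, height) <= max_open_side:
        return 200
    return 403


print(response_status(100, 150, authorised=False))    # thumbnail-sized request
print(response_status(1024, 768, authorised=False))   # larger than the open limit
print(response_status(1024, 768, authorised=True))
```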

Currently Wellcome do not have any other access conditions relating to image size or quality, but we do have a set of permitted operations which are currently only defined for the work as a whole:

"permittedOperations": [
 "currentViewAsJpg",
 "wholeImageHighResAsJpg",
 "wholeImageLowResAsJpg",
 "entireDocumentAsPdf"
],

Logically, the first three of these (in this case) should really apply to the current range, not the work as a whole. 99.9% of the time this makes no odds, but we do have a few edge cases where this mechanism doesn't quite work, because our player makes the option available based on the work's root section rather than the canvas the user is currently looking at.
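One way the per-range behaviour could work is sketched below. This is speculative: the player does not do this today, and the idea of a range optionally overriding the image-level operations while the document-level one stays with the work is an assumption. Only the four operation names come from the example above.

```python
from typing import List, Optional

WORK_OPERATIONS = [
    "currentViewAsJpg",
    "wholeImageHighResAsJpg",
    "wholeImageLowResAsJpg",
    "entireDocumentAsPdf",
]

# The operations that logically belong to the current range or canvas.
IMAGE_LEVEL = {"currentViewAsJpg", "wholeImageHighResAsJpg", "wholeImageLowResAsJpg"}


def permitted_for_canvas(work_ops: List[str],
                         range_ops: Optional[List[str]]) -> List[str]:
    """Resolve operations for the canvas the user is looking at.

    If the range declares nothing (None), inherit from the work. Otherwise
    take image-level operations from the range and keep the rest from the work.
    """
    if range_ops is None:
        return list(work_ops)
    image_level = [op for op in range_ops if op in IMAGE_LEVEL]
    document_level = [op for op in work_ops if op not in IMAGE_LEVEL]
    return image_level + document_level


print(permitted_for_canvas(WORK_OPERATIONS, None))
print(permitted_for_canvas(WORK_OPERATIONS, ["wholeImageLowResAsJpg"]))
```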

Performance note

Wellcome are lucky in that an access control decision can be made very quickly without having to look anything up for the current user. There is a small set of universal access conditions that map to permissions that a user may or may not have; this set of permissions is small enough to store as flags in an encrypted cookie alongside other bits and pieces. So the authorisation question "can this user see the requested content?" can be answered by our server just by looking at the cookie and the range. It doesn't need to look anything up in a "licensing" database or similar, so there is no additional I/O overhead in authorising a tile request. Wellcome users don't have permissions on individual items in the library, just on a very small set of conditions. This may not apply in other scenarios where IIIF could be used, where there is more complex licensing (e.g., a picture library) on a per-work (per-sequence) basis. Servers would have to be careful about the implications of authorising atomic tile requests.
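The flag check described above could be sketched as follows. This is not the real cookie format: the flag names and bit values are invented, cookie decryption is left out entirely, and treating "Closed" as visible to nobody is an assumption.

```python
from enum import IntFlag


class Permission(IntFlag):
    NONE = 0
    REGISTERED = 1
    CLINICAL = 2
    STAFF = 4
    RESTRICTED = 8


# Hypothetical mapping from access condition to the flag it requires.
REQUIRED = {
    "Open": Permission.NONE,
    "Requires Registration": Permission.REGISTERED,
    "Clinical Images": Permission.CLINICAL,
    "Library Staff": Permission.STAFF,
    "Restricted": Permission.RESTRICTED,
}


def can_view(cookie_flags: int, condition: str) -> bool:
    """Answer the authorisation question from the cookie flags alone.

    `cookie_flags` is the integer recovered from the (already decrypted)
    cookie. No database lookup or other I/O is involved.
    """
    required = REQUIRED.get(condition)
    if required is None:      # "Closed" or anything unrecognised
        return False
    return (cookie_flags & int(required)) == int(required)


staff_user = int(Permission.REGISTERED | Permission.STAFF)
print(can_view(0, "Open"))
print(can_view(0, "Requires Registration"))
print(can_view(staff_user, "Library Staff"))
```

Because the check is a couple of integer operations, it is cheap enough to run on every tile request.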
