tomcrane/IIIF-auth-through-annotations.md

## IIIF-auth-through-annotations.md

      
Ref: https://github.com/IIIF/iiif.io/blob/master/source/event/2014/london_auth_scenarios.md
Ref: https://github.com/IIIF/iiif.io/blob/master/source/event/2014/london-auth-workflow.png
Reiterate assumptions from london_auth_scenarios:
Assumption 1: In both the Image API and the Presentation API, a given URI for an Image Information request, an Image request, or a Manifest request will result in the same bytestream if access is granted, irrespective of credentials and user identity.
(Is this compatible with london-auth-workflow.png? Think so because of the 301)
Assumption 2: The Image Information response always correctly describes the Image requests supported for the given base URI (scheme://server/prefix/identifier). This implies that different base URIs are required to support degraded and full access (or multiple levels of degraded access).
Additional assumption: the presentation or image metadata contains no secrets. The auth problem is in describing access to image services (i.e., pixels), not in protecting strings in metadata.
Additional Assumption: if information about access control is included in any IIIF data, it should be silent on the authentication and authorisation mechanisms and rules. It can state that there is some form of access control, it can state what effects that will have on image API experience, it can provide a human-readable piece of metadata to display to users, it can provide a URI intended for a human to visit to acquire credentials.
These assumptions suggest that a published “canonical” manifest may have to be one that is degraded (even to the point of asserting that no image requests will work at all) because there may be several other levels of experience that could only be gained by acquiring credentials; often there may be only one other level (full access) but there could be more levels in between. Which version should be the “official” one?
### Canonical manifest

Consider a web page for an item in a library catalogue. At the foot of the page, next to the links to the bibliographic data in various RDF serialisations, is another link: the IIIF manifest. This item is a book and has been digitised. Anything in this library’s catalogue that has been digitised has this link available. The library thinks it is just as important to link to the IIIF from the canonical catalogue page as it is to link to bibliographic data in Turtle, JSON-LD and so on.
Many items in the library catalogue are archives that have access conditions.

Often the archives mention personal details of living or recently
deceased people. To view these archives a user must acknowledge some
terms and conditions, so that donors of such personal material are
reassured to some extent that the much more widespread availability of
the material resulting from digitisation doesn't lead to its abuse.

A user must authenticate and acquire a credential in the form of a cookie before the image server will respond to a IIIF image request with anything other than a 403 status.
Sometimes the restrictions are more granular. A user who has authenticated may still be denied access to image requests for a particular image, or within a canvas, or range, or sequence because some additional constraints have been applied to those structural elements. The structural elements (ranges) may have been created for the sole purpose of having access control information attached to them. That user would need to acquire better credentials to see the still hidden images.
In either of the access-controlled scenarios described above, what manifest URL should the library publish on the canonical catalogue page? Should they always publish a manifest that describes the image services that an unauthenticated user would be able to see?
This implies that for the access-controlled archive example above the image services referred to, or embedded, would need to convey the degraded or “unviewable” response.
The vocabulary is already available to describe the degraded response, although I’m not sure how best to use it. For example if the degraded response is that the image is limited to 500px on a side, should the height and width properties of the image info be adjusted appropriately, with “sizeAboveFull” implied as not available by its absence? Or should the image dimensions be retained as useful metadata in the canonical manifest and a new term introduced, such as maxSize, to indicate that although the original image is 6000px wide the server won’t return any generated image greater than 500px on a side, or tile at a resolution greater than would be available from a 500px master?
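To make the second option concrete, here is a minimal Python sketch of how a client might interpret a `maxSize` term. The term is only the hypothetical one floated above, and the function name and rounding behaviour are assumptions for illustration, not anything defined by the Image API.

```python
def largest_available_size(width, height, max_side=None):
    """Return the (width, height) of the largest image the server will generate.

    `max_side` models the hypothetical `maxSize` term: the longest side, in
    pixels, of any derivative the server is willing to return. None means
    the full-size image is available.
    """
    longest = max(width, height)
    if max_side is None or longest <= max_side:
        return (width, height)
    # Keep the aspect ratio; never let a side collapse to zero.
    return (
        max(1, round(width * max_side / longest)),
        max(1, round(height * max_side / longest)),
    )


# The info.json still reports the true dimensions as metadata...
info = {"width": 6000, "height": 4000, "maxSize": 500}
# ...while the client works out what it can actually request.
print(largest_available_size(info["width"], info["height"], info.get("maxSize")))
```

With this reading the original dimensions stay in the image information as descriptive metadata, and the constraint is carried by a single extra value.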
If the degraded experience is bitonal only (for example), then "qualities" : [ "bitonal" ] isn’t enough, because such additional profile properties are added to the declared base profile; we’d need to subtract somehow, or start with a “cleared” profile and then add the limited support back in. This might get unwieldy very quickly: you wouldn’t want to keep adding in the HTTP Features part of the profile (for example).
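The difficulty is perhaps easier to see in code. The sketch below assumes a client computes the effective qualities as the union of the base compliance level's qualities and any extras listed in the profile. The contents of `LEVEL_QUALITIES` are illustrative stand-ins rather than a transcription of the compliance documents, and `effective_qualities` is a hypothetical helper.

```python
# Illustrative only: qualities implied by each compliance level URI.
LEVEL_QUALITIES = {
    "http://iiif.io/api/image/2/level0.json": {"default"},
    "http://iiif.io/api/image/2/level2.json": {"default", "color", "gray", "bitonal"},
}


def effective_qualities(profile):
    """Union of the base level's qualities and every extra "qualities" list."""
    base, *extras = profile
    qualities = set(LEVEL_QUALITIES.get(base, set()))
    for extra in extras:
        qualities |= set(extra.get("qualities", []))
    return qualities


degraded = ["http://iiif.io/api/image/2/level2.json", {"qualities": ["bitonal"]}]
# Adding "bitonal" to level 2 changes nothing: colour is still advertised.
print(sorted(effective_qualities(degraded)))

# Starting from a lower ("cleared") base is the only way to end up with less.
cleared = ["http://iiif.io/api/image/2/level0.json", {"qualities": ["bitonal"]}]
print(sorted(effective_qualities(cleared)))
```

Because the merge is a union, nothing in the extras can remove a capability that the base level already implies.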
Understandably the Image Information lacks the vocabulary to indicate that no image is available – for example that a 6000 x 4000 image is present but you can’t get any pixels from it:
{
  "@context" : "http://iiif.io/api/image/2/context.json",
  "@id" : "http://www.example.org/image-service/abcd1234/1E34750D",
  "protocol" : "http://iiif.io/api/image",
  "width" : 6000,
  "height" : 4000,
   
  "profile":"http://iiif.io/api/image/2/profiles/unavailable.json",

  "service" : {
     // we’re not denying access to other non-image services
     "@context": "http://ex.org/someotherservice/context.json",
     "profile": "http://ex.org/someotherservice/pp",
     "someOtherServiceData": 99.99
  }
}
Or maybe, instead of the special profile, a new property:
"profile" : [
  "http://iiif.io/api/image/2/level2.json",
  {
    "supports" : [
      "http://extensions.org/iiif#imageServiceUnAvailable"
    ]
  }
]
(although this doesn’t feel quite right as a “supported” operation – perhaps the profile needs to offer some other way of asserting this)
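Either form is easy for a client to detect. The following Python sketch checks an image information document for both of the hypothetical markers used in the two examples above; `pixels_unavailable` is an illustrative name, and neither URI is defined by IIIF.

```python
# Both URIs are the hypothetical ones from the examples above.
UNAVAILABLE_PROFILE = "http://iiif.io/api/image/2/profiles/unavailable.json"
UNAVAILABLE_FEATURE = "http://extensions.org/iiif#imageServiceUnAvailable"


def pixels_unavailable(info):
    """True if an image information document asserts that no pixels can be had."""
    profile = info.get("profile", [])
    if isinstance(profile, str):
        profile = [profile]  # the first example uses a bare URI
    for entry in profile:
        if entry == UNAVAILABLE_PROFILE:
            return True
        if isinstance(entry, dict) and UNAVAILABLE_FEATURE in entry.get("supports", []):
            return True
    return False


special_profile = {"width": 6000, "height": 4000, "profile": UNAVAILABLE_PROFILE}
extra_feature = {
    "profile": [
        "http://iiif.io/api/image/2/level2.json",
        {"supports": [UNAVAILABLE_FEATURE]},
    ]
}
print(pixels_unavailable(special_profile), pixels_unavailable(extra_feature))
```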
For the archive described above the published canonical manifest is unsatisfactory in all cases. If it tells the truth for the unauthenticated user it has to describe a sequence of images for which it asserts there is no service. No viewer would ever want to consume this manifest. If we decide to publish a manifest that describes what an authenticated user could see we at least have published something more interesting, but it won’t work when loaded in a viewer without further information about the fact that the services are protected, and how to acquire credentials. Even if we do want to publish a more image-filled manifest, we are still faced with a problem where the access control is anything other than binary – what about more nuanced scenarios with various levels of degradation, or different access conditions on different parts of the structure? There’s no “right” manifest to publish other than the one that will work as advertised for an unauthenticated user, even if it consists entirely of image information service elements that assert no image service is available. A complex authentication scenario could result in an explosion of possible manifests.
Either we publish in the manifest what an unauthenticated user can see and provide some more information about what the current user can see (which might be more than the unauthenticated user can see) along with information about how to acquire or upgrade credentials, or we publish what the highest possible authenticated user could see (the best profile for all images – what we know our image server CAN support in the absence of any access control considerations) and provide some more information about what the current user can actually see right now (which might be less – considerably less – than the manifest describes) along with information about how to acquire or upgrade credentials.
In the second case the image service information (whether embedded or requested as info.json) would contain the full profile supported. This is not necessarily what the current user is permitted to see, and may conflict with Assumption 2.
Assumption 2 feels right. But does it lead to greater complexity elsewhere?
### Annotation Lists to describe access control
The following discussion examines the use of annotations to provide information to a viewer application to describe what the current user could see if they made a request to the image service with the same credentials they used to obtain the annotations. The annotations allow the viewer to generate appropriate UI (e.g., placeholders for images it knows it isn’t authorised to request). The annotations are "on" (oa:hasTarget) the appropriate parts of the structure to provide an override of what the published manifest asserts is available.
I think this might lead to a conflict with assumption 2, but I want to explore it to see where it goes. I’ve based the approach on what we have already done at Wellcome in a non-IIIF, JSON-rather-than-JSON-LD way. I think I could make this approach cover all of Wellcome’s current auth requirements in IIIF, but I don’t know how idiomatic this is for IIIF. Are annotations the right way to do this? Should the access control information come from a service instead? How easy will it be to make use of this information in a viewer?
#### Describing access ####
An access control descriptor (whatever that might be) could apply to individual images, canvases, sequences, ranges, manifests (and also to annotation lists, services and other resources). It might apply to a range that has been created just to have access control information attached to it. Often access control might be applied to the same structural elements that a viewer might use to make a table of contents, but not always. For example, an archive of personal correspondence might consist of one simple sequence with no logical table of contents, and a range is created only to isolate two canvases that contain sensitive information. This means range isn’t always used as currently described in the presentation API: “The intent of adding a range to the manifest is to allow the client to display a structured hierarchy to enable the user to navigate within the object…” All that is required is a new viewingHint for range that indicates its function is not navigation-generation but some other logical organisation. A viewer should ignore such a range when building a navigation UI. In every other respect range is perfect for the task of acting as a target for an annotation that describes access control.
The following starts from the needs of a viewing application consuming a manifest and works down to the individual image request.
It seems natural to use annotations to convey this information. The IIIF vocabulary already has “otherContent” as an alias (term) for sc:hasLists, so a manifest (or other element) could have, amongst the other annotations:
"otherContent": [
  {
    // same origin as IIIF service @id
    "@id":"http://www.example.org/iiif/book1/list/access", 
    "@type":"http://extensions.org/iiif#authAnnotationList"
  }
] 
Given the importance of drawing attention to access control information, this type of annotation list might warrant its own term, as is the case for “images”, keeping it separate from “otherContent”.
The presence of an annotation list of type http://extensions.org/iiif#authAnnotationList tells a compliant viewer that the information in the manifest can be augmented or overridden by additional information contained in the annotations. A completely open manifest wouldn’t have an annotation list of that type anywhere. The absence of such an annotation list implies the manifest is open and identical for all requests.
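As a rough illustration of that rule, a viewer could walk the manifest looking for lists of the proposed type and treat the manifest as open when it finds none. This is a sketch only: `find_auth_lists` is a hypothetical helper, the manifest `@id` is made up, and the walk assumes plain JSON with no `@context` aliasing.

```python
AUTH_LIST_TYPE = "http://extensions.org/iiif#authAnnotationList"


def find_auth_lists(node, owner_id=None):
    """Yield (owner @id, list @id) for every auth annotation list under `node`."""
    if isinstance(node, list):
        for item in node:
            yield from find_auth_lists(item, owner_id)
    elif isinstance(node, dict):
        if node.get("@type") == AUTH_LIST_TYPE:
            yield (owner_id, node.get("@id"))
            return
        # Remember the nearest enclosing resource so we know what the list is "on".
        here = node.get("@id", owner_id)
        for value in node.values():
            yield from find_auth_lists(value, here)


manifest = {
    "@id": "http://www.example.org/iiif/book1/manifest",  # hypothetical URI
    "@type": "sc:Manifest",
    "otherContent": [
        {"@id": "http://www.example.org/iiif/book1/list/access", "@type": AUTH_LIST_TYPE}
    ],
}
auth_lists = list(find_auth_lists(manifest))
is_open = not auth_lists  # no auth list anywhere implies an open manifest
print(auth_lists, is_open)
```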
This annotation list would be at the manifest level for the simple scenario of a completely closed work, but it could be on a range created especially for the purpose if a particular section of the work has different conditions.

Question – can a single annotation list provided for the manifest
contain annotations that are “on” finer structure within the manifest
(e.g., a particular range)? Or do the annotations in a list always
have to be “on” the element the list was declared on? If the former
the viewer could obtain all the additional information in a single
HTTP request.

This is where the approach starts to feel impure. The presence of an annotation list of type http://extensions.org/iiif#authAnnotationList implies that requests made for the URL in that list’s @id are NOT going to give the same results for all users. This particular @type implies that the server will generate the annotation list taking into account any information from the request it needs to make an authorisation decision – cookies, other headers, origin IP and so on. The API is silent on what criteria the server might use. It is almost certain that the @id will be a URI on the same domain as the image services, as it needs to use the same authentication strategy. It is expected that a request for the annotation list will be followed soon after by tile requests. The returned annotations tell a consumer that if it makes requests for images in the same way it made the request for the annotation list, then those image requests will be honoured if the annotations said they would be (within a reasonable time frame – again, whatever that means).
A manifest comprising images from multiple origins will need to contain at least as many separate auth annotation lists as there are origins, and each one will need to be interrogated in turn.
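A viewer could check this by grouping image services by origin and pairing each origin with the auth annotation list served from it. The sketch below assumes, as suggested above, that a list always shares an origin with the services it describes. The second origin and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def origin(uri):
    """Scheme plus host (and port, if any) of a URI."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.netloc}"


def lists_to_interrogate(image_service_ids, auth_list_ids):
    """Map each image-service origin to the auth list on that origin, or None."""
    by_origin = {origin(list_id): list_id for list_id in auth_list_ids}
    return {origin(s): by_origin.get(origin(s)) for s in image_service_ids}


services = [
    "http://www.example.org/image-service/abcd1234/1E34750D",
    "http://images.example.net/iiif/xyz",  # hypothetical second origin
]
auth_list_ids = ["http://www.example.org/iiif/book1/list/access"]
print(lists_to_interrogate(services, auth_list_ids))
```

An origin that maps to `None` has no list to consult, so under the proposal the viewer would have to treat its services as open.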
#### What might an annotation look like? ####
Assume that the best possible manifest has been published – it advertises image service support up to the level supported by the image server. Those services might return a 403 if an unauthenticated user attempted to use them. A viewer knows not to blindly attempt image requests against those services, because it has detected the presence of an annotation list of type http://extensions.org/iiif#authAnnotationList.
It dereferences the annotationList (using the same request context that it will later use for images) and gets this:
// this is generated in the context of the supplied credentials.
// a user who requests this after visiting #loginUri might get different results
{
  "@context":"http://iiif.io/api/presentation/2/context.json",
  "@id":"http://www.example.org/iiif/book1/annotation/anno1",
  "@type":"oa:Annotation",
  // a new motivation
  "motivation":"http://extensions.org/iiif#authDescription", 
  // anno hasTarget the sequence - is this OK? 
  "on":"http://www.example.org/iiif/book1/sequence/normal", 
  "resource":{
    // there's nothing to deref here, can we omit?
    "@id":"http://www.example.org/iiif/book1/list/access/anno1", 
    "@type":"http://extensions.org/iiif#profileModification",
    
    // This could be a more complex profile object - 
    // it could offer a degraded experience instead
    // whatever it is, it overrides any profile information
    // on ALL images 'under' the annotation target
    // maybe there's another special case for "as declared in manifest"
    "profile":"http://iiif.io/api/image/2/profiles/unavailable.json",
     
    // no defined term for this 
    "http://extensions.org/iiif#loginUri" : "http://mylibrary.org/login", 
    "description":"Access to this resource is restricted. Follow the supplied link to log in"
  }
}
This small chunk of JSON-LD tells the viewer that for all image services under the target of the annotation (in this case all images in the sequence) the supplied profile should override the profile advertised in the manifest. The supplied profile is in this case a special “unavailable” profile, but could be a more complex profile offering a degraded experience (e.g., bitonal, max size 500px as described earlier).
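A viewer might apply that override along the following lines. This is a sketch under simplifying assumptions: it only handles annotations whose target is a sequence, it assumes every image resource embeds a single `service` with an `@id` and a `profile`, and `apply_override` is a hypothetical name.

```python
def apply_override(manifest, annotation):
    """Return {image service @id: effective profile} after one auth annotation."""
    effective = {}
    for sequence in manifest.get("sequences", []):
        overridden = sequence.get("@id") == annotation["on"]
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image["resource"]["service"]
                effective[service["@id"]] = (
                    annotation["resource"]["profile"] if overridden else service["profile"]
                )
    return effective


manifest = {
    "sequences": [{
        "@id": "http://www.example.org/iiif/book1/sequence/normal",
        "canvases": [{
            "images": [{
                "resource": {
                    "service": {
                        "@id": "http://www.example.org/image-service/abcd1234/1E34750D",
                        "profile": "http://iiif.io/api/image/2/level2.json",
                    }
                }
            }]
        }],
    }]
}
annotation = {
    "on": "http://www.example.org/iiif/book1/sequence/normal",
    "resource": {"profile": "http://iiif.io/api/image/2/profiles/unavailable.json"},
}
print(apply_override(manifest, annotation))
```

The manifest itself is left untouched; the viewer only changes what it believes it can request.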
Suppose we have another archive (manifest2) that is restricted, but visible once logged in (as in the previous example). However, two of its images are not available at all; they have been removed. The user who saw the previous annotation list has now logged in at the URI provided in the http://extensions.org/iiif#loginUri property.
The tail-end of the manifest now looks like this:
{
  //...
  //...
  "otherContent": [
    {
      "@id":"http://www.example.org/iiif/book2/list/access", 
      "@type":"http://extensions.org/iiif#authAnnotationList"
    }
  ],
  "structures": [
    {
      "@id":"http://www.example.org/iiif/book2/range/r0",
      "@type":"sc:Range",
      "label":"Table of Contents",
      "viewingHint":"http://extensions.org/iiif#notForNavigation",
      "canvases": [
          "http://www.example.org/iiif/book2/canvas/p33",
          "http://www.example.org/iiif/book2/canvas/p34"
        ]
    }
  ]
}
For an unauthenticated user the annotation list at http://www.example.org/iiif/book2/list/access could have two annotations, one for the sequence as a whole and one for the specific range.
For this authenticated user, the annotation list could omit the first annotation (because the user can see the regular images) but return the second annotation, which might look like this:
{
  "@context":"http://iiif.io/api/presentation/2/context.json",
  "@id":"http://www.example.org/iiif/book2/annotation/anno2",
  "@type":"oa:Annotation",
  "motivation":"http://extensions.org/iiif#authDescription", 
  "on":"http://www.example.org/iiif/book2/range/r0", 
  "resource":{
     //...
     "@type":"http://extensions.org/iiif#profileModification",
     "profile":"http://iiif.io/api/image/2/profiles/unavailable.json",
     "description":"Access to this resource is restricted online. Please visit the library in person."
   }
}
No #loginUri is provided.
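From the viewer's side, the range-targeted annotation can be turned into per-canvas placeholders. The sketch below assumes the annotation targets a range listed in the manifest's `structures` and treats a missing #loginUri as "no way to upgrade online". `canvases_under` and `placeholders` are hypothetical helpers.

```python
LOGIN_URI = "http://extensions.org/iiif#loginUri"


def canvases_under(manifest, target_id):
    """Canvas URIs listed by the range whose @id is `target_id` (empty if none)."""
    for rng in manifest.get("structures", []):
        if rng.get("@id") == target_id:
            return list(rng.get("canvases", []))
    return []


def placeholders(manifest, annotations):
    """Map each restricted canvas to the message and login link to display."""
    result = {}
    for anno in annotations:
        body = anno["resource"]
        for canvas_id in canvases_under(manifest, anno["on"]):
            result[canvas_id] = {
                "message": body.get("description", ""),
                "login": body.get(LOGIN_URI),  # None when no login route is offered
            }
    return result


manifest2 = {
    "structures": [{
        "@id": "http://www.example.org/iiif/book2/range/r0",
        "canvases": [
            "http://www.example.org/iiif/book2/canvas/p33",
            "http://www.example.org/iiif/book2/canvas/p34",
        ],
    }]
}
anno2 = {
    "on": "http://www.example.org/iiif/book2/range/r0",
    "resource": {
        "description": "Access to this resource is restricted online. "
                       "Please visit the library in person."
    },
}
print(placeholders(manifest2, [anno2]))
```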
#### Inverse approach ####
This whole approach could be turned on its head: the manifest describes truly what an unauthenticated user could see, and annotations override it to provide different profiles.
Whichever way round the augmentation/overriding happens, should the annotations convey a different profile (a different information response) against the same base URI (conflicting with assumption 2, but concise as the profile is assumed to apply to every image under the target of the annotation), or should the annotation provide a new image information response completely, with a new base URI for each image? This latter approach means that you’d need an annotation for each image, the body of the annotation would look very much like an image information response, the annotation list would start to approach the size of the manifest itself, and you might be better off pointing the user at a generated, bespoke manifest (the server would have to do about the same amount of work).
#### Annotations in the info.json
In keeping with assumption 1 (but maybe violating assumption 2) the info.json for the image response could allow a link to an annotation list (containing just one annotation). I’m not sure where this link belongs. Individual images don’t need annotation when embedded in the manifest, because the viewer can see annotations on higher level structures in the manifest. But the image information response for each image subject to access control would need an annotation list link, the presence of which alerts a consumer that access control is in operation. It provides a means of directing a user to login screens etc.
It feels “bloaty” and results in more HTTP requests in an access controlled scenario. But it preserves the info.json as an invariant, canonical representation – the info.json won’t change, but the response from the annotation list will, and that response leads you to a decision about how to consume the advertised image service (if info.json is the “ideal” image service) or to an alternative image service that the server knows you will be able to use.
### Summary
Use annotation(s) to:

* provide a standalone profile (not associated with a particular image service), that can describe a degraded experience all the way down to no image at all
* apply this single profile to (1-n) images implied by the target - those images "under" the target in some way
* direct the user to somewhere they can acquire credentials to upgrade their experience
* Need an "empty" profile (level -1) to which we can add supported operations

### Problems

* Is this overriding and cascade down idiomatic? Ugly?
* Is this a correct use of annotations?
* What about multiple images on each canvas - only the raking light images "under" the target have this override profile applied, the others are OK - how to isolate the profile to a particular class of image service?
* GET request to annoList is different for different users and times
* Assumption 2