
@tomcrane
Last active August 29, 2015 14:17
First pass at search within implementation

This is based on the implementation described in IIIF Search Straw Person: https://docs.google.com/document/d/1F9-zelMCJqVjh-UMk8VeaPULBY2yRbI17X5EsrtcmSY

...and on the comments on this thread: https://groups.google.com/d/msg/iiif-discuss/cecb6B_21kk/3GIwpBWPMAsJ

The Wellcome Library already has search within, based on OCR data persisted in ALTO files. The implementation can be seen on any searchable book or archive: http://wellcomelibrary.org/player/b18035978

There are two services involved:

http://wellcomelibrary.org/service/autocomplete/b18035978/0?term=gener

http://wellcomelibrary.org/service/search/b18035978/0?t=human%20heart

The autocomplete one is very useful in the constrained "search within" scenario, especially over texts that may contain technical terms.

I have set the autocomplete service aside for now, as there has not yet been any discussion about it. Supporting it would also mean adding two services to a manifest, which I'm not sure how to do! I have concentrated on providing the search functionality as IIIF JSON-LD.

##Search Within implementation##

The first step is to add the service to the manifest: http://wellcomelibrary.org/iiif/b18035978/manifest

(These manifests are not yet driving the public viewer. They are valid, and the image services they reference are live, but the metadata is incomplete and I have some modelling and security questions to sort out. They work in Mirador or the BL viewer.)

  "service": {
    "@id": "http://wellcomelibrary.org/service/iiifsearch?within=http://wellcomelibrary.org/iiif/b18035978/manifest{&q,offset,n,motivation}",
    "profile": "http://iiif.io/api/search/1/",
    "label": "Search within this manifest"
  }

This is modelled on the example in the Google Doc, using a single search endpoint and the "within" param. The only difference is that I have used the URI Template syntax from https://tools.ietf.org/html/rfc6570#section-3.2.9. This is suggested in http://www.w3.org/ns/hydra/spec/latest/core/#templated-links which might be a future approach. I'm not sure if a templated URI like that is a valid @id value - but is "..&q=" either?
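To illustrate how a client might turn the templated `@id` into a query URL, here is a minimal Python sketch. It handles only the `{&...}` form-style query continuation operator from RFC 6570 section 3.2.9, not the full specification, and the function name `expand_query_continuation` is invented for this example.

```python
import re
from urllib.parse import quote


def expand_query_continuation(template, values):
    """Expand RFC 6570 form-style query continuation expressions, e.g. {&q,offset}.

    Hypothetical helper: only the "{&a,b,c}" operator is supported.
    Variables missing from `values` (or set to None) are omitted.
    """
    def replace(match):
        names = match.group(1).split(",")
        return "".join(
            # Percent-encode everything outside the unreserved set.
            "&{}={}".format(name, quote(str(values[name]), safe=""))
            for name in names
            if values.get(name) is not None
        )

    return re.sub(r"\{&([^}]*)\}", replace, template)


template = (
    "http://wellcomelibrary.org/service/iiifsearch"
    "?within=http://wellcomelibrary.org/iiif/b18035978/manifest"
    "{&q,offset,n,motivation}"
)
url = expand_query_continuation(template, {"q": "human heart"})
print(url)
```

Only `q` is supplied here, so the other advertised parameters are simply left out of the expanded URL.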

Although I'm advertising those params, I haven't implemented support for any of them other than q (I'll leave that for later). I only have OCRed full text to search over; we don't have any annotations (yet).

A query to this service will return all results: http://wellcomelibrary.org/service/iiifsearch?within=http://wellcomelibrary.org/iiif/b18035978/manifest&q=human%20heart

This is exactly the same as the example in the document, but with the addition of "before" and "after" as discussed on the thread, to show the result in context (the Internet Archive BookReader does this; we haven't made use of this data in the Wellcome UI yet).

I've also added resultIndex. The third and fourth annotations in the list have the same resultIndex because they are part of the same result: the result requires two rectangles (see image attached). I'm not sure how to group the results otherwise. In the original version at http://wellcomelibrary.org/service/search/b18035978/0?t=human%20heart, the object graph has rectangles within search results: each result has an array of rectangles (even though there's usually only one). The annotation list flattens this out, so how do we preserve two separate annotations as a single "result"? This also means there are more annotations in the list than the "total" and "pageSize" might suggest.

(Image: two-line results)
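As a rough sketch of how a client could rebuild results from the flattened list, the annotations can be grouped by `resultIndex`. The sample data below is made up for illustration (the `label` field is hypothetical); only `resultIndex` comes from the response described above.

```python
def group_by_result_index(annotations):
    """Group a flat annotation list into results, keeping first-seen order.

    Assumes every annotation is a dict carrying a "resultIndex" key.
    """
    groups = {}
    for annotation in annotations:
        groups.setdefault(annotation["resultIndex"], []).append(annotation)
    # Plain dicts preserve insertion order (Python 3.7+).
    return list(groups.values())


# Made-up sample: the third and fourth annotations share a resultIndex,
# as with a hit that wraps across two lines of text.
sample = [
    {"resultIndex": 0, "label": "rect A"},
    {"resultIndex": 1, "label": "rect B"},
    {"resultIndex": 2, "label": "rect C (end of line)"},
    {"resultIndex": 2, "label": "rect D (start of next line)"},
    {"resultIndex": 3, "label": "rect E"},
]
results = group_by_result_index(sample)
print(len(sample), "annotations ->", len(results), "results")
```

The number of groups, rather than the number of annotations, is what would line up with "total".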

Other observations - I'm guessing this is a lot harder to process client-side to produce the effects seen in the viewer - but that difficulty can be hidden behind a JavaScript API.
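One small piece of that client-side work is turning an annotation target back into a rectangle. The sketch below assumes each annotation targets a canvas URI with an `#xywh=x,y,w,h` fragment, which is the usual IIIF convention; whether this response uses exactly that shape is an assumption, and `parse_target` and the example URI are hypothetical.

```python
from urllib.parse import urldefrag


def parse_target(on):
    """Split a target like "<canvas>#xywh=x,y,w,h" into (canvas, (x, y, w, h)).

    Returns (canvas, None) when there is no xywh fragment.
    Assumes integer pixel coordinates.
    """
    canvas, fragment = urldefrag(on)
    if not fragment.startswith("xywh="):
        return canvas, None
    x, y, w, h = (int(v) for v in fragment[len("xywh="):].split(","))
    return canvas, (x, y, w, h)


canvas, rect = parse_target("http://example.org/canvas/1#xywh=10,20,300,40")
print(canvas, rect)
```

A viewer could combine this with the `resultIndex` grouping to draw every rectangle belonging to one hit.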

Anyway - here is a working "search within" to play with. It will work on any Wellcome Library searchable content:

http://wellcomelibrary.org/service/iiifsearch?within=http://wellcomelibrary.org/iiif/b18034706/manifest&q=wedge
