@parkan
Last active December 23, 2015 13:07
CCR Resolver

Design

This is the layer of the Canonical Content Registry (CCR) that locates metadata identifiers pointing into the registration store, based on perceptual similarity. The DHT is keyed by an appropriate perceptual hash (for example, the fixed-length MH hash from phash.org for images), hashed again with a locality-sensitive hash such as RHH (random hyperplane hashing), which allows efficient similarity search as in the Hamming DHT. Keys that collide fully are chained and may be disambiguated as described below.

As in the Hamming DHT, near-match keys may be returned, up to a similarity threshold.
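
The keying scheme can be sketched as follows. This is a minimal illustration only, assuming the perceptual hash is already available as a list of bits; the function names `rhh` and `hamming_similarity`, the 16-bit key length, and the shared seed are all hypothetical choices, not part of the design above.

```python
import random

def rhh(bits, n_out=16, seed=0):
    """Random hyperplane hash: one output bit per random hyperplane.

    All peers would need to share the same hyperplanes; here that is
    modelled (as an assumption) by a fixed seed.
    """
    rng = random.Random(seed)
    # Map input bits {0, 1} to {-1, +1} so the sign of the dot product is meaningful.
    vec = [1 if b else -1 for b in bits]
    out = []
    for _ in range(n_out):
        plane = [rng.gauss(0, 1) for _ in vec]
        dot = sum(p * v for p, v in zip(plane, vec))
        out.append(1 if dot >= 0 else 0)
    return out

def hamming_similarity(a, b):
    """Fraction of bit positions on which two equal-length keys agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Stand-in for a perceptual hash of an image (made-up bits).
image_bits = [1, 0, 1, 1, 0, 0, 1, 0]
key = rhh(image_bits)
```

Inputs that are close in the original space tend to agree on most output bits, so a threshold on `hamming_similarity` between keys approximates a threshold on perceptual similarity.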

The values include a "name" identifier for the registration store and possibly several other perceptual hashes or other derived data (e.g. Haar-like features, histograms) for the work, which can be used to further disambiguate the query. The client may send these additional verification criteria along with threshold values that must be met.
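
One way such disambiguation might look is sketched below. The value layout (a `features` dictionary next to `name`), the feature name `histogram`, and the function `passes_verification` are assumptions made for illustration; the design above does not fix a schema.

```python
def hamming_similarity(a, b):
    """Fraction of bit positions on which two equal-length bit lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def passes_verification(entry, criteria):
    """Check a resolver value against client-supplied criteria.

    criteria maps a feature name to (query_bits, min_similarity).
    An entry lacking a requested feature is treated as a failed match.
    """
    for feature, (query_bits, min_sim) in criteria.items():
        stored = entry.get("features", {}).get(feature)
        if stored is None or hamming_similarity(stored, query_bits) < min_sim:
            return False
    return True

# Hypothetical resolver value carrying one extra derived feature.
entry = {
    "name": "5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr",
    "features": {"histogram": [1, 1, 0, 0]},
}
ok = passes_verification(entry, {"histogram": ([1, 1, 0, 1], 0.75)})
```

Chained entries that collide on the same key could be filtered this way, keeping only those that meet every threshold the client sent.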

A Registration Store entry does not need a Perceptual Resolver entry pointing into it, but every resolved value must point at an extant registration. The resolver entry should therefore be written only after blockstore accepts the registration.
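
The write-ordering invariant can be sketched with an in-memory stand-in. The class `ResolverIndex` and the set of registered names are hypothetical; a real resolver would check the registration store itself rather than a local set.

```python
class ResolverIndex:
    """Toy resolver that refuses values pointing at unknown registrations."""

    def __init__(self, registered_names):
        # Stand-in (assumption) for a lookup against the registration store.
        self.registered_names = registered_names
        self.entries = {}

    def put(self, key, name):
        if name not in self.registered_names:
            raise LookupError("no accepted registration for " + name)
        # Fully colliding keys are chained rather than overwritten.
        self.entries.setdefault(key, []).append({"name": name})

index = ResolverIndex({"5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr"})
index.put((1, 0, 1, 1), "5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr")
```

Writing in the opposite order would leave a window in which a query resolves to a name that cannot yet be looked up.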

Example flow

  1. A photographer wishes to register an image

  2. She first registers a Onename identity for herself and verifies her other accounts

  3. She then performs the 2-step blockstore registration process with some arbitrary name (e.g. a uuid+namespace 5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr) in the CCR namespace and a profile like:

        {
            "@type": "Person",
            "name": "May Lin Le Goff",
            "id": "maylinlegoff.id"
        }
    
  4. Once blockstore accepts the registration, she writes the payload

    {
      "name": "5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr"
    }
    

    to the DHT under the key RHH(phash($image_bytes)). The value is accepted by the appropriate Gray code peer after passing through the ring

  5. A user wishes to find information on a cropped, compressed version of the photo. She queries the Hamming DHT resolver for RHH(phash($query_image_bytes)) with a similarity threshold of 0.95, but no results are returned after 4 hops

  6. The query is repeated with a similarity threshold of 0.8, which yields the resolver entry

  7. The actual metadata is then retrieved with blockstore-cli lookup 5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr
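
The query side of this flow (steps 5 and 6) can be sketched as below. This is a simplified model under stated assumptions: the DHT is replaced by a local dictionary that is scanned in full, the hop limit is not modelled, and the names `lookup` and `resolve` and the 20-bit keys are made up for illustration.

```python
def hamming_similarity(a, b):
    """Fraction of bit positions on which two equal-length keys agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def lookup(index, query_key, threshold):
    """Return (similarity, entry) pairs at or above the threshold, best first."""
    hits = []
    for key, entries in index.items():
        sim = hamming_similarity(key, query_key)
        if sim >= threshold:
            hits.extend((sim, entry) for entry in entries)
    return sorted(hits, key=lambda hit: hit[0], reverse=True)

def resolve(index, query_key, thresholds=(0.95, 0.8)):
    """Try each threshold in turn, relaxing until something matches."""
    for threshold in thresholds:
        hits = lookup(index, query_key, threshold)
        if hits:
            return threshold, hits
    return None, []

# Stored key for the original image; the query key differs in 3 of 20 bits,
# standing in for a cropped, compressed copy.
stored_key = (1,) * 20
query_key = (0, 0, 0) + (1,) * 17
index = {stored_key: [{"name": "5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr"}]}
used_threshold, hits = resolve(index, query_key)
```

In this toy setup the 0.95 query finds nothing and the 0.8 query returns the entry, whose `name` is what would then be passed to the blockstore lookup in step 7.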

@bedeho

bedeho commented Dec 23, 2015

  1. Is phash the feature detection explained here: https://github.com/mine-code/canonical-content-registry ?
  2. What is the complexity of phash in terms of the image size, and what is the typical runtime for a 400x600 image?
  3. Is there a phash type construction for other media, like video, audio or executable binaries?
