@denisnazarov
Last active December 7, 2015 20:07
Canonical Content Registry

Today, there is no reliable way to persist metadata for digital media as it travels across the internet.

Mine is working to build a global content registry on top of the Bitcoin blockchain to serve as an open metadata layer for canonical representations of digital media.

The goal of such a registry is to enable a new decentralized hypermedia protocol that powers the next generation of digital content applications, where creators and consumers own their media, identity, and interactions across the internet, without dependency on industrial or platform gatekeepers.

In March, we published a high-level summary of how such a system could work, titled the Canonical Content Registry. Today, we are taking the first steps toward building it by sharing a proposal for a technical implementation on top of Blockstore. We welcome your feedback and look forward to starting a conversation.

Registration Store

This layer stores the actual metadata and annotations on top of Blockstore. We use a namespace with a very low registration cost, no length or vowel discounts, and no expiration. Each work is stored under an opaque identifier "name" with a metadata "profile". [NOTE: this may still be prohibitively expensive due to transaction costs, so we may need to bundle multiple registrations up to the full 8k block]

Primary metadata (authorship, etc.) is written as immutable data and requires an update transaction to change. References to persons (author, model, performer, etc.) should be in the form of namespaced Onename ids. Freeform annotations are written as mutable data. [needs more thought]

The registrant of a work is considered its "custodian" and may or may not be the author. The current design does not allow for dispute resolution, or for transfer other than voluntarily by the current custodian. [needs work]

Perceptual Resolver

This layer locates metadata identifiers pointing into the Registration Store based on perceptual similarity. The DHT is keyed by an appropriate perceptual hash (for example, the Fixed Length MH Hash from phash.org for images) hashed again using a Locality Sensitive Hash such as RHH, which allows for efficient similarity search as in the Hamming DHT. Keys that collide fully are chained and may be disambiguated as described below.
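As a concrete sketch of the key derivation above, the snippet below stands in for RHH with bit sampling, a classic LSH family for Hamming distance, applied to an assumed 64-bit perceptual hash. All names and constants here are illustrative, not part of the proposal's wire format.

```python
import random

HASH_BITS = 64  # assume a 64-bit perceptual hash (e.g. a pHash DCT hash)

def bit_sampling_lsh(phash_value, sample_bits, seed=0):
    """Project a Hamming-space hash down to `sample_bits` randomly
    sampled bit positions -- bit sampling is a classic LSH family for
    Hamming distance, standing in here for the RHH construction."""
    rng = random.Random(seed)  # fixed seed so every node derives the same key
    positions = rng.sample(range(HASH_BITS), sample_bits)
    key = 0
    for pos in positions:
        key = (key << 1) | ((phash_value >> pos) & 1)
    return key

def hamming_distance(a, b):
    """Number of differing bits between two fixed-width hashes."""
    return bin(a ^ b).count("1")

# Two perceptually similar images yield hashes differing in few bits,
# so their sampled DHT keys collide or nearly collide.
original = 0xF0F0F0F0AAAA5555  # hypothetical phash of the registered image
near_dup = original ^ 0b101    # two flipped bits, e.g. after recompression
key_a = bit_sampling_lsh(original, 16)
key_b = bit_sampling_lsh(near_dup, 16)
```

Because each source bit is sampled at most once, the derived keys can differ in no more bits than the source hashes do, which is what makes near-match routing in the DHT possible.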

As in the Hamming DHT, near-match keys may be returned, up to a threshold.

The values include a "name" identifier for the registration store and possibly several other perceptual hashes or other derived data (e.g. Haar-like features, histogram, etc.) for the work that can be used to further disambiguate the query. The client may supply these additional verification criteria along with threshold values that must be met.
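The client-side disambiguation step might look like the following sketch, where each candidate value carries extra fixed-width hashes and the client enforces a per-hash similarity threshold. The field names (`name`, `hashes`) and hash kinds are assumptions for illustration, not a wire format.

```python
def hamming_similarity(a, b, bits=64):
    """Similarity in [0, 1] between two fixed-width hashes."""
    return 1.0 - bin(a ^ b).count("1") / bits

def disambiguate(candidates, query_hashes, thresholds):
    """Filter resolver candidates by secondary perceptual hashes.

    candidates:   list of dicts like {'name': ..., 'hashes': {'histogram': int}}
    query_hashes: hashes computed from the query image, keyed by hash kind
    thresholds:   minimum similarity required per hash kind
    """
    matches = []
    for cand in candidates:
        ok = all(
            hamming_similarity(cand["hashes"][kind], query_hashes[kind])
            >= thresholds[kind]
            for kind in thresholds
        )
        if ok:
            matches.append(cand["name"])
    return matches

# A candidate whose histogram hash is within threshold passes; one far
# away in Hamming space is filtered out.
candidates = [
    {"name": "a.ccr", "hashes": {"histogram": 0xFFFF}},
    {"name": "b.ccr", "hashes": {"histogram": 0x0000}},
]
matched = disambiguate(candidates, {"histogram": 0xFFFE}, {"histogram": 0.9})
```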

A Registration Store entry need not have a Perceptual Resolver entry pointing into it, but all resolved values must point at an extant registration; therefore, the resolver entry should be written only after Blockstore accepts the registration.

Example flow

  1. A photographer wishes to register an image

  2. She first registers a Onename identity for herself and verifies her other accounts

  3. She then performs the 2-step blockstore registration process with some arbitrary "name" (e.g. a uuid+namespace 5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr) in the CCR namespace and a "profile" like:

    {
      'author': 'May Lin Le Goff|maylinlegoff.id',
      'pubDate': '2015-12-04T20:59:17+00:00'
    }
    
  4. Once blockstore accepts the registration, she writes the payload

    {
      'name': '5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr'
    }
    

    to the DHT key of RHH(phash($image_bytes)). This value is accepted by the appropriate Gray code peer after passing through the ring

  5. A user wishes to find information on a cropped, compressed version of the photo. She queries the Hamming DHT resolver for RHH(phash($query_image_bytes)) with a similarity threshold of 0.95, but no results are returned after 4 hops

  6. The query is repeated with similarity threshold of 0.8, which yields the resolver entity

  7. The actual metadata is then retrieved with `blockstore-cli lookup 5100cda8-b09b-49ea-b105-1d5d29e92a96.ccr`
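The flow above can be sketched end to end with in-memory stand-ins for Blockstore and the DHT. The `phash` here is a deterministic placeholder rather than a real perceptual hash, and `register`/`lookup` are hypothetical helpers, not `blockstore-cli` commands.

```python
import uuid

# In-memory stand-ins for the real stores; actual calls (blockstore-cli
# register/lookup, DHT put/get) are substituted with plain dicts.
registry = {}   # name -> profile          (Registration Store)
resolver = {}   # perceptual key -> value  (Perceptual Resolver)

def phash(image_bytes):
    # Placeholder "perceptual" hash: just the first 8 bytes as an int.
    return int.from_bytes(image_bytes[:8].ljust(8, b"\0"), "big")

def register(profile, image_bytes):
    """Steps 3-4: register a name with a profile, then write the
    resolver entry only after the registration is accepted."""
    name = f"{uuid.uuid4()}.ccr"
    registry[name] = profile
    resolver[phash(image_bytes)] = {"name": name}
    return name

def lookup(image_bytes, max_bit_distance):
    """Steps 5-7: nearest-key search within a Hamming radius, then
    dereference the pointer into the Registration Store."""
    query = phash(image_bytes)
    best = min(resolver, key=lambda k: bin(k ^ query).count("1"), default=None)
    if best is not None and bin(best ^ query).count("1") <= max_bit_distance:
        return registry[resolver[best]["name"]]
    return None
```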

Prototype Implementation (MusicBrainz port)

For the prototype, a portion of the MusicBrainz database is used to populate the Registration Store, with corresponding AcoustID fingerprints used for the resolver, with the following modifications:

  • Instead of the Hamming DHT approach for the resolver, a simple network flooding search is used
  • The relational MusicBrainz schema is translated to something similar to schema.org's MusicRecording and written as primary metadata
  • Additional hashes are not used
  • Resolver implementation possibly built on Overlay Weaver
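The flooding substitution mentioned above might be sketched as follows: a toy node structure (not Overlay Weaver's API) where each node holds fingerprint-to-name entries and the query is flooded breadth-first up to a hop limit.

```python
from collections import deque

class Node:
    """Toy resolver node: holds {fingerprint(int): name(str)} entries
    and a list of peer nodes. Illustrative only."""
    def __init__(self, entries):
        self.entries = entries
        self.peers = []

def flood_search(start, query_fp, max_bit_distance, ttl=4):
    """Breadth-first flood up to `ttl` hops; collects names whose stored
    fingerprint lies within `max_bit_distance` of the query fingerprint."""
    seen, results = {id(start)}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for fp, name in node.entries.items():
            if bin(fp ^ query_fp).count("1") <= max_bit_distance:
                results.append(name)
        if depth < ttl:  # stop propagating past the hop limit
            for peer in node.peers:
                if id(peer) not in seen:
                    seen.add(id(peer))
                    queue.append((peer, depth + 1))
    return results
```

Flooding trades the routing guarantees of the Hamming DHT for simplicity: every reachable node within the hop limit evaluates the query locally.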

Problem Areas

  • Most obviously, no process for dispute or moderation exists; this remains something of an open problem in distributed systems. The stopgap measure is to allow writes only from trusted parties, but solving this problem is essential to the success of this project
  • The resolver is very vulnerable to Sybil and forgery attacks. Because the returned values by design do not hash to the keys requested, unlike in a traditional DHT, the client cannot verify returned results. It is probably possible to manipulate a node id so that a desired piece of media hashes into that node's neighborhood, then return a forged registration pointer (in a scenario where this system is used to dispatch micropayments for song plays, for example) or simply overwrite the real value
  • Computational overhead can be fairly substantial if verification happens on hosting nodes, so we can either return the entire hash set and let the client verify, or do a pay-to-query (more satoshi for more certainty?)
  • It looks like each registration/Blockstore write requires [correct me if I'm wrong] a fresh keypair, which could grow prohibitively expensive unless registrations are batched into blocks

Written by @parkan
