Skip to content

Instantly share code, notes, and snippets.

@mildred
Last active August 29, 2015 14:06
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mildred/3c4678595f2d245aa436 to your computer and use it in GitHub Desktop.
Save mildred/3c4678595f2d245aa436 to your computer and use it in GitHub Desktop.
Finding backlinks in a DHT

Nodes that store the content store the links

  • a backlink is a dictionary (that could be represented as JSON to be human-readable) containing a key target with the hash of the object it is about. Other fields are not necessarily fixed. The object size must be small enough (to fit in a UDP packet).
  • to publish a link, the author must regularly send an advertise message to all nodes that store the target. Nodes are supposed to accept or refuse to store the link. But if they refuse to store a link to a specific object, they must also refuse all other links for this object. If the link is malformed (they don't have the object referenced by target), nodes can reject it safely.
  • to query a link about a target, a node must query the owner of the target for links about it. If no links are returned, the node must query other nodes having the target until one node having a link is found. Then the search can then stop.

Notes:

  • when querying links about an object we just downloaded, this method should be the fastest because the links are on the same node as the object it is about
  • advertisement of a link is a bit harder, but is is supposed to be done by a daemon software that is a long running process.
  • some nodes could choose to remove some links. The nodes that store the link are the nodes that might be the most emotionally linked with the target, and might want for personal reasons to stop some links.
  • nodes don't risk being flooded with links because of some random reason, if they receive links it is because they are involved in some way.
  • if a querying node suspects that some links might have been dropped by some nodes, the node can always query all the publishers of the target and concatenate the list of links obtained (after removing duplicates)
  • if the publisher of a link suspects that his link might be dropped (because it is controversial), he could host the target object himself and be sure that cautious nodes could still

Random nodes store the links

  • a backlink is defined the same as above.
  • to publish a link, the author must regularly send an advertise message to N nodes whose id is close in the DHT to the target of the link. Nodes are supposed to always store the link.
  • to query a link about a target, a node must query M nodes whose id are close to the target object and build a list of objects.

Notes:

  • this method could cause longer lookup times when we lookup links about an object we just downloaded
  • advertisement is simpler because only N nodes have to be contacted for advertisement
  • highly vulnerable to sybil attacks
  • nodes about a target that generate lots of links might be flooded and might not have the capacity to respond to all requests
  • the node that store the link is not supposed to have an emotional relation to the target object (false for sybil nodes).

Link issuer store the link

  • a backlink is defined the same as above.
  • to publish a link, the author must publish a file containing literally "link_about:" followed by the hash of the target object
  • to query a link, a node must search for all nodes publishing the object "link_about:" followed by the target hash, and then query each of these nodes the links they have about the target object.

Nodes:

  • this is the safest, none can easily stop a link from being posted
  • this is possibly the most longest method to query links
  • this could mitigated by having the noes storing the target also store most links. If a querying node is not really interested about having all the links and not a single link dropped, this could be easily used to make faster query times.

Mixed approach

  • a backlink is a dictionary (that could be represented as JSON to be human-readable) containing a key target with the hash of the object it is about. Other fields are not specified at that point.
  • to publish a link, the author must publish a file containing literally "link_about:" followed by the hash of the target object. They could optionally notify the publishers of the target object.
  • to query a link, a node must search for all nodes publishing the object "link_about:" followed by the target hash, and then query each of these nodes the links they have about the target object.
  • nodes hosting the target object could cache the list of links they gathered about their target and give it to the querying nodes that trust them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment