Skip to content

Instantly share code, notes, and snippets.

@nagydani
Created April 12, 2016 15:21
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nagydani/ea69e893dbfa52c3b375189b4a747360 to your computer and use it in GitHub Desktop.
Save nagydani/ea69e893dbfa52c3b375189b4a747360 to your computer and use it in GitHub Desktop.
TOPIC DISCOVERY
1. Introduction
Each node in the Ethereum network has a set of tags called "topics" by
which other nodes may choose to associate with them. They may indicate
capability, responsibility for certain information or functionality or
any other attribute by which the node may wish to belong to a connected
subnetwork of the Ethereum network. In this document, an infrastructure is
outlined by which nodes of a certain topic may discover one another.
The presented infrastructure can be used for two distinct purposes:
nodes covered by the same topic can use it to form and maintain a
connected network, while nodes not covered by a certain topic might use
it for finding nodes covered by that topic for using a service provides
by such nodes.
Topics are assumed to potentially cover arbitrarily large or small parts
of the Ethereum network, meaning that certain topics can contain only
one node while other topics may span the entire network. Irrespective of
the number of nodes covered by a topic, the discovery infrastructure should
provide a suitable set of bootstrap nodes for new nodes wishing to join
the network or those wishing to use the services of nodes covered by said
topic.
Furthermore, it is assumed, that each node is covered by at most a few dozen
topics.
2. Topic structure
On the lowest level, topics are assumed to be unstructured, denoted by a
string of printable characters. On a higher level, two possible
hierarchical topic structures are defined: one in which nodes covered by
a topic and all its subtopics are searched and another in which nodes covered
by a topic and all its supertopics are searched. Subtopics are denoted by
a string that has the parent topic as its prefix, followed by some delimiting
character.
The two higher-level hierarchies are handled as follows: in the first
case, when nodes covered by subtopics need to be discoverable, nodes
must be tagged by their topic and all their supertopics (e.g a node
covered by "animal.mammal.dog" is also covered by "animal.mammal" and
"animal", so that a search for "animal.mammal" will find it), in the
second case, when nodes covered by supertopics need to be discoverable,
the node doing the discovery also searches for all the supertopics.
Apart from this, the directory is topic sturcture agnostic.
The DHT distance measure between a topic and a node's DHT address is
defined as the XOR of the latter with the hash of the topic string.
3. Architecture
Two states of discovery are clearly distinguished: before and after the
searching node made the first successful handshake with a node covered by
the searched topic.
In the first state, the node searches the topic directory organized as a
DHT described later in more detail. From the topic directory, it will
obtain the contact details (network address, port and public key) of a
relatively small set of matching nodes to which it attempts to connect.
Upon the first successful handshake, it transitions to the second state.
In the second state, nodes covered by a certain topic introduce each
other to the searching node. The strategy followed in this state may
well depend on the requirements of the topic and the desired
(sub-)network topology. Thus, the discussion of this second state search
falls outside of the scope of this document. Nodes can transition back
to the first state if for any reason they disconnect from all other nodes
covered by their topic.
Each node participating in the DHT stores records for a (potentially)
large number of topics and at most a small number of records for each
topic. Each record binds one node to one topic. A record contains, in
addition to the topic, the network address, the port and the public key
of the node as well as a timestamp and a digital signature by the
referred node calculated on the rest of the record.
4. Registration
Nodes regularly register themselves for each topic covering them by pushing
the corresponding records into the DHT. Only registrations with a very recent
timestamp and a correct signature are accepted. If the DHT is direct (no
content forwarding), the originator network address is also checked, as nodes
are only allowed to register themselves.
In order to prevent overload on nodes with addresses close to popular
topics, the time interval for these registrations is controlled as
follows: Upon registration, the timestamp of the most recent registration
is returned to the registering node. The repeated registration is scheduled
at a delay inversely proportional to the age of the most recent record.
If the capacity for records for one topic is full, a random record for
the topic in question is deleted. As a security measure, nodes with overly
frequent registrations may be blacklisted.
5. Retrieval
First, a DHT node sufficiently close to the topic is found. Upon request
(i.e. search by topic), all sufficiently recent records are returned.
If the DHT is implemented without query and data forwarding, this might
result in excessive hammering of nodes close to popular topics. For such
implementation, it is advisable for registering nodes to register
themselves in every step of DHT lookup, not only at nodes closest to the
topic. This way, retrieval lookups are likely to find registrations
closer to themselves and further away from the closest node in the DHT.
Another possible defense against hammering by retrieval requests is to
use them sparingly: both topic member nodes and client nodes must strive
to remain connected to the topic network by queriing their neighbors for
further connections in order to maintain a healthy redundance of
connectivity without resorting to discovery.
Forwarding DHT implementations, however, avoid the issue altogether by
propagating and caching records along the path between the node on which
it was found and the querying node.
6. Proposed implementation strategies
a) Leave the UDP-based Kademlia discovery intact and oblivious of topics,
its only purpose being to maintain a connected network of assorted
Ethereum nodes. The above proposed mechanism would be implemented as a
p2p subprotocol (named, e.g., DIR) with a forwarding Kademlia similar to
that of Swarm, based largely on Swarm codebase. Instead of storing one
Swarm chunk per key, it would store a fixed number of topic discovery
records per key.
b) Add topic discovery directly to the existing UDP-based node discovery
Kademlia and implement the anti-hammering measures proposed in the
previous section.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment