Monitoring and blocking the bittorrent monitoring spies

##Target

Protect the privacy of bittorrent users and shield them from the monitoring spies by making their activity much less visible: change the way they connect to a torrent, and define a method to establish and maintain dynamic blocklists.

##Abstract

Previous research has focused mainly on discovering monitors using trackers. This study focuses on tracking and blocking the monitors using only the bittorrent peer and content discovery system (called the DHT).

The global result is that the spies are organized to automatically monitor whatever exists in the bittorrent network. They are easy to find but difficult to follow, since they may change their IP addresses and they pollute the DHT.

By spreading in the bittorrent network a torrent that does not exist, we show that the spies are organized in several levels. They position themselves according to what they learn from the network and attract users to connect to some of them (the final spies). The users then start the bittorrent handshake, which identifies the torrent they are requesting, and the final spies deem this handshake to be the start of the download.

The final spies, which have stable IP addresses owned by the monitoring companies, are the dangerous ones. They are not as numerous as one might think when envisioning the enormous blocklists often used to catch them.

We demonstrate that what the spies are doing is legally highly questionable, and that in most cases it is not enough to prove that the users actually downloaded the related torrent. The validity of some takedown notices is therefore questionable too.

Static blocklists are not enough and the method shows how to create and maintain dynamic blocklists.

Finally, we show that a change in the bittorrent protocol that would defeat most of the spies is already specified, but it is either not implemented or only partially implemented and not enforced. We also suggest some changes in the bittorrent clients to protect the users' privacy and to allow a fair relationship between users and copyright holders. This method does not disturb the bittorrent network and could easily be implemented by any bittorrent client.

##Legal mention

This study was performed as research work and does not take a position. While it focuses on protecting the privacy of the users and on detecting and blocking the monitors, it also offers some thoughts on how copyright holders could benefit from the P2P network, for example by giving users some means to pay something. No such means exists today, which prevents people from legally using the magic of the bittorrent network. The study also envisions the development of a new bittorrent client that follows its recommendations, protects users much better and is fair to all parties.

The rationale for performing this study, which was to resolve an unfair situation in the context of the Peersm project [13], is explained at the end.

The IP addresses of bittorrent users cannot be hidden and can easily be seen in any bittorrent client. We do not disclose any of the IP addresses encountered during this study, whether for peers or monitors. The data used do not raise any individual privacy issue, since they were never analyzed case by case but only in bulk for statistical computation. The data will be destroyed when they are no longer required.

This study only partially covers the specific case of monitors behaving almost normally in a torrent, since studying it fully would require us to participate in the torrent. We have therefore neither participated in any copyright-infringing activity nor downloaded any file during this study.

##Background - Quick reminder about the bittorrent peer and content discovery system

The peer and content discovery system is the Distributed Hash Table (DHT). Each peer has a nodeID and each content has a reference called the infohash; a mathematical operation (XOR) gives the distance between them. Each peer maintains a routing table of the peers it knows. It first registers the peers closest to its own nodeID by recursively asking others (find_node requests), starting with some well-known bootstrap nodes, and then registers the peers it encounters during its lifetime. The routing table is split into 160 buckets, each corresponding to a range of distances from the peer's nodeID, and each new peer is registered in the bucket that matches its distance from that nodeID.
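To make the distance and bucket notions concrete, here is a minimal sketch. It assumes 20-byte (160-bit) identifiers and the simplified layout described above, where bucket `i` holds the peers whose distance falls in the range from `2**i` up to (but not including) `2**(i+1)`. The function names `xor_distance`, `bucket_index` and `closest` are ours for illustration; real clients may organize their buckets differently.

```python
import hashlib


def xor_distance(a: bytes, b: bytes) -> int:
    """Distance between two 160-bit identifiers (nodeID or infohash)."""
    if len(a) != 20 or len(b) != 20:
        raise ValueError("expected 20-byte (160-bit) identifiers")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def bucket_index(own_id: bytes, other_id: bytes) -> int:
    """Bucket (0..159) for other_id, seen from own_id.

    Assumption: bucket i covers distances d with 2**i <= d < 2**(i+1),
    so the index is the position of the highest differing bit.
    """
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a peer does not register itself")
    return d.bit_length() - 1


def closest(target: bytes, candidates, k: int = 8):
    """Return the k candidates closest to target (k=8 is an illustrative default)."""
    return sorted(candidates, key=lambda n: xor_distance(n, target))[:k]


if __name__ == "__main__":
    # Hypothetical identifiers, derived from arbitrary strings for the demo.
    own = hashlib.sha1(b"example node").digest()
    infohash = hashlib.sha1(b"example torrent").digest()
    print("distance bits:", xor_distance(own, infohash).bit_length())
    print("bucket:", bucket_index(own, infohash))
```

Because the distance is a plain integer, "closest to the infohash" is simply a sort on `xor_distance`, which is what the recursive lookups described in this section converge on.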

When a peer wants to download or announce content, it recursively looks for the nodes closest to the content's infohash by sending get_peers requests, and then sends an announce_peer request to the closest nodes. The announce_peer message must contain the token returned in the reply to the earlier get_peers request, and the IP address of the querying party must be the same for both messages. This mechanism makes it difficult for someone to announce something on behalf of somebody else.
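The token check can be sketched as follows. This is only an illustration of the idea (a token bound to the requester's IP address and to a secret that the queried node rotates), not the code of any particular client. The class name `TokenIssuer`, the use of HMAC-SHA1 and the two-secret window are assumptions made for the example.

```python
import hashlib
import hmac
import os


class TokenIssuer:
    """Issues get_peers tokens and checks them on announce_peer."""

    def __init__(self):
        # Two secrets, so that a token issued just before a rotation
        # is still accepted for one more period.
        self._current = os.urandom(16)
        self._previous = os.urandom(16)

    def rotate(self):
        """Called periodically by the node (the period is up to the client)."""
        self._previous, self._current = self._current, os.urandom(16)

    @staticmethod
    def _make(secret: bytes, ip: str) -> bytes:
        return hmac.new(secret, ip.encode("ascii"), hashlib.sha1).digest()

    def token_for(self, ip: str) -> bytes:
        """Token returned in the reply to a get_peers request from ip."""
        return self._make(self._current, ip)

    def is_valid(self, ip: str, token: bytes) -> bool:
        """True if token was issued to this same ip recently enough."""
        return any(
            hmac.compare_digest(token, self._make(secret, ip))
            for secret in (self._current, self._previous)
        )


if __name__ == "__main__":
    issuer = TokenIssuer()
    token = issuer.token_for("198.51.100.7")          # documentation-range IP
    print(issuer.is_valid("198.51.100.7", token))     # same IP announces
    print(issuer.is_valid("203.0.113.9", token))      # another IP replays the token
```

Since the token depends on the IP address seen by the queried node, a party that only observes or guesses a token cannot reuse it from a different address.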

Peers answer a get_peers request with values (peers that have announced that they have the requested infohash) and/or nodes (peers that the queried party knows to be close to the infohash).
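As an illustration, the sketch below decodes the two kinds of answers from a reply that has already been bdecoded into a Python dict with bytes keys (that decoding step, and the helper names, are assumptions of this example). It relies on the compact IPv4 encodings used by the mainline DHT: 6 bytes per peer (4-byte IP, 2-byte port) in `values`, and 26 bytes per node (20-byte nodeID, 4-byte IP, 2-byte port) in `nodes`.

```python
import socket
import struct


def decode_peers(values):
    """Decode a list of 6-byte compact peers into (ip, port) tuples."""
    peers = []
    for v in values:
        if len(v) != 6:
            continue  # skip malformed entries rather than fail
        ip = socket.inet_ntoa(v[:4])
        (port,) = struct.unpack("!H", v[4:6])
        peers.append((ip, port))
    return peers


def decode_nodes(blob):
    """Decode concatenated 26-byte compact nodes into (node_id, ip, port) tuples."""
    nodes = []
    usable = len(blob) - len(blob) % 26  # ignore a trailing partial entry
    for i in range(0, usable, 26):
        node_id = blob[i:i + 20]
        ip = socket.inet_ntoa(blob[i + 20:i + 24])
        (port,) = struct.unpack("!H", blob[i + 24:i + 26])
        nodes.append((node_id, ip, port))
    return nodes


def parse_get_peers_reply(r):
    """Split the 'r' dict of a get_peers reply into token, peers and nodes."""
    return {
        "token": r.get(b"token"),
        "peers": decode_peers(r.get(b"values", [])),
        "nodes": decode_nodes(r.get(b"nodes", b"")),
    }
```

A reply carrying `values` gives addresses to connect to for the torrent, while a reply carrying only `nodes` gives the next hops to query with get_peers.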

The set of all peers participating in a given torrent is called a "swarm".

##Related work

The trackers, which are servers that register the peers and reference the contents, are out of the scope of this study. There are many research papers about monitoring the bittorrent network ([3] and subsequent references), mainly using trackers. Trackers are now obsolete and should not be used.

The above references sometimes mention the DHT, but only to a limited extent. Some work exists on monitoring the spies ([3], [2] and subsequent references), but the topics are generally more about monitoring the users than about monitoring the spies. We are not aware of studies on detecting, tracking and following the spies using the DHT only.

We decided to explore all the ways the spies can monitor the bittorrent network using the DHT only. The first part is more empirical, aiming to understand the general behavior of the spies and to collect data; the second part studies the DHT distribution more precisely and finalizes the method.
