Created
October 15, 2015 14:12
Integration of ipfs and Ethereum
In this document, I outline the tasks required for storing and
presenting the Ethereum block chain and Web-based Ethereum Đapps in
ipfs. Currently, ipfs is very good at locating and delivering content
using a global, consistent address space, and it has a very well
designed and implemented http gateway. However, Ethereum's use cases
require additional capabilities that ipfs currently does not provide.
Redundancy and persistency

In both important use cases, we need to make sure content remains
available even as nodes come and go. Ipfs, by itself, does not provide
any mechanism to ensure this, though there is a weak incentive for
replication built into its "bitswap" protocol, which does not seem to
be completely implemented at this point, with important parts of the
design still not finalized.

Long-term persistence of meaningful pieces of information can be
incentivized by content availability insurance that is largely
independent of the underlying distributed storage solution. The most
important development in this regard is the Swarm Contract at
https://github.com/ethersphere/go-ethereum/blob/bzz-config/bzz/bzzcontract/swarm.sol

However, it is also worth noting that the entire infrastructure for
redundant and secure storage developed for Swarm can be used in the
framework of ipfs, thanks to its pluggable hash function. If the Swarm
hash is added as an application-specific hash function to ipfs and
Swarm nodes advertise their content in the ipfs DHT, Swarm can serve
as a replication infrastructure for ipfs.
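To make the pluggable-hash idea concrete, here is a minimal sketch in Go of how the same payload could be addressed under two different hash functions, multihash-style. The codes, names and the stand-in "swarm" hash below are illustrative assumptions, not the real multihash table or the actual bzz tree hash.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Hasher is a minimal stand-in for a multihash-style pluggable hash
// function; the codes and names here are made up for illustration.
type Hasher struct {
	Code byte
	Name string
	Sum  func(data []byte) []byte
}

// sha2 plays the role of the ipfs default hash function.
var sha2 = Hasher{Code: 0x12, Name: "sha2-256", Sum: func(d []byte) []byte {
	h := sha256.Sum256(d)
	return h[:]
}}

// swarmHash is a placeholder for Swarm's chunker-based tree hash; a
// real integration would plug in the actual bzz hash here.
var swarmHash = Hasher{Code: 0x40, Name: "swarm-hash", Sum: func(d []byte) []byte {
	h1 := sha256.Sum256(d)
	h2 := sha256.Sum256(h1[:]) // stand-in, NOT the real Swarm hash
	return h2[:]
}}

// address prefixes the digest with the hash-function code and digest
// length, so the same payload can be advertised in the DHT under
// either function without ambiguity.
func address(h Hasher, data []byte) []byte {
	sum := h.Sum(data)
	return append([]byte{h.Code, byte(len(sum))}, sum...)
}

func main() {
	data := []byte("hello, swarm")
	fmt.Printf("%s: %x\n", sha2.Name, address(sha2, data))
	fmt.Printf("%s: %x\n", swarmHash.Name, address(swarmHash, data))
}
```

The point of the code prefix is that a consumer can tell from the address itself which hash function to verify the content against, which is what lets Swarm-addressed content coexist with sha2-addressed content in one DHT.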
Fair allocation of bandwidth resources

Bitswap defines an API for bandwidth accounting that can easily be
extended to include micropayment transfers balancing otherwise
unbalanced bandwidth use between peers.

The vast majority of these micropayment transactions must happen off
the block chain; otherwise the use of the block chain itself becomes a
significant transaction cost. Such a micropayment mechanism has been
developed for Swarm and can be used as a plug-in for Bitswap, as well
as for a multitude of other purposes not even related to storage. The
relevant contract code and Go API are available at
https://github.com/ethersphere/go-ethereum/tree/bzz-config/common/chequebook
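The cumulative-cheque idea behind such a chequebook can be sketched as follows. All type and field names here are illustrative assumptions, not the actual contract ABI or the Go API linked above; the key property shown is that each cheque carries a running total, so only the latest cheque ever needs to be settled on chain, and replaying an old cheque pays nothing.

```go
package main

import "fmt"

// Cheque carries a *cumulative* amount owed to the beneficiary, not
// an increment, so each new cheque supersedes all earlier ones.
type Cheque struct {
	Beneficiary string
	Cumulative  uint64
}

// Chequebook tracks the running total per beneficiary entirely off
// chain; issuing a cheque is just a local bookkeeping step.
type Chequebook struct {
	issued map[string]uint64
}

func NewChequebook() *Chequebook {
	return &Chequebook{issued: make(map[string]uint64)}
}

// Issue adds amount to the running total and returns a cheque for the
// new total.
func (cb *Chequebook) Issue(beneficiary string, amount uint64) Cheque {
	cb.issued[beneficiary] += amount
	return Cheque{Beneficiary: beneficiary, Cumulative: cb.issued[beneficiary]}
}

// Cash simulates on-chain settlement: the contract pays out only the
// difference between the cheque's cumulative amount and what it has
// already paid, so presenting an old cheque yields nothing extra.
func Cash(paidSoFar uint64, c Cheque) (payout, newPaid uint64) {
	if c.Cumulative <= paidSoFar {
		return 0, paidSoFar
	}
	return c.Cumulative - paidSoFar, c.Cumulative
}

func main() {
	cb := NewChequebook()
	cb.Issue("peer-A", 10)           // many small off-chain transfers...
	c := cb.Issue("peer-A", 5)
	payout, _ := Cash(0, c)          // ...one on-chain settlement
	fmt.Println("settled:", payout)  // prints "settled: 15"
}
```

This is what keeps the block-chain transaction cost negligible: thousands of per-chunk bandwidth payments collapse into a single settlement transaction per peer pair.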
Names and URIs

One design principle of Swarm was to allow arbitrary names and URIs to
resolve to both static and dynamic content served up by the Swarm
infrastructure. Unfortunately, this has not been a design goal for
ipfs, and in its current form ipfs does not fulfill it. In particular,
static directories with a large number of entries are handled very
inefficiently by ipfs, and there is no obvious way around this
limitation.

In practice, this makes it very difficult to migrate content like
Wikipedia to our distributed storage, even though it would be one of
the obvious candidates for a high-profile application of such an
infrastructure. Similarly problematic would be implementing commonly
used http APIs for mapping content, such as OpenStreetMap tiles, on
top of ipfs, which would be another obvious candidate.

I believe that for the success of Web3, it is instrumental to retain
as much compatibility with popular and useful Web 2.0 standards and
services as possible. The URI resolution scheme used by ipfs
constitutes a very severe limitation hampering such efforts.
Decentralization

The design of ipfs provides a common abstraction for both centralized
and decentralized storage solutions, so that content can be retrieved
from both using the same software; the consumer of the content does
not even need to be aware of the underlying storage architecture, and
ipfs does not specify one. The content can come from a workstation
with a temporary address, an individual small server, a large
datacenter or a sophisticated content delivery network. As long as the
content conforms to the ipfs format and is advertised in the ipfs DHT,
the consumer will be able to download it all the same.

Moreover, ipfs solves one of the main problems of the (http(s)-based)
web driving its rapid centralization, namely that the costs of content
distribution borne by the publisher increase with the content's
popularity. Since ipfs content is delivered bittorrent-style, all
consumers automatically contribute their upstream bandwidth towards
distribution, at least for the duration of the download, thus
contributing their fair share.

However, as the history of Bitcoin shows, enabling decentralization
does not prevent centralization. Economies of scale might result in a
centralization of the storage infrastructure; the real question then
becomes to what extent large players can abuse their position.
Censorship resistance

In some ways, ipfs is explicitly censorship-enabling; nodes can decide
what content to store and not to store, and they can credibly comply
with take-down notices. At the same time, ipfs also helps keep content
available for all users as long as there are nodes willing to serve
it, although it must be noted that it also helps find all such nodes.
This might be a workable compromise.

For this to remain the case, however, it is important that the DHT
remains decentralized. Unfortunately, at present there are no
incentives built into ipfs for running DHT nodes. DHT nodes cannot be
excluded for not responding to queries, because the ipfs DHT attaches
very little value to connections. Consumers are not punished for
freeloading (only querying other DHT nodes, but never responding to
queries), while a cartel providing most of the storage service might
decide not to keep outsider addresses in their Kademlia tables and yet
provide a pleasant user experience to freeloading consumers. Over
time, this might develop into a problem.
@jbenet
Thank you for your extensive and very informative response. I believe that there are some misunderstandings between us, and I would like to iron them out as quickly as we can.
I would be very happy to speak with you again, and I am wondering whether the #ipfs IRC channel, which I have used to ask questions about IPFS, might be the best forum for getting in touch with the IPFS community.
I do not cite it as a deficiency, merely as an architectural feature that needs to be taken into account.
I understand that.
Perhaps I should have been more explicit about it, but I do understand it and I believe that I have even mentioned it. Anyway, thanks for making it even more clear here.
Thank you! Indeed, a virtualized IPFS node may solve some issues that we have.
Of course. The pluggable hash function is important only insofar as content is addressed by its hash value.
Correct. We will have to do that very soon. I would gladly receive any pointers to documentation or relevant interfaces.
That might also be an option.
That is actually a somewhat contentious issue as Ethereum has its own p2p system. But that is not a show-stopper either; we might use both.
Right. Is my characterization of IPFS correct here? Mind you, there is no implied criticism at all; we are ready to use that API.
I am talking about content addressed URIs where a root hash is followed by a path.
Sorry, I have looked at the code in the master branch and asked people on IRC. Let's discuss this. Is there a way to take control of parsing the part of a content-addressed URI that follows the root hash?
This is what I did. I believe that I do understand it, but will be happy to learn more. For blockchain data, we just need the pluggable hash function. However, for web-based dapps, something resembling a filesystem would be essential.
Congratulations! Could you post a link here? Are updates fast, too? I am truly curious about this.
That's great. I was told over IRC that the IPFS path resolver over merkle trees uses directories as nodes and that is what I have seen in the code as well.
In particular, as I understand it, if you have a directory with 3 objects named, say, AA, AB and BB, it will be one node inside the merkle tree with a three-way branching, rather than a two-way branching separating BB from the other two beginning with A, followed by another two-way branching for AA and AB (containing only A and B, of course).
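To illustrate the difference between the two layouts, here is a toy Go sketch of prefix-based splitting; it is only meant to show the branching structure and is not Swarm's or ipfs's actual node format.

```go
package main

import "fmt"

// branch groups a list of names by their byte at position depth,
// sketching how a radix-trie layout splits entries at shared prefixes
// instead of keeping them all in one wide directory node.
func branch(names []string, depth int) map[byte][]string {
	groups := make(map[byte][]string)
	for _, n := range names {
		if depth < len(n) {
			groups[n[depth]] = append(groups[n[depth]], n)
		}
	}
	return groups
}

func main() {
	names := []string{"AA", "AB", "BB"}
	top := branch(names, 0)    // 'A' -> [AA AB], 'B' -> [BB]: two-way split
	sub := branch(top['A'], 1) // 'A' -> [AA], 'B' -> [AB]
	fmt.Println(len(top), len(sub)) // prints "2 2"
}
```

In the flat layout, the AA/AB/BB directory is one three-way node; in the trie layout it becomes two nested two-way splits, and with many entries this nesting is what keeps each fetched node small.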
Wonderful! Since this is my primary concern and I am the least sure about how IPFS actually does this, let us discuss it separately.
I do see this, and my calling your approach a "workable compromise" is my endorsement of it.
Correct.
I have quite a bit of experience with such issues, both legal and technical. What you write here is mostly correct, except that you accuse me of criticizing your approach (I do not) or of being intent on endangering people out of ignorance and arrogance (I am not).
I do not think you know what I think.
I was merely pointing out a potential problem, using highly conditional language ("over time", "might", etc.). Sure, I do not expect it to become a real problem anytime soon and you will have plenty of time to think about it and eventually do something about it. No urgency here, but I decided to make this concern explicit. Thank you for sharing your roadmap for a solution!
Indeed, I might not be understanding something here. I will need to learn more about this.
I think you also misunderstood what I have written. 256 is the theoretical maximum for the degree (fanout) of our merkle tree nodes. Directories can contain tens of millions of entries without any problems in Swarm.
Great.
I have seen it, and the two are not exactly equivalent. The difference in browser behavior (between changing only the fragment part vs. other parts of the URL) is subtle, but it is there.
I have asked on IRC and apparently got the wrong answer. But again, I will be happy to look deeper and ask again.
And how would allowing splits at any character, not just at slashes, make things worse? Actually, I believe that you could make a fully backwards-compatible change here that would greatly improve things. Once I understand your codebase better, I will even be willing to submit a PR.
What I mean is a merkle dag over arbitrary portions of the URI, not necessarily directories. As an example, consider the AA, AB, BB case above.
Can fanout be changed without affecting anything else? If so, that would indeed solve my "large" problem.
I am very eager to see that.