In Dune, the Fremen have an ambitious 350-year program to terraform a desert planet, driven by large-scale analysis and long-term planning. Re-building the internet in a decentralised way is similarly ambitious, but currently we have many independent efforts lacking wider co-ordination. We argue for more focus on defining standards and interfaces between components, rather than disparate teams each building a large monolithic system that solves multiple technical problems, all in different ways that are incompatible with one another. We give some design advice based on our experience with older systems. We propose some precise definitions of the problems we face, as well as specifications for abstract interfaces that address these problems. Finally, we imagine how some existing systems could be split into separate components that correspond to these interfaces.
- not "special talk", but software design talk, architecture of existing/future solutions
- not "solving new problems", but talking about design/engineering issue of previous/existing systems
- not a security analysis, some things may be more vague than you'd like. sorry.
- assumes surface knowledge of some common systems (Tor, DHTs, git, PGP, OTR), assumes you understand SWE principles
- fairly wide-ranging but brief run-through of many core concepts in YBTI
- meant to increase breadth of your knowledge, mile-high overview
history of freenet
- (it's cool, I worked on this and contributed to the mess)
- legacy java code, addresses many different concerns inside one program
- applications are tied to freenet ecosystem and its specific performance caveats, hard to adapt elsewhere
- we need YBTI platforms before we can build more complex applications e.g. photo sharing
- a little more usability-relaxed - you are coding for the technical developer, not the non-technical end-user
- but usability points apply to both cases, reduce repetitive work for your clients. the 3rd party app dev should not need to know your implementation.
- e.g. other talks, crypto alg details is not necessary
- basic concepts must be easy to describe. helps extensibility and adoption.
- Tor is relatively simple, cf. Freenet which is constantly changing, unclear separation
- complex interfaces can only be used by geniuses, and an actual genius would not want to use them anyway
- your 3rd-party client/higher-layer developers are your most important users
- get the concepts into the cultural background, like P != NP. (poll audience)
- modularity, reduce effort, reduce duplication, all the standard SWEng advice
- someone else has probably done it better, take advantage of this
- as free software projects, we are short on resources anyways. don't let it go to waste.
"split into modules" is not enough
- interface must be simple, without complex caveats about various performance tradeoffs/compromises between different types of resources
- performance guarantees are part of your interface
- crypto implementation algorithm names are not part of your interface
- e.g. error correction should not be an application-level concern!
- e.g. TCP's interface is "reliable in-order delivery of a stream of data", for 20 years "good enough"
- newer improvements (largely) affect only this layer, don't require higher layers to be redesigned
- Tor pluggable transports interface complex (it's cool I work on this)
- e.g. http://www.certificate-transparency.org/comparison
- e.g. https://leap.se/en/docs/tech/infosec
interface vs implementation
- not just "list of methods", but entire contract, and performance bounds.
- quantity is quality! ?linus git quote "performance changes your workflow structure"
- not just "methods" (input) for higher layers to call that return values, but can also be spontaneous events (output) to provide to higher layers
- e.g "recv event, from F, payload D"
- talk about engineering practices in a later section
- also includes security/performance properties ("guarantees")
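As a sketch of the point above: a layer's contract can expose both methods for higher layers to call (input) and spontaneous events delivered upward (output), such as "recv event, from F, payload D". The names here (`MessagingLayer`, `Loopback`) are illustrative, not taken from any real system; a trivial loopback implementation shows how higher layers can be tested against the contract alone.

```python
from abc import ABC, abstractmethod
from typing import Callable

# Hypothetical messaging-layer contract: input methods plus output
# events; the documented guarantees are also part of the interface.
class MessagingLayer(ABC):
    @abstractmethod
    def send(self, to: str, payload: bytes) -> None:
        """Queue payload for asynchronous delivery to peer `to`."""

    @abstractmethod
    def on_recv(self, handler: Callable[[str, bytes], None]) -> None:
        """Register a handler for spontaneous 'recv(from, payload)' events."""

# Trivial loopback implementation, useful for testing higher layers
# without any network at all.
class Loopback(MessagingLayer):
    def __init__(self):
        self._handlers = []

    def send(self, to, payload):
        for h in self._handlers:
            h(to, payload)

    def on_recv(self, handler):
        self._handlers.append(handler)

received = []
net = Loopback()
net.on_recv(lambda frm, data: received.append((frm, data)))
net.send("F", b"D")
```

The higher layer only ever sees `send` and `on_recv`; swapping `Loopback` for a real transport should not require redesigning it.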
contractual properties: security, performance
- auth/conf of data, forward secrecy, key validation
- learn from existing solutions, OTR good example, Pond slightly more complex example
- protection of metadata! anonymity and deniability
- given a target agent, who they talk to, when they are active on the service, how many contacts they have
- given a target resource, who reads/writes it, when it was last operated on, how many times it was operated on
- latency, scalability, availability
types of resource
- communication (bandwidth)
- fully-homomorphic encryption brings promises for massive co-operative global delegated computation, far in the future
- but actually, many current applications need only private/restricted access, computation can be done on the trusted client
- notably, not global keyword search! shakes fist at google
- not YBTI-specific but necessary for it
- requires some reading and time/effort to do correctly, but long-term pay-off
- do everything asynchronously, don't block. Java APIs are notorious for this (freenet). blocking will destroy performance.
- mutexes/condvars are the asm of concurrency, avoid them
- Python Twisted, Haskell, Go, Scala (bugs mean this doesn't work properly)
- chaining Futures/Deferred, events register/fire/handle
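A minimal sketch of the non-blocking style, using Python's asyncio as a stand-in for Twisted-style chained Deferreds; `fetch_block` is a hypothetical network call, not a real API:

```python
import asyncio

async def fetch_block(key: str) -> bytes:
    # Simulated non-blocking fetch: yields to the event loop instead of
    # blocking the thread (contrast with Java-style blocking APIs).
    await asyncio.sleep(0)  # stand-in for real I/O
    return b"data-for-" + key.encode()

async def fetch_many(keys):
    # Launch all requests concurrently and gather the results in order;
    # no mutexes or condition variables needed.
    return await asyncio.gather(*(fetch_block(k) for k in keys))

results = asyncio.run(fetch_many(["a", "b", "c"]))
```

The point is structural: callers compose futures rather than parking threads, so a slow peer stalls one coroutine, not the whole program.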
- for NAT traversal, use ICE (rfc5245, rfc6544; you can see e.g. how WebRTC uses it)
- existing solutions are hard, but solves 95% of problems that are talked about
- please, please, do not "roll your own", as much fun as that might be
- perhaps make a wrapper lib. ICE interface is fairly complex to use.
- Tor's Pluggable Transports system for transforming network-level streams
- constant timing/packet-size profiles, leak no info in the channel, defends vs probe attacks
- doesn't yet support packet-based transport (as input interface), but we could extend it
- prefer abstract security primitives - be algorithm-neutral if feasible, they are broken all the time
- always use authenticated encryption, never encryption by itself (susceptible to CCA, see Coursera Crypto I for the definition)
- key validation and MITM protection is a logistical problem, additional non-crypto concerns
- current methods: SMP interactive, offline fingerprint verification
- if the API needs you to "supply the IV" or "initialise state", it is an unclean API. ask a cryptographer how to use it properly.
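To illustrate the API shape being argued for: the caller supplies only a key and a message; nonce generation and authentication happen inside. The construction below is a TOY (SHA-256 keystream plus HMAC) for demonstrating the interface only, and must not be used as real cryptography.

```python
import hashlib
import hmac
import os

# TOY construction -- illustrates a clean seal/open API shape, NOT secure.
def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # chosen internally, never by the caller
    ct = bytes(a ^ b for a, b in
               zip(plaintext, _keystream(key, nonce, len(plaintext))))
    # Always authenticate; bare encryption is never offered.
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def open_(key: bytes, sealed: bytes) -> bytes:
    nonce, ct, tag = sealed[:16], sealed[16:-32], sealed[-32:]
    if not hmac.compare_digest(tag,
                               hmac.new(key, nonce + ct,
                                        hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))
```

An API where the IV, padding, or internal state leaks into the signature is a sign the abstraction boundary is in the wrong place.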
- minimise responsibility of keys, generate a new one for each use eg. don't mix identity certification with message encryption
- signatures over bare data blobs are useless for applications more complex than "i built this file"; you must associate the represented semantics with the data and sign those as well
- attacks on unsigned metadata
- provides trail-of-evidence to 3rd-party verifier. various schemes for deniability, hard/complex.
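A sketch of signing semantics rather than bare blobs: bind the blob's hash to a canonical statement of its intended meaning (author, role, context), and sign that statement. HMAC stands in here for a real public-key signature, and the field names are illustrative.

```python
import hashlib
import hmac
import json

def make_statement(blob: bytes, semantics: dict) -> bytes:
    # Canonical statement binding the blob's hash to its semantics.
    claim = dict(semantics, blob_sha256=hashlib.sha256(blob).hexdigest())
    # Canonical encoding: sorted keys, no whitespace ambiguity.
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode()

def sign(key: bytes, blob: bytes, semantics: dict) -> bytes:
    # HMAC as a stand-in for an asymmetric signature scheme.
    return hmac.new(key, make_statement(blob, semantics),
                    hashlib.sha256).digest()

def verify(key: bytes, blob: bytes, semantics: dict, sig: bytes) -> bool:
    return hmac.compare_digest(sig, sign(key, blob, semantics))
```

Because the semantics are inside the signed statement, an attacker cannot re-present a validly signed "profile photo" as, say, a signed "software update" -- the unsigned-metadata attack the bullet above warns about.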
physical software process vs virtual social identity
- for security, for mobility, for convenience, to transfer identity between trusted devices
- it's why everyone has such a hard-on for "cloud computing"; match this benefit or die
- (skip: inducing social graph over physical devices => be cautious TODO EXPAND)
- req-rep (low-latency, requires high-availability), pub-sub (high-latency, requires storage, notifications), dumb storage
- construct req-rep out of pub-sub - waiting
- construct pub-sub out of dumb storage - polling
- notifications bring security concerns too, extra surface area for interactive probing attacks
- req further research
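The "construct pub-sub out of dumb storage" bullet can be sketched directly: publish by writing numbered slots into a store that only supports put/get, subscribe by polling for the next slot. All names here are illustrative.

```python
import itertools

# Dumb storage: the only primitives are put/get by key.
class DumbStore:
    def __init__(self):
        self._d = {}

    def put(self, key, value):
        self._d[key] = value

    def get(self, key):
        return self._d.get(key)

# Pub-sub emulated over dumb storage: publishers write sequentially
# numbered slots; subscribers poll for the next unread slot.
class PollingTopic:
    def __init__(self, store, topic):
        self.store, self.topic = store, topic
        self._seq = itertools.count()   # publisher's slot counter
        self._cursor = 0                # subscriber's read position

    def publish(self, msg):
        self.store.put((self.topic, next(self._seq)), msg)

    def poll(self):
        # One polling pass: drain every slot written since the last poll.
        out = []
        while (m := self.store.get((self.topic, self._cursor))) is not None:
            out.append(m)
            self._cursor += 1
        return out
```

The cost of the emulation is visible in the interface's performance contract: latency is now bounded below by the polling interval, which is exactly why these guarantees belong in the interface.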
- def: numeric identifier -> physical location of, or a means to contact, the resource
- (greedy routing, recursive vs iterative routing, path folding)
- DHT, Freenet, gnutella
- source-routed is much more secure, conflicts with other designs
- global (numeric address is locator) vs local (address+context is locator)
- context might be - identifer or locator of the local/personal provider
- (cf allianceP2P? null net vs private nets)
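Greedy routing, mentioned above, can be sketched in a few lines over a toy ring keyspace: each hop forwards to the neighbour numerically closest to the target, stopping at a local minimum. The topology and identifier space are invented for illustration.

```python
SPACE = 2 ** 16  # toy circular identifier space

def ring_dist(a: int, b: int) -> int:
    # Distance on the ring, wrapping around.
    d = abs(a - b)
    return min(d, SPACE - d)

def greedy_route(neighbours: dict, start: int, target: int) -> list:
    # neighbours: node_id -> list of neighbour ids
    path = [start]
    current = start
    while current != target:
        nxt = min(neighbours[current], key=lambda n: ring_dist(n, target))
        if ring_dist(nxt, target) >= ring_dist(current, target):
            break  # local minimum: greedy routing can fail on sparse graphs
        path.append(nxt)
        current = nxt
    return path
```

Recursive vs iterative routing is then a question of who executes this loop: each intermediate node (recursive) or the original requester contacting each hop itself (iterative).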
- identity via crypto keys
- immutable public, write access based on sigs (!ctxt-repudiability)
- group ACLs, not yet well-explored
- constructions, implementing one thing in terms of another thing
- SSKs out of a generic DHT (USKs? log-USKs?)
- mutable data is hard (global updates)! but can emulate with history graph c.f. git, mpOTR ideas
- easier to give ACLs based on this as well
- "revoke read access" means "revoke future read access"
- camlistore, tahoe-lafs, freenet
- new model; need to describe "best practices", cf. earlier "existing interfaces" point
- exception since it gives us benefits
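The git-style history-graph idea above can be sketched over immutable, content-addressed storage: an "update" appends a new version node pointing at its parents, and the mutable name is just a pointer to the current head. The store and node format are illustrative.

```python
import hashlib
import json

# Immutable content-addressed store: key is the hash of the value.
store = {}

def put(obj: dict) -> str:
    blob = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(blob).hexdigest()
    store[key] = blob
    return key

def commit(content: str, parents: list) -> str:
    # A version node, git-style: content plus parent pointers.
    return put({"content": content, "parents": parents})

def history(head: str) -> list:
    # Walk the first-parent chain back to the root.
    out = []
    while head is not None:
        node = json.loads(store[head])
        out.append(node["content"])
        head = node["parents"][0] if node["parents"] else None
    return out

v1 = commit("hello", [])
v2 = commit("hello world", [v1])  # an "update" is a new immutable node
```

Note how this makes "revoke read access" naturally mean "revoke future read access": old versions remain readable to anyone who already holds their keys.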
- def: (existing network interfaces, seeds|prev state) -> (set of peers + locs)
- traditionally, "trust a set of roots", e.g. Tor directories, DHT bootstrapping
- not well-researched. if public, hard to achieve both availability and security
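The discovery definition above can be written down as an abstract contract, with "trust a set of roots" as its most traditional implementation. Class and method names are hypothetical.

```python
from abc import ABC, abstractmethod

# Hypothetical peer-discovery contract matching the definition above:
# (existing network interfaces, seeds or previous state) -> peers + locators.
class Discovery(ABC):
    @abstractmethod
    def discover(self, interfaces, seeds_or_state):
        """Return a set of (peer_id, locator) pairs."""

# The traditional scheme: trust a fixed set of roots
# (cf. Tor directory authorities, DHT bootstrap nodes).
class StaticRoots(Discovery):
    def __init__(self, roots):
        self._roots = set(roots)

    def discover(self, interfaces, seeds_or_state):
        return set(self._roots)

peers = StaticRoots([("root1", "198.51.100.1:9001")]).discover([], None)
```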
- def: name (usu. memorable) -> identifier (or set of identifier candidates, harder for applications to use)
- zooko's triangle, classic
- triple-combo or double-combo schemes, as a simple-to-implement more-complex-to-use solution
- subjective vs objective (semi-subjective / social)
- natural/pure vs symbolic/impure
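On the subjective corner of Zooko's triangle: a petname table trades global uniqueness for security plus memorability, so resolution is only meaningful relative to a local context. The identifier string below is purely illustrative.

```python
# Petname sketch: subjective name resolution. A binding like
# "mum" -> identifier exists only in this user's local table;
# no global registry is consulted.
class PetnameTable:
    def __init__(self):
        self._names = {}

    def assign(self, petname: str, identifier: str):
        # Local, user-chosen binding.
        self._names[petname] = identifier

    def resolve(self, petname: str) -> str:
        return self._names[petname]

alice = PetnameTable()
alice.assign("mum", "pubkey:3f9a")  # identifier is illustrative
```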
social graphs (maybe skip)
- relying on short-hop social connections is not globally-scalable, outside of local community (cf PGP WoT)
- just copying "real social networks" does not necessarily result in global decentralised system
- will result in informal hierarchies like in RL, without some other mitigation mechanism
- trust algs powerful for detecting network attacks, sybil attack, eclipse attack, maybe globally-scalable
- (visualiser, selector GUIs)
- individual loss of privacy
- zk-operations? maybe best of both worlds
- OTR (perhaps key validation can have a separate API)
- tor (separate transport good, maybe future separate discovery mechanism)
- tahoe-lafs (supports different storages)
- camlistore (arguably storage/nameres can be split out)
- i2p (could split DHT out) (stream lib on top of packet, good)
- freenet: network protocol (transport, steg/obfuscation), LRU-DHT, FEC, crypto-ACL, identity and trust, spam protection, messaging/content-sharing apps on top
- but freenet does split social identity from node operation, yay! (transferral is slightly annoying though, booo)
Comparison and examples
Distributed identity config service "example"
- our client can access "our preferences" from anywhere in the world
- decentralised storage, either self-hosted, or public system
Which interfaces do we need?
- storage, content-addressed (maybe requires discovery, if global)
- maybe requires special transport, if censored. NAT traversal
- crypto-ACLs to protect your own data