@schmittlauch
Last active September 13, 2019 22:47
discussion of proposal on gossiping global hashtags through the fediverse

Comments on this proposal by @cjd@mastodon.social: This was an interesting, although sometimes somewhat hard to follow, proposal. The order of sections and introductions could be improved.
While I found this different approach interesting, I have some issues and questions. Please excuse any misunderstandings; there are likely a bunch of them.

General

  • For many use cases, posts have to disseminate through the fediverse in near real time. Can this system provide that?
    • this probably depends on the periodicity of publishing the nextBundle
    • very likely still much slower than my DHT-based approach
    • how long do new or rarely used hashtags take to disseminate through the system?
  • Do instances have to parse all hashtags of a bundle, and maybe even store and forward them?
    • an instance's event bundles contain all hashtags known by that instance, plus lists of their posts
    • If instances store and forward all the hashtags they receive: assuming a size of 1 KiB per post ID (which is rather high), extrapolating the Twitter dump used in my paper yields 8.58 TiB of post IDs to be stored for a full year when storing post IDs for all hashtags. This would have to be done for each instance in our instanceMap!
    • If instances just store and forward hashtags they're interested in: this very likely hurts the dissemination of unpopular hashtags and makes it hard for new hashtags (which have no interest at all in the beginning) to be forwarded by anyone except the originating instance.
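The storage estimate can be worked backwards as a quick sanity check. This sketch uses only the two figures from the storage bullet (8.58 TiB per year at 1 KiB per post ID) to recover the implied number of post IDs. The 64-byte ID size is a hypothetical comparison point, not a figure from the paper.

```python
# Binary units, matching the "TiB" and "KiB" used in the estimate.
KIB = 2**10
TIB = 2**40


def implied_post_ids(total_tib: float, bytes_per_id: int) -> float:
    """How many post IDs of `bytes_per_id` bytes fill `total_tib` TiB."""
    return total_tib * TIB / bytes_per_id


def storage_tib(post_ids: float, bytes_per_id: int) -> float:
    """Storage in TiB needed for `post_ids` IDs of `bytes_per_id` bytes each."""
    return post_ids * bytes_per_id / TIB


# Figures from the estimate: 8.58 TiB per year at 1 KiB per post ID.
yearly_ids = implied_post_ids(8.58, KIB)
print(f"implied post IDs per year: {yearly_ids:.3e}")

# Hypothetical: the same year of post IDs at 64 bytes each.
print(f"at 64 bytes per ID: {storage_tib(yearly_ids, 64):.2f} TiB")
```

The estimate corresponds to roughly 9.2 billion post IDs a year. Because storage scales linearly with ID size, the same IDs at a hypothetical 64 bytes each would need about 0.54 TiB.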

If an instance is unable to store events going far enough back to maintain working federation, other instances will switch to pulling those events from someone else.

How to find out from whom to pull instead? Try out all other known instances in order of their hmetrics?
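The fallback described in this question could look roughly like the sketch below: try the other known instances in order of their hmetrics, with a lower value preferred. `pull_events` and the injected `fetch` callable are hypothetical names. Treating a `None` result as "this instance no longer has the events" is also an assumption made for the sketch.

```python
from typing import Callable, Dict, Optional


def pull_events(
    hmetrics: Dict[str, int],
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Try known instances from lowest to highest hmetric.

    `fetch` is a hypothetical transport call that returns the requested
    events, or None if that instance has dropped them. Ties are broken by
    instance name so the order is deterministic.
    """
    for instance in sorted(hmetrics, key=lambda name: (hmetrics[name], name)):
        events = fetch(instance)
        if events is not None:
            return events
    return None  # no known instance still stores these events


# Hypothetical example: the cheapest instance has dropped the history,
# so the request falls through to the next-lowest hmetric.
metrics = {"a.example": 0, "b.example": 5, "c.example": 10}
stored = {"b.example": b"events from b", "c.example": b"events from c"}
print(pull_events(metrics, stored.get))  # b'events from b'
```

This order is only as trustworthy as the metrics instances advertise about themselves.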

  • If an intermediate instance has to drop stored items
  • To get a list of posts for a single hashtag, we have to download and combine multiple bundles from various instances, which is inefficient. Also, how do we know when we've combined enough of these bundles to get the most complete view possible?
    • this issue appears as we basically have no load balancing except "instances can drop history if it's too much for them"
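At its core, combining bundles is a union of per-hashtag post lists. The sketch assumes a hypothetical bundle shape, `{"hashtags": {tag: [post_id, ...]}}`. This loosely follows the description of bundles as carrying every hashtag an instance knows plus lists of its posts; the proposal's actual wire format is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def merge_bundles(bundles: Iterable[dict]) -> Dict[str, Set[str]]:
    """Union the per-hashtag post IDs of several bundles, dropping duplicates."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for bundle in bundles:
        for tag, post_ids in bundle.get("hashtags", {}).items():
            index[tag].update(post_ids)
    return dict(index)


def posts_for(index: Dict[str, Set[str]], tag: str) -> List[str]:
    """Sorted post IDs for one hashtag, or an empty list if none were seen."""
    return sorted(index.get(tag, set()))


# Two hypothetical bundles from different instances with overlapping posts.
bundle_a = {"hashtags": {"fediverse": ["p1", "p2"]}}
bundle_b = {"hashtags": {"fediverse": ["p2", "p3"], "gdpr": ["p4"]}}

index = merge_bundles([bundle_a, bundle_b])
print(posts_for(index, "fediverse"))  # ['p1', 'p2', 'p3']
```

Nothing in the merged index says whether one more bundle would have added further posts. That is exactly the completeness question above.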

Security

  • Malicious instances may lie about their metric, advertising a low hmetric to become the preferred instance to fetch from.
  • You said your ideas were roughly inspired by BGP. Unfortunately, BGP is broken as well:
    • BGP being broken is just less apparent, because only a few parties have access to the routing points between ASes, and those parties have a gentlemen's agreement. But BGP routing attacks are quite common; they are usually noticed only when they fail and all of YouTube's traffic is routed to a single poor box at a Pakistani ISP.
    • There's no real basis for trusting even correctly signed bundles by known instances, as in an open system instances are discovered through boosts, mentions, or explicit user subscriptions.
    • only reason for trusting instances: when using an explicit allow list. But all the instances on that list need to take the same approach as well, which leads to a nation-stateification of the fediverse; this produces clustering and makes it hard for new instances to join at all.
  • One advantage of your proposal: it is harder to deny the existence of posts, as they can be gossiped via multiple paths. But it is still not possible to prove the completeness of received posts, and more entities can forge posts.
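Here is what a "correctly signed bundle" protects against, made concrete. A signature covers the bundle's exact bytes, so a relay cannot drop or add hashtags without verification failing. The sketch uses the standard library's HMAC-SHA256 purely as a stand-in for the public-key signature the proposal presumably relies on. Unlike a real signature, anyone able to verify an HMAC can also forge one, so this shows tamper evidence only, not trust. The bundle fields and the key are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical key. A real deployment would sign with the originating
# instance's private key and verify with its public key; HMAC is only a
# stdlib stand-in that shows how tampering is detected.
INSTANCE_KEY = b"example-instance-key"


def canonical_bytes(bundle: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_bundle(bundle: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(bundle), hashlib.sha256).hexdigest()


def verify_bundle(bundle: dict, tag: str, key: bytes) -> bool:
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(sign_bundle(bundle, key), tag)


bundle = {"instance": "a.example", "hashtags": {"fediverse": ["p1", "p2"]}}
tag = sign_bundle(bundle, INSTANCE_KEY)

# A relay that strips a hashtag it is not interested in...
trimmed = {"instance": "a.example", "hashtags": {}}

print(verify_bundle(bundle, tag, INSTANCE_KEY))   # True
print(verify_bundle(trimmed, tag, INSTANCE_KEY))  # False: the relay's edit is detected
```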

Summary

  • interesting approach that might be feasible (although I'm sceptical about dissemination times)
  • but from a security perspective, no real advantages:
    • DHT security might indeed be hard, but you might be overly afraid of it
    • the only advantage is multiple paths for receiving the posts of a hashtag, but something similar might be achievable in the DHT approach through its redundancy scheme and verifiable, mergeable post histories
@cjdelisle

Thank you for giving this post such deep consideration!
For those who want to review the original document, you can find it here: Fediverse Global Hashtags.

I'll answer your comments as they come:

  • depends on the periodicity of publishing the nextBundle - I would say that the upper bound is once per second; past that, you start to put too much load on the smaller nodes, which validate n signatures per period. I prefer HTTP over other protocols such as the Bitcoin TCP protocol because with HTTP/2 and WebSockets it is often just as fast and far more flexible (you can just throw an nginx instance in the middle and everything just works).
  • how long do new or rarely used hashtags take to disseminate through the system? - A really important thing to understand is that everybody signature-checks every event bundle, so you can't modify them in any way; this means a new hashtag travels just as fast as an update to an existing one. I don't have any data on how fast this will propagate, but I think 1 second per hop is an upper bound on latency, and a 10-hop path is far longer than we should expect in the foreseeable future.
  • 8.58 TiB of post IDs to be stored for a full year when storing post IDs for all hashtags - Yes, but I think storing old data should be difficult. If someone is intentionally running an archive, they have the hard drives for storing the history of the hashtags, the content and even the media, but they're also somewhat accountable and will normally answer a GDPR request. If we start to put all of history into DHTs and datashards, we need to understand that what we're building is essentially a Big Brother which will remember everything about us and tell anyone who asks. I don't think this is really where we want to go.
  • If instances just store and forward hashtags they're interested in - as addressed above, you can't change a bundle without breaking the signatures
  • How to find out from whom to pull instead? Try out all other known instances in order of their hmetrics? - correct
  • have to download and combine multiple bundles from various instances - yes, you have to parse all of them into your local database to make use of them; that is indeed not as efficient as a DHT lookup, but I don't think it's onerous.
  • this issue appears as we basically have no load balancing - We could imagine very light instances using a hashtag search service to avoid the need to participate at all, but generally the only place where we can load balance is instances setting a higher metric to prevent others from asking them for event bundles.
  • Security
    • Malicious instances may lie about their metric - Metric is the instance's choice, setting it to zero means "anyone can download from me, I have plenty of bandwidth".
    • BGP is broken as well - This is a popular meme in the security news, but as someone who has spent the past 7 years working on Something Better™, I have slowly adopted more and more of BGP's features into my project, because it turns out to be a lot less broken than people claim.
      • BGP routing attacks are quite common - BGP has an excellent filtering system, but unfortunately it is expected to perform a superhuman task:
        1. There are no authenticated identities; anyone can announce an IP address if their upstream doesn't filter the announcement.
        2. Any given ISP might have a valid reason to announce any of its own prefixes, but also any of its customers' prefixes, so writing a decent filter is extremely hard, and it's sadly common for ISPs to import *.
      • My proposal leaves the filtering mechanism with only the job of filtering abuse; there is no question about the ownership of identities as there is with BGP, so that impossible expectation is not placed upon it.
      • There's no real basis for trusting even correctly signed bundles by known instances - Again, event bundles are forwarded separately without breaking signatures, so each signature covers only events that were created on the instance that signed it.
      • only reason for trusting instances: when using an explicit allow list. - Basically, the only attacks we're preventing with filters are Sybil attacks: the creation of millions of fake instances which cannot effectively be blocked using the normal method, and which then go on to create fake trends and spam hashtags. Filters also allow whitelists, for example for elementary school instances that only federate with other elementary schools. I expect the filtering feature will essentially never actually be used; its existence will merely deter attacks.
    • But it is still not possible to prove the completeness of received posts, and more entities can forge posts. - As stated before, event bundles cannot be tampered with without breaking the signature of the originating instance.
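The BGP filtering problem can be sketched as a per-neighbor prefix check. An ISP accepts an announcement only if it falls inside a prefix on record for itself or a customer, while "import *" amounts to accepting everything. This is Python illustrating the idea, not router configuration. The prefixes are the RFC 5737 and RFC 3849 documentation ranges, and `build_filter` is a hypothetical name.

```python
import ipaddress
from typing import Callable, Iterable


def build_filter(allowed: Iterable[str]) -> Callable[[str], bool]:
    """Return a predicate accepting announcements inside any allowed prefix."""
    nets = [ipaddress.ip_network(prefix) for prefix in allowed]

    def accept(announcement: str) -> bool:
        announced = ipaddress.ip_network(announcement)
        # subnet_of() raises TypeError across IPv4/IPv6, so compare versions first.
        return any(
            announced.version == net.version and announced.subnet_of(net)
            for net in nets
        )

    return accept


# Hypothetical ISP: its own block plus one customer's block.
accept = build_filter(["198.51.100.0/24", "203.0.113.0/24"])
print(accept("203.0.113.128/25"))  # True: inside the customer's prefix
print(accept("192.0.2.0/24"))      # False: neither ours nor a customer's

# "import *": a filter whose allowed list covers every address.
accept_all = build_filter(["0.0.0.0/0", "::/0"])
```

The check itself is trivial. The hard part the reply points to is keeping `allowed` complete and current for every customer, which is why falling back to accepting everything is tempting.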

Let me know if you have any other questions or thoughts. We could also switch to another medium, such as Google Docs, which supports comments, if you would prefer.

Thanks,
Caleb
