If you want to make a federated Twitter-like service, I think you should spend more than a couple of minutes on the design.
I scanned the PubSubHubbub spec, and as far as I can tell, it requires a publisher to store a key for each subscriber. When the publisher posts a new tweet, their home server has to make a separate HTTP POST to each subscriber, HMAC-signed with that subscriber's key. The content (a few sentences at most) has to be wrapped in an Atom feed (in XML!) as if it were a blog post.
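To make the per-subscriber cost concrete, here is a rough Python sketch of that fan-out. It is loosely modeled on PubSubHubbub's `X-Hub-Signature: sha1=<hex HMAC>` header; the Atom wrapper is stripped far below what a real feed needs, the callback URLs and secrets are made up, and nothing is actually sent over the network.

```python
import hashlib
import hmac
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


def atom_entry(entry_id, text):
    # Wrap one short post in a minimal Atom feed (a real feed also needs
    # title, updated, author, etc.).
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        f"<entry><id>{escape(entry_id)}</id><content>{escape(text)}</content></entry>"
        "</feed>"
    ).encode("utf-8")


def signed_deliveries(body, subscribers):
    # subscribers maps callback URL -> that subscriber's secret (bytes).
    # One HMAC and one POST per subscriber, even when many share a host.
    deliveries = []
    for callback_url, secret in subscribers.items():
        digest = hmac.new(secret, body, hashlib.sha1).hexdigest()
        headers = {
            "Content-Type": "application/atom+xml",
            "X-Hub-Signature": f"sha1={digest}",
        }
        deliveries.append((callback_url, headers))
    return deliveries


def distinct_hosts(deliveries):
    # How many different servers those POSTs actually go to.
    return len({urlsplit(url).hostname for url, _ in deliveries})
```

With 3,000 hypothetical subscribers spread over 12 servers, `signed_deliveries` produces 3,000 separately signed POSTs, while `distinct_hosts` reports only 12 destinations.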
Imagine that Mastodon grew to over a dozen servers, BBC News joined, and only a few thousand users subscribed. Every time BBC News posts new cricket scores, a few thousand individual HTTP POSTs, each signed separately, go out to the same dozen servers.
The HTTP POST has no fallback, so if it fails, I guess you just don't get the tweet?
Here's an alternative design, which I also only spent a couple of minutes on (though I admit I have the advantage of having built one of these before):
- Make a user's feed a list of short messages, each with an ID. Assign IDs with a timestamp as the leading component, so they sort naturally by time. Provide a fetch call that pages through the feed by those IDs.
- Just use JSON. Make each message just the ID, a text field, and maybe a list of attached URLs (pictures, whatever). Don't get fancy. You can add more metadata later if anyone cares.
- Don't worry about private feeds. Someone else can design that by encrypting the message contents or using Keybase.
- The server should remember individual subscriptions but aggregate them upstream, the way Google Reader did. The server can subscribe to the BBC News feed once and deliver it to all its local members, so the BBC server only has to post once to each other server.
- Instead of PSHB's Atom payloads, push JSON notifications that include the ID of the previous message, so a server can tell if it missed one. Also have servers poll their subscriptions periodically as a fallback.
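The bullet points above can be sketched in Python roughly like this. It's a single-process sketch under my own assumptions: the ID format (zero-padded millisecond timestamp plus a sequence number), the field names `id`/`text`/`urls`/`prev_id`, the `user@host` address format, and the names `Feed`, `Mirror`, and `group_by_server` are all hypothetical, and `Mirror` calls `Feed.fetch` directly where a real server would make an HTTP request.

```python
import json
import time
from collections import defaultdict
from dataclasses import asdict, dataclass, field


@dataclass
class Message:
    id: str                                   # "<13-digit ms>-<4-digit seq>" (hypothetical)
    text: str
    urls: list = field(default_factory=list)  # attached pictures, whatever


class Feed:
    """One user's feed on their home server."""

    def __init__(self):
        self.messages = []
        self._last_ms = -1
        self._seq = 0

    def post(self, text, urls=(), now_ms=None):
        """Append a message and return the JSON notification to push out."""
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        if ms <= self._last_ms:     # same millisecond (or clock went back):
            ms = self._last_ms      # reuse the timestamp, bump the sequence
            self._seq += 1
        else:
            self._last_ms, self._seq = ms, 0
        prev_id = self.messages[-1].id if self.messages else None
        msg = Message(f"{ms:013d}-{self._seq:04d}", text, list(urls))
        self.messages.append(msg)
        return json.dumps({**asdict(msg), "prev_id": prev_id})

    def fetch(self, after=None, limit=20):
        """Page through messages in time order, starting after the given ID."""
        # Zero-padded, timestamp-first IDs sort lexicographically by time.
        return [m for m in self.messages if after is None or m.id > after][:limit]


def group_by_server(subscribers):
    """Map each remote host to its local subscribers: one POST per host."""
    by_host = defaultdict(list)
    for address in subscribers:
        user, host = address.rsplit("@", 1)
        by_host[host].append(user)
    return dict(by_host)


class Mirror:
    """A subscribing server's copy of one remote feed, shared by its local users."""

    def __init__(self, source):
        self.source = source    # stands in for HTTP calls to the home server
        self.last_id = None
        self.ids = []

    def on_notify(self, payload):
        note = json.loads(payload)
        if self.last_id is not None and note["id"] <= self.last_id:
            return                       # duplicate, or already fetched
        if note["prev_id"] == self.last_id:
            self._store(note["id"])      # in order: nothing was missed
        else:
            self.poll()                  # gap detected: catch up via fetch

    def poll(self):
        """Fetch everything newer than last_id; also run this on a timer."""
        while page := self.source.fetch(after=self.last_id):
            for msg in page:
                self._store(msg.id)

    def _store(self, msg_id):
        self.ids.append(msg_id)
        self.last_id = msg_id
```

If a notification arrives whose `prev_id` doesn't match the last ID the mirror has, the mirror simply pages forward with `fetch`, and the periodic `poll` covers notifications that never arrive at all. The fixed-width sequence number assumes fewer than 10,000 posts per millisecond per feed.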
Sorry if this all seems obvious. Maybe it's good to write it down anyway?
The Mastodon scaling article covers a lot of rudimentary stuff, like using background jobs and queues, and maybe not running those on the same machine as your web frontend or trying to stuff tweets into a relational database. I assume the author is just very green, and there's no fault in being green, but there isn't much meat in there for anyone experienced. Building a pipeline for tweet delivery is well documented, for example, and 50 requests/second isn't a great result.