@jdarcy
jdarcy / activitypub.md
Created November 9, 2022 16:10
Some thoughts about ActivityPub

I've commented a few times about some issues I see with the scalability of ActivityPub - the protocol behind the Fediverse and its best-known implementation, Mastodon. A couple of folks have asked for more elaboration, so ... here it is.

First, let me add some disclaimers and warnings. I haven't devoted a lot of time to looking at ActivityPub, so there might be some things I've misunderstood about it. On the other hand, I've brought bigger systems - similar node counts and orders of magnitude more activity per node - from broken to working well based on less study of the protocols involved. So if you want to correct particular misconceptions, that's great. Thank you in advance. If you want to turn this into an appeal to authority and say that I'm wrong only because I haven't developed a full ActivityPub implementation or worked on it for X years ... GTFO.

What

What is ActivityPub? It's an HTTP- and JSON-based protocol for exchanging information about "activities". An activity could be many things.
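To make "HTTP- and JSON-based" a bit more concrete, here is a minimal sketch of what one activity might look like, assuming the ActivityStreams 2.0 vocabulary that ActivityPub builds on. The actor URL and the `make_create_note` helper are hypothetical, made up for illustration; a real server would also add IDs, timestamps and request signatures, none of which are shown here.

```python
import json

# JSON-LD context used by ActivityStreams 2.0 documents.
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def make_create_note(actor, content, recipients):
    """Wrap a short text note in a Create activity (illustrative helper)."""
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": actor,
        "to": recipients,
        "object": {
            "type": "Note",
            "attributedTo": actor,
            "content": content,
            "to": recipients,
        },
    }


if __name__ == "__main__":
    activity = make_create_note(
        "https://example.social/users/alice",  # hypothetical actor
        "Hello, Fediverse!",
        ["https://www.w3.org/ns/activitystreams#Public"],
    )
    # This JSON document is the kind of body a server would POST to a
    # recipient's inbox over HTTP.
    print(json.dumps(activity, indent=2))
```

A "Create" wrapping a "Note" is just one case; follows, likes and boosts are expressed as other activity types with the same overall shape.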

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <urcu/uatomic.h>
#define NFILES 20000 /* total, not per volume */
@jdarcy
jdarcy / gist:5625eabaef00f8771de5588bef5ca857
Created November 17, 2016 21:25
Git branch-combining script
#!/bin/bash
# List commits that are on the given branch but not on master, oldest first.
log_unique () {
    git log --reverse --oneline "$1" --not master
}
for branch in "$@"; do
    log_unique "$branch" | while read -r hash summary; do
        echo "=== $hash ($branch) $summary"
        git cherry-pick "$hash"
    done
done
@jdarcy
jdarcy / Makefile
Created September 27, 2016 12:46
My favorite makefile
me_a_sandwich:
	@echo "Poof! You're a sandwich."
@jdarcy
jdarcy / gist:984bfea7dd7489ac06973e704affe294
Created September 1, 2016 13:06
Making curl|bash suck a tiny bit less
The "curl|bash" way of installing software is insecure due to a complete lack of authentication and integrity checking, but it's popular because it's easy for both users and developers. Using HTTPS protects against MITM attacks, but not against attacks on the origin server. The install script can be replaced and nothing will keep users from executing the possibly malicious replacement. What's needed is some sort of signing mechanism, using keys not on the same server as the install script itself. How can we implement this without sacrificing ease of use? How about this? Let's start with something that's easy to explain.
$ curl $URL | validate $URL | bash
Yes, this makes the pipeline 50% longer, but bear with me. What does "validate" do? First, it passes the URL to a secure authentication service, which maps that URL to a GPG key and returns that key to the user. There might be many such services, with varying levels of security and selectable by the user. Once the key is obtained, "validate" s
@jdarcy
jdarcy / gist:9bfc817568211872ce28a6ea62f44dcb
Created August 31, 2016 12:35
Stray job-killing script
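#!/bin/bash
# Start two background jobs so there is something to kill.
sleep 10 &
sleep 10 &
# Extract each job number N from lines like "[N]+ Running ..." and kill that job.
for j in $(jobs | sed -n 's/^\[\([0-9][0-9]*\)\].*/\1/p'); do
    kill "%$j"
    wait "%$j"
done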
@jdarcy
jdarcy / gist:ac65ab1121ff8efcf7a4f99cbc80fa92
Created August 18, 2016 01:42
Go home, bash. You're drunk.
#!/bin/bash
# This will cause bash to blow its stack and dump core, because the whole trap
# handler is executed in the context where we've been redirected to the bogus
# pipe, generating another SIGPIPE, invoking the handler again, etc.
trap "echo SIGPIPE" PIPE
# This is even sillier. We're defining this trap handler outside of any loop,
# but it only has meaning within a loop. If we're in a loop when the signal
# occurs, we'll break out. If we're not in a loop, we'll get a *syntax error*.
@jdarcy
jdarcy / gist:d68a334471db8ace748b
Created January 15, 2015 16:11
Tiering Notes

Tiering Data Structures and Algorithms

Problem Statement

In the design of a tiering solution, the first problem one encounters is the definition of an ideal end condition. As a first approximation, consider this:

Sort all files by (descending) time of last access.  The hot tier should contain the files at the top of the list, up to capacity.  The cold tier should contain everything else.

This definition practically requires two kinds of records, because crawling even the top tier alone - let alone both tiers - to get the same information would be prohibitively expensive and so slow that the answers would be wrong by the time you got them.
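The end condition above can be sketched in a few lines, assuming each file already has a record carrying its size and last-access time. The `FileRecord` type, the `ideal_placement` name, measuring capacity in bytes, and stopping at the first file that does not fit are all assumptions made for illustration, not part of the original notes.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size: int           # bytes (assumed unit)
    last_access: float  # seconds since the epoch


def ideal_placement(files, hot_capacity):
    """Split files into (hot, cold) according to the first-approximation rule.

    Files are ranked by descending last-access time. The hot tier takes the
    longest prefix of that ranking whose total size fits in hot_capacity;
    everything after that prefix goes to the cold tier.
    """
    ranked = sorted(files, key=lambda f: f.last_access, reverse=True)
    hot, cold = [], []
    used = 0
    for i, f in enumerate(ranked):
        if used + f.size > hot_capacity:
            cold = ranked[i:]  # first file that does not fit, and all colder ones
            break
        hot.append(f)
        used += f.size
    return hot, cold


if __name__ == "__main__":
    files = [
        FileRecord("a", 40, 100.0),
        FileRecord("b", 40, 300.0),
        FileRecord("c", 40, 200.0),
    ]
    hot, cold = ideal_placement(files, hot_capacity=100)
    print([f.path for f in hot], [f.path for f in cold])
```

Note that this sorts a record for every file in both tiers, which is only workable if access times are already being recorded somewhere rather than gathered by crawling.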