Requirements for room event storage:
- Lossless storage of the event JSON as the signatures need to remain valid (this is most easily done by simply storing the serialized JSON in a binary column)
- Able to work out the state of the room at any given event.
- Able to traverse the rooms event in topological order (for federation).
- Able to fetch new events for users (including events in rooms that the user is in and membership changes in any room, e.g. invites)
- Able to fetch the most recent N events in each room that the user is in (for initial syncs). In the future initial syncs will most likely be paginated and so the server will need to be able to fetch the most recent recent N events in the most recent M rooms.
- Able to fetch events older than a given point in a room (for back pagination).
A major complication is that depending on the state in the room not all users will be allowed to see all events, for example if the history visibility is set to "join". Whether or not a user can see a given event depends (almost) solely on the state at the event and thus can be computed eagerly only once, rather than every time the event is to be sent to the user.
The most obvious way of doing this is to keep a separate table storing which users are allowed to see which events in the room and update this when persisting new events. This allows more easily constructing queries for fetching "the most recent N events" that take account of filtering.
Redactions also affect whether users can see an event. However, unlike above this is not a per user property and so can be stored as a simple flag in a column of the events table.
One of the main hot paths is when clients request new events. This requires:
- checking each room they are in for updates
- checking if their membership state has changed in any rooms (e.g. leaves, invites, etc.)
The way that synapse does this is to keep caches of:
- the current rooms the user is in
- for each room the last time a change occurred
- for each user the last time their membership changed in any room
This allows synapse to only fetch things from the database when its reasonably certain something has actually changed in a room. The down side of this is that it involves a query per room that has changed (as opposed to potentially a single query).
An alternative is to store per user lists for the events that need to be sent to clients, and update this when an event is sent. (Although not spec'ed yet, the intention is for homeservers to be able to inform clients that they need to re-sync, which would allow limiting the size of these lists)
For federation, its required that the server can return the full state at any event in the room graph.
Synapse does this by defining the notion of state groups. Each event in a room that shares the same state is assigned the same state group, allowing the server to store the mappings
event id -> state group and
state group -> state, which is more space efficient than storing the full state for each event.
However, in practice this still takes up a lot of space. Its particularly a problem in large rooms where users frequently join and leave, as the state would be copied on each membership change (and the state would be large). A mitigation for this is to store deltas of state for each state group and compute the full state when required. In practice, while its required that a server can calculate the full state at any given event, this happens rarely enough that optimizing for space rather than time or CPU usage is acceptable.
The room graph gives only a partial ordering of events, and the server may receive events over federation out of order. This proves problematic for syncing events to the clients, since they expect an ordered stream of events.
/sync API currently does not support out of order events, but its likely in the future that the events sent down sync will include some ordering information that allows clients (who wish to do so) to reorder events correctly.
Currently, synapse uses two different orderings:
- stream ordering - a monotonically incrementing integer that represents when the server received that event, and is used to keep track of which events have been sent to a client. This is a global ordering.
- topological ordering - an ordering that first takes into account the topological ordering and then the stream ordering. This is a per room ordering, and is primarily used for back pagination in rooms as it guarantees events are given in an order that respects causality.