Seed migration feature specification
Ultimate goal: allow people to change their home pod and thereby resolve issue #908.
This document is intended to help form a consensus on the optimal implementation plan for the requested seed migration feature in diaspora. The feature requires changes both in the federation protocol and in the internals of the diaspora pod implementation, and this document describes those changes.
Other implementations of federation-compatible software are out of scope. However, where possible and sensible (for example, in general descriptions of the federation), the wording is kept general and independent of any specific implementation. With the diaspora-specific details removed, this document should serve as a good basic description of how to implement a compatible feature in other social networking software. That is not its main purpose, though, so there may be some gaps in this respect.
The Federation
A multitude of software instances which may be based on diverse code bases and which communicate with each other using the federation protocol.
The Federation protocol
Presently, the federation protocol is only loosely documented. At the same time, a single implementation, the diaspora social media platform, represents the vast majority of the federation. Until an official specification of the federation protocol is available, "the federation protocol" means the protocol compatible with the latest version of the diaspora federation protocol implementation, excluding the features which are declared deprecated. Once official documentation for the federation protocol is available, "the federation protocol" will mean the protocol defined by that document.
User (aka seed, aka user's identity)
A user is a subject in the federation. At every moment a user's identity is characterised by some state, which includes the user's profile and the user's share/follow connections. The identity also includes the user's historical activity, such as status update messages, comments, private messages, photos, etc. The full user identity is available only on the home pod.
Account ID
Each user is assigned an account ID that is unique across the federation. The account ID is the main way to address users in the federation. The form of the account ID must conform to the validation rule.
For each account ID in the federation there is an RSA keypair which is used to sign and encrypt messages at the federation protocol level. Account ID MUST NOT be reused with a different keypair and the keypair MUST NOT be reused with a different account ID.
Pod
A pod is a server software installation which speaks the federation protocol and which may host many users. Pods are responsible for keeping users' identities and for delivering and receiving data updates in the federation. Though parts of a user's identity may be hosted on different pods across the federation, there is a single pod where the full and up-to-date identity of a user is stored. This pod is called the home pod.
Account owner
Normally a user's identity on the federation has an owner, usually a physical person. The pod software allows the owner, and nobody else, to control the identity.
Seed migration (aka account migration, account backup/restore)
The procedure of changing the home pod of a user while preserving the user's identity unmodified, both from the user's point of view and from any other point in the federation.
Preserving user's identity unmodified also means that the account owner doesn't change.
Changing the home pod includes:
- Changing the account ID and reissuing the key pair.
- Moving all the user's data to a new home pod.
- Ensuring the federation accepts the change in a consistent way.
The federation message
There are two ways to post a message to the federation: public and private. A public message is sent signed but unencrypted; a private message is sent signed and encrypted. If a private message has multiple recipients, a separate copy is sent to each recipient, encrypted with that recipient's key. The message carries a payload called a federation entity, which is normally serialized as an XML object.
Relayable
Relayables are objects which can be forwarded by your contact to a third party (his contacts). Relayables include comments, likes, participations and poll participations.
3. User data archive
The home pod is the only place where the full user identity data is stored, including status messages, conversations, etc. To make the user identity movable there must be a way to move the respective data, so pods must provide user data export and import features. This ensures that after the migration the user has access to the same data he had before it.
Data archive format and contents are described in the section below.
Archive is a JSON document compressed via GZIP. Uncompressed JSON data must conform to the schema.
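As an illustration of the container format only, here is a minimal sketch of reading such an archive with Ruby's standard library. The helper name `read_archive` and the sample payload are made up for this example; the sketch does not validate the parsed data against the schema.

```ruby
require "json"
require "zlib"

# Decompress a GZIP-compressed archive and parse the JSON document inside.
# Hypothetical helper: schema validation is a separate step not shown here.
def read_archive(gzipped_bytes)
  JSON.parse(Zlib.gunzip(gzipped_bytes))
end

# Round trip with a made-up payload (real field names are defined by the schema).
packed = Zlib.gzip(JSON.generate("example_key" => "example_value"))
archive = read_archive(packed)
puts archive.keys.inspect
```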
User's data archive contains the "historical" part of user's identity. The user whose identity is described by the archive data is the archive owner.
Owner's data in the archive
- Private key
- Aspects list
- Followed tags list
- Own relayables
Though included in the archive, own relayables for the private posts of other people are ignored on import: either the pod knows the post and therefore already knows the relayable, or it doesn't know the post and has no way to fetch it.
Other people's data in the archive
The proposal to include comments in the archive passed.
It is also possible that the new home pod doesn't know some of the people the user has in contacts. A person can be discovered, but sometimes discovery cannot be performed immediately (e.g. when the remote pod is temporarily offline). In this situation the import process may get very complicated, because we can't import some data before the records it references exist in the DB. The same applies to remote public posts for which we have own relayables. Therefore I propose to include in the archive enough metadata about each remote entry (person or post) to create the referenced object in the DB, so that the import can complete without interruption and the discovery/fetch of the full data can be scheduled afterwards.
In total, other people's data in the archive comprises:
- Relayables of other people for the user's posts
- GUID and public key of each referenced person in the archive
- GUID, author's account ID and post type for each referenced remote post
Ways to get user's archive to the new home pod
Currently, two ways are planned:
- Manual import of the previously exported archive
- Automatic backup method. This was originally covered in the discontinued document. The implementation of this method is postponed to the point after the manual export is implemented.
An archive is accepted for import only if all of the following validation criteria are met:
- Conforms to the schema
- The archive owner is either known locally or discoverable
- The archive owner's account is not closed
- The archive owner's known public key corresponds to the private key provided in the archive
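The last criterion can be sketched with Ruby's `openssl` library. This is only an illustration of comparing an RSA private key with a known public key; the helper name `keys_match?` is hypothetical and the other criteria (schema, discoverability, closed accounts) are not covered.

```ruby
require "openssl"

# True when the PEM-encoded private key from the archive and the PEM-encoded
# public key known to the pod share the same modulus and public exponent,
# i.e. they belong to the same RSA keypair.
def keys_match?(private_pem, public_pem)
  private_key = OpenSSL::PKey::RSA.new(private_pem)
  public_key = OpenSSL::PKey::RSA.new(public_pem)
  private_key.private? &&
    private_key.n == public_key.n &&
    private_key.e == public_key.e
end

# Freshly generated keys stand in for the archive owner and a stranger.
owner_key = OpenSSL::PKey::RSA.new(2048)
other_key = OpenSSL::PKey::RSA.new(2048)
owner_public_pem = owner_key.public_key.to_pem
```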
4. Federation protocol extension
A user's identity is distributed across the pods in the federation, and the account ID is what is used to address a user across the federation. In order to change the home pod the account ID must be changed, because the home pod's address is a part of the account ID. To change the account ID while preserving the identity, each link to and mention of the identity in the federation must be updated to the new account ID. This functionality requires an extension of the federation protocol.
The account ID change feature may also be useful outside of the seed migration procedure. It may be valuable in other use cases, for example when a user's private key has leaked and the key needs to be reissued. A pod domain change also becomes possible by invoking the account ID change procedure for every user on the pod.
In order to inform the federation about the fact that some user has changed their ID a new federation entity must be introduced.
Here is an example of the new federation message:
<account_rename>
  <author>firstname.lastname@example.org</author>
  <person>
    <guid>a4b1f51067440134e2380c89a50bb9e1</guid>
    <author>person-1-ca8d47@localhost:3000</author>
    <url>http://localhost:3000/</url>
    <profile>
      <author>person-1-ca8d47@localhost:3000</author>
      <first_name>my name</first_name>
      <last_name/>
      <image_url>/assets/user/default.png</image_url>
      <image_url_medium>/assets/user/default.png</image_url_medium>
      <image_url_small>/assets/user/default.png</image_url_small>
      <birthday>1988-07-15</birthday>
      <gender>Male</gender>
      <bio>some text about me</bio>
      <location>github</location>
      <searchable>true</searchable>
      <nsfw>false</nsfw>
      <tag_string>#i #love #tags</tag_string>
    </profile>
    <exported_key>-----BEGIN PUBLIC KEY-----
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDUrYGleaZozLnxF59dDjmQ92te
t+uhr1nHNSrm0+yah8nREPEbpDaTn5J2vE6Tfu4YpgRcBCFvhmI7cMtn90GcFaV6
hr0vs1N3f0IntHwp4zKjpS7eA0h4XycPjwjAS5j7xTXWzS+PHF8BhoZ3B5TK9r4V
ntbwrbJYGDM1dDrn3wIDAQAB
-----END PUBLIC KEY-----
</exported_key>
  </person>
</account_rename>
TODO: it may make sense to add XML fields descriptions and/or schema, but I think XML is pretty self descriptive here, so it's minor.
In the DSL used in the diaspora_federation gem the definition of the entity looks like this:
class AccountRename < Entity
  # the old account id
  property :author

  # the new person with new account id and new key
  entity :person, Person
end
Recipients of the AccountRename message
Within the ID change procedure every link to the user's identity must be updated with the new ID. How do we find every link? The most logical choice is the same set of recipients that receives profile updates. However, there is a known issue with the diaspora federation implementation which makes the list of profile update recipients quite incomplete. Unless this issue is fixed, some pods which have links to the user's identity will miss the AccountRename message, and the federation will therefore be less consistent.
The old home pod must be among the recipients as well.
What if someone still addresses the old ID?
In any case, it's impossible to rule out the situation where someone addresses a user by the old ID after the ID change. For example, this can happen when someone obtained the user's ID outside the federation (e.g. from a business card) and searches for this ID. Therefore, we must take proper measures for when that happens.
Someone tries to discover a user by the old ID
Normally, discovery proceeds in the following steps:
1. Querying /.well-known/host-meta on the pod and getting the WebFinger address of the pod.
2. Querying the WebFinger document from the pod.
3. Querying the HCard from the pod.
If someone performs discovery using the old ID, the pod's WebFinger response (step 2) should either redirect (301 Moved Permanently) to the WebFinger address of the new home pod (to https://email@example.com) or return the WebFinger document with updated information about the user's home pod. In both cases the discovery routine will stop, because the ID that was requested doesn't match the ID that was retrieved (see the code). When that happens, the discovering side should fetch the AccountRename entity from the home pod of the user. Both the old and the new home pod hold enough information to return the proper AccountRename entity. To make the fetch more robust, I suggest attempting to fetch from the old home pod first and, if that fails, from the new home pod.
The WebFinger specification allows redirects only to HTTPS resources (see the spec), in order to avoid MITM attacks. So the redirect won't work if the new home pod isn't available via HTTPS. In that case the discovery routine fails and discovery by the old ID is not possible (the user is considered not found).
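The decision logic described above can be sketched roughly as follows. This is not the diaspora_federation implementation: the function names, the symbols they return and the example URLs are assumptions made for illustration, and the actual network fetch is left to a caller-supplied block.

```ruby
# Decide how discovery should proceed once WebFinger has answered.
# `redirect_url` is the redirect target if the pod answered with a 301, else nil;
# `returned_id` is the account ID found in the retrieved WebFinger document.
def discovery_outcome(requested_id, returned_id, redirect_url: nil)
  # Redirects are only allowed to HTTPS resources.
  return :not_found if redirect_url && !redirect_url.start_with?("https://")
  return :not_found if returned_id.nil?

  returned_id == requested_id ? :continue_discovery : :fetch_account_rename
end

# Try the old home pod first, then the new one. The block performs the fetch
# and returns the entity, or returns nil / raises when the pod cannot answer.
def fetch_account_rename(old_pod, new_pod)
  [old_pod, new_pod].each do |pod|
    entity = begin
      yield pod
    rescue StandardError
      nil
    end
    return entity if entity
  end
  nil
end
```

Keeping the decision separate from the fetch keeps the HTTPS rule and the ID comparison testable without any network access.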
Someone sends a federation message to a user by the old ID
If some pod has missed the AccountRename message for some reason, it will send the federation messages to the old address. If that happens, the old home pod must send the AccountRename entity to the pod from where the message has arrived.
I would also suggest relaying the message received by the old home pod to the new home pod, but I'm not sure that the additional stability would be worth the possible complexity of the implementation.
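A rough sketch of the mandatory part of this behavior on the old home pod (answering with the rename, without relaying) might look like this. `AccountRenameNotice`, `handle_incoming` and the returned hash are hypothetical simplifications, not part of diaspora or the diaspora_federation gem.

```ruby
# Hypothetical, simplified stand-in for a stored AccountRename entity.
AccountRenameNotice = Struct.new(:old_id, :new_id)

# What the old home pod does with a message addressed to `recipient_id`.
# `renames` maps old account IDs to the stored rename notices.
def handle_incoming(recipient_id, sender_pod, renames)
  notice = renames[recipient_id]
  # No rename recorded: the recipient still lives here.
  return { action: :deliver_locally } if notice.nil?

  # The recipient has moved: tell the sending pod about the rename.
  { action: :send_account_rename, to: sender_pod, entity: notice }
end

renames = { "carol@pod1" => AccountRenameNotice.new("carol@pod1", "christy@pod2") }
reply = handle_incoming("carol@pod1", "pod4", renames)
puts reply[:action]
```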
5. Seed migration flow
1. The account owner goes to his old home pod and exports his profile to an archive.
2. The account owner goes to the /users/import route on the new home pod.
3. The migration wizard landing page is loaded.
4. The account owner enters all the data required to initiate the migration process. See the "UI for the account migration" section for details.
5. When all the data has been entered and validated, the new home pod displays the message: "Your account migration has been scheduled. You’ll be emailed once your account is available".
6. The new home pod registers a new account for the account owner. The new account is in a locked state.
7. The new home pod updates references in the local pod DB from the old account to the new one, sets up redirects, etc. This step is pod-implementation specific. Its purpose is to ensure that the old profile of the user is rendered to all users of the pod as moved, and that content authored by the old user is rendered as authored by the new user from now on. For diaspora-specific details of this step see the "Updating DB state of a diaspora* pod when performing ID change" section.
8. The user's archive is imported: data from the archive is loaded into the database of the new home pod and linked to the new user. See "Archive import sequence" for details.
9. The new home pod issues the AccountRename entity to the federation. Every recipient pod of this message (including the old home pod) must perform the ID change procedure in accordance with "Updating DB state of a diaspora* pod when performing ID change". The approach to determining the exact recipients is discussed in the "Recipients of the AccountRename message" section.
10. The new home pod unlocks the new user's account.
11. The new home pod emails the user about the end of the import process (possibly mentioning whether some remote data has failed to merge).
6. Changes to be introduced to diaspora*
Database schema changes
We must store AccountRename objects in the database to track every event of ID change.
class CreateAccountRenames < ActiveRecord::Migration
  def change
    create_table :account_renames do |t|
      t.integer :old_person_id, null: false
      t.integer :new_person_id, null: false
      t.text :old_private_key
      t.timestamps null: false
    end

    add_foreign_key :account_renames, :people, column: :old_person_id
    add_foreign_key :account_renames, :people, column: :new_person_id
  end
end
The main reason to keep the AccountRename object in the database is to properly route users and pods which try to address the moved user by the old account ID. If the object is deleted from the DB, people who search for the user by the old ID will see the user as deleted rather than migrated. That could be acceptable behavior in some cases, but in general we should store the AccountRename objects permanently.
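To illustrate why the stored records matter, here is a small sketch of resolving an old account ID to the current one. It assumes, beyond what this document states, that a user may migrate more than once, so renames can form a chain; the plain hash stands in for the `account_renames` table and `current_account_id` is a hypothetical helper.

```ruby
# Follow stored renames from an old account ID to the current one.
# `renames` maps each old account ID to the ID that replaced it.
def current_account_id(account_id, renames)
  seen = [account_id]
  while (next_id = renames[account_id])
    # An ID must not be reused, so a cycle indicates corrupted data.
    raise ArgumentError, "rename cycle at #{next_id}" if seen.include?(next_id)

    seen << next_id
    account_id = next_id
  end
  account_id
end

# Made-up rename history: two consecutive migrations of the same user.
rename_history = { "carol@pod1" => "christy@pod2", "christy@pod2" => "christy@pod5" }
puts current_account_id("carol@pod1", rename_history)
```

If a record in the middle of such a chain were deleted, the lookup would stop early and the user would appear deleted rather than migrated.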
Updating DB state of a diaspora* pod when performing ID change
Archive import sequence
Import user settings, user profile details and tag followings
Create empty user's aspects according to the list in the archive
Import posts from the archive and create aspect visibilities for the imported private posts
From the contacts and relayable authors, pick those who are unknown to the pod and attempt discovery. Add those for whom discovery has failed to the list of failed entries.
Attempt to fetch the parent posts of our own relayables. Again, add the failures to the list of failed entries.
Create stubs for failed entries using the metadata from the archive.
This is how an unknown person and an unknown post may be created before they are actually discovered:
Person.create(
  diaspora_handle: "firstname.lastname@example.org",
  guid: "THEGUIDOFTHEUSER",
  pod_id: Pod.find_or_create_by(url: "https://remote.example.com").id,
  serialized_public_key: "..."
)
Post.create(
  guid: "THEGUIDOFTHEPOST",
  author_id: Person.find_by(diaspora_handle: "email@example.com").id,
  type: "Post"
)
Schedule discovery/fetch retries for the entries in the list of failed entries. Retry intervals are 10 minutes, 60 minutes, 1 day and 3 days (4 retry attempts in total).
Import contacts from the archive.
Import relayables from the archive.
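The retry schedule from the import sequence above could be expressed as a simple lookup. This sketch assumes each interval is the wait before the corresponding attempt, counted from the previous failure; the document does not say whether the intervals are relative or cumulative. The names `RETRY_INTERVALS` and `retry_delay` are illustrative only.

```ruby
# Waiting times in seconds before retry attempts 1..4:
# 10 minutes, 60 minutes, 1 day and 3 days.
RETRY_INTERVALS = [10 * 60, 60 * 60, 24 * 60 * 60, 3 * 24 * 60 * 60].freeze

# Seconds to wait before the given retry attempt (1-based),
# or nil once all four attempts have been used up.
def retry_delay(attempt)
  return nil unless attempt.between?(1, RETRY_INTERVALS.size)

  RETRY_INTERVALS[attempt - 1]
end
```

A background job could reschedule itself with `retry_delay(attempt)` after each failure and give up when it returns `nil`.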
UI for the account migration
The account migration feature is presented as a wizard. The wizard is launched by visiting a specific server route, /users/import. No user may be logged in to the web application when the wizard is launched. If one is, the page should be rendered with a message explaining that and a "Sign out" button.
For the manual archive import case the first step of the wizard is the upload of the user's archive.
On the second step, the wizard shows whether the archive has passed validation and, if so, asks for:
- User name part of the new ID. The name equal to the old one is proposed if available.
- User email. The email from the archive is proposed.
- New user password and confirmation.
After the user has submitted a correctly filled form, the wizard shows the message "Your account migration has been scheduled. You'll be emailed once your account is available".
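The rule for pre-filling the user name part of the new ID can be sketched as below. `proposed_username` is a hypothetical helper, and the list of taken names stands in for a lookup in the new home pod's user table.

```ruby
# Propose the user name part of the old account ID if it is still free on
# the new home pod; return nil so the user has to pick a name otherwise.
def proposed_username(old_account_id, taken_usernames)
  name = old_account_id.split("@", 2).first
  taken_usernames.include?(name) ? nil : name
end
```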
7. Old pods and implementation plan
We can't enable the migration feature before the federation accepts the required changes as stable and all maintained software in the federation supports the required protocol changes. I propose the following plan to make the introduction of the feature more reliable.
Implement and merge the ID change feature support for pods (WIP).
Update the archive format so users may save importable profiles as early as possible
Implement and merge the archive import and the UI changes, disable in default installations
Enable by default and announce availability after all maintained software in the federation has moved to versions with ID change support
Test coverage must include:
- Unit tests for all the new components introduced
- Integration tests for the ID change feature. 6783 as a preparation
  - Verify there is no data left referenced to the old user and old person
  - ID change on the new home pod
  - ID change on the old home pod
  - ID change within the same pod
  - ID change for a non-home pod
  - The case when pod missed the AccountRename message
  - Our pod discovers an old ID
  - An old ID is discovered from our pod
  - A federation message came for the old ID
- Integration tests for the archive import feature
  - Prepare "fixture" archives and expect specific data to appear in the DB after import
  - Inconsistent and invalid archives
  - Also test cases when referenced data (people) is missing on the pod and discovery/fetch fails
- UI tests
- Behavior tests with a group of pods to control the federation consistency using the diaspora_federation_behavior framework.
Behavior tests scenario
Given 4 pods. Pods are populated with the data produced by fixtures builder (modified with 6783).
In 6783 we have user
carol who is a friend of
carol@pod1 shares with
bob@pod3. Here some data are exchanged between
bob@pod3 so that there are posts (public and limited), posts with mentions, comments, polls, private conversations between them.
pod4 prefetches the profile of carol@pod1 but doesn't add her to contacts.
carol@pod1 wants to migrate to
carol@pod2 is occupied,
carol picks name
carol@pod1 exports her archive and imports it on pod2. She receives an email about the completion of the migration process.
After the migration has finished, it is expected that carol@pod1 is no longer in the contacts of bob@pod3, but christy@pod2 is there instead. Also it is expected that all the known data of
carol@pod1 is now referenced on pods with the
It is expected that previous participants of the discussions (posts and comments) can still see them and continue the discussions, and that messages will be federated properly.
pod4 is expected to miss the AccountRename message. It tries to send a message to carol@pod1. It is expected that pod4 receives the AccountRename message after that. It is expected that a user from pod4 may join a discussion on a public post of christy@pod2 and that his comment is federated properly. It is expected that a user from pod4 may join a discussion where christy@pod2 is a participant and that christy@pod2 receives proper updates (comments).
Additional consistency check: comparative DB verification
(More of an idea; nothing has been done on this yet. I don't know if it is worth implementing, but my feeling is that it may help catch some data regressions in the future.) Additional checks may be performed by comparing the same objects across two databases. The same object must have the same data in both databases. For example, a post with a given GUID must have the same text, author, etc. If both databases contain the same people, the check should verify that the same deletion and migration objects are present in both. Etc.
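A minimal sketch of such a comparison, assuming each database can be dumped into a hash keyed by GUID (the dump format, the attribute names and the helper `inconsistent_guids` are invented for this example):

```ruby
# Return the GUIDs which both pods know but for which the stored attributes differ.
# Each argument maps a GUID to a hash of that object's attributes.
def inconsistent_guids(db_a, db_b)
  (db_a.keys & db_b.keys).reject { |guid| db_a[guid] == db_b[guid] }
end

# Made-up dumps from two pods after a migration.
pod2_posts = {
  "guid-1" => { text: "hello", author: "christy@pod2" },
  "guid-2" => { text: "public post", author: "bob@pod3" }
}
pod3_posts = {
  "guid-1" => { text: "hello", author: "carol@pod1" }, # stale author: rename was missed
  "guid-2" => { text: "public post", author: "bob@pod3" },
  "guid-3" => { text: "only known on pod3", author: "bob@pod3" }
}
puts inconsistent_guids(pod2_posts, pod3_posts).inspect
```

Objects known to only one of the pods are deliberately ignored, since pods are not expected to hold identical sets of data.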