The politeiawww database is a key-value store used to persist user data. The current implementation was designed with LevelDB as the primary backend. Because WWW needs to scale, this issue was created and has been worked on since then.
A few things have been discussed and defined outside of this issue:
- Sensitive data SHALL be encrypted at rest and in flight.
- Users, previously mapped by email, will now be mapped by their IDs.
- Every database record must have a field indicating the record type and the database version.
This is just some context. The goal of this document is to discuss one particular problem: how the migration tools should be designed so that they can migrate from the old database implementation to the new one while staying generic/agnostic enough.
Both implementations store 3 types of records:
- User (the user data, many records)
- Version (the database version, single record)
- LastPaywallAddressIndex (the index to be used when deriving the user paywall address, single record)
Old DB shape:
| | User | Version | LastPaywallAddressIndex |
|---|---|---|---|
| Key | user email | "userversion" | "lastpaywallindex" |
| Type | User struct | uint32 | uint64 |
New DB shape:
| | User | Version | LastPaywallAddressIndex |
|---|---|---|---|
| Key | user ID | "userversion" | "lastpaywallindex" |
| Type | User struct | Version struct | LastPaywallAddressIndex struct |
All information stored by the new DB implementation is encrypted before it is saved and decrypted when it is retrieved.
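To make the encrypt-on-save / decrypt-on-read idea concrete, here is a minimal sketch using AES-256-GCM from the Go standard library. The cipher choice, the `encrypt`/`decrypt` helper names, and the nonce-prefixed layout are assumptions for illustration only; the issue does not state which scheme the new implementation uses.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 32-byte key (assumed key size).
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals the record payload and prepends the random nonce so that
// decrypt can recover it later. Hypothetical helper, not the real API.
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits the nonce from the stored blob and opens the ciphertext.
func decrypt(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("stored blob is too short")
	}
	nonce, ciphertext := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	blob, err := encrypt(key, []byte(`{"id":"some-user-id"}`))
	if err != nil {
		panic(err)
	}
	plain, err := decrypt(key, blob)
	if err != nil {
		panic(err)
	}
	fmt.Printf("stored %d bytes, recovered %s\n", len(blob), plain)
}
```

A record written by the old, unencrypted DB would fail in `decrypt` here, which is the situation the migration tooling has to account for.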
What changed:
- User records are stored by ID and no longer by email.
- The database version is wrapped within a struct and is no longer a uint32.
- The last paywall address index is wrapped within a struct and is no longer a uint64.
The proposed solution should be implemented within the politeiawww_dbutil package and adds two new commands: dump and import.
- Add a configuration flag to the database that specifies whether data is encrypted. The old DB is not encrypted, so we need a way to prevent the database from trying to decrypt it.
- The dump command: This command dumps the entire database into a .json file at the specified path.
Arguments:
- OutputDir (string) The directory where the dump file will be created. (default is the www home dir)
- Decrypt (bool) Whether to decrypt the data while retrieving it. (default is true)
Example:
politeiawww_dbutil dump ~/.politeiawww/dump false
- The import command: This command imports a .json file and rebuilds the entire database from it.
Arguments:
- DumpFile (string) The file to be imported.
- Encrypt (bool) Whether to encrypt the data before saving it into the database. (default is true)
- FromUserKey (string {email, ID}) Which key the user records are stored under in the imported file. (default is ID)
- Migrate (bool) If true, the db version stored in the imported file will be ignored and all the records will be migrated to comply with the latest version.
- InferPaywallIndexFromUsers (bool) If true, the LastPaywallAddressIndex stored in the imported file will be ignored and the new value will be calculated by scanning the user records.
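The import options above could be sketched roughly as follows. This is only an illustration under several assumptions: the `Dump` file layout, the trimmed-down `User` struct (with `ID`, `Email`, and `PaywallAddressIndex` fields), the `ImportOptions` type, and the choice to infer the index as the highest value found among the users are all hypothetical and not defined by this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a trimmed-down, hypothetical user record.
type User struct {
	ID                  string `json:"id"`
	Email               string `json:"email"`
	PaywallAddressIndex uint64 `json:"paywalladdressindex"`
}

// Dump is an assumed layout for the .json dump file.
type Dump struct {
	Version                 uint32          `json:"version"`
	LastPaywallAddressIndex uint64          `json:"lastpaywalladdressindex"`
	Users                   map[string]User `json:"users"` // keyed by email or ID
}

// ImportOptions mirrors two of the proposed import arguments.
type ImportOptions struct {
	FromUserKey                string // "email" or "ID" (default)
	InferPaywallIndexFromUsers bool
}

// rekeyByID rebuilds the user map so that every key is the user ID.
func rekeyByID(users map[string]User) (map[string]User, error) {
	out := make(map[string]User, len(users))
	for key, u := range users {
		if u.ID == "" {
			return nil, fmt.Errorf("user stored under %q has no ID", key)
		}
		if _, dup := out[u.ID]; dup {
			return nil, fmt.Errorf("duplicate user ID %q", u.ID)
		}
		out[u.ID] = u
	}
	return out, nil
}

// inferPaywallIndex assumes the last index is the highest one any user holds.
func inferPaywallIndex(users map[string]User) uint64 {
	var last uint64
	for _, u := range users {
		if u.PaywallAddressIndex > last {
			last = u.PaywallAddressIndex
		}
	}
	return last
}

// importDump decodes a dump and applies the selected options in memory.
func importDump(data []byte, opts ImportOptions) (*Dump, error) {
	var d Dump
	if err := json.Unmarshal(data, &d); err != nil {
		return nil, fmt.Errorf("decode dump: %w", err)
	}
	switch opts.FromUserKey {
	case "email":
		users, err := rekeyByID(d.Users)
		if err != nil {
			return nil, err
		}
		d.Users = users
	case "ID", "":
		// Already keyed by ID; nothing to do.
	default:
		return nil, fmt.Errorf("unknown FromUserKey %q", opts.FromUserKey)
	}
	if opts.InferPaywallIndexFromUsers {
		d.LastPaywallAddressIndex = inferPaywallIndex(d.Users)
	}
	return &d, nil
}

func main() {
	old := Dump{Version: 1, Users: map[string]User{
		"a@example.com": {ID: "id-a", Email: "a@example.com", PaywallAddressIndex: 3},
	}}
	raw, err := json.Marshal(old)
	if err != nil {
		panic(err)
	}
	d, err := importDump(raw, ImportOptions{FromUserKey: "email", InferPaywallIndexFromUsers: true})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *d)
}
```

Keeping the re-keying and index inference as pure functions over the decoded dump means they can be exercised without touching either database backend.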
Why not just have a version table in the database that contains the current database version instead of adding the version to every record?
Why is LastPaywallAddressIndex broken out into its own column and not just included in the user data blob?
Couldn't this be detected automatically? If it's using LevelDB, don't decrypt it. If it's using CockroachDB, decrypt it.