@fernandoabolafio
Last active February 8, 2019 13:21

Design for politeiawww migration tools

Context:

The politeiawww database is a key-value store used to persist user data. The current implementation was designed with LevelDB as the primary implementation. Because politeiawww needs to scale, this issue was created and has been worked on since then.

A few things have been discussed and defined outside of this issue:

  • Sensitive data SHALL be encrypted at rest and in flight.
  • Users, which were previously keyed by email, will now be keyed by their IDs.
  • Every database record must have a field indicating the record type and the database version.

This is just some context. The goal of this document is to discuss one particular problem: how the migration tools should be designed so that they can migrate from the old database implementation to the new one while remaining generic/agnostic enough.

Old DB vs. New DB:

Both implementations store 3 types of records:

  • User (the user data, many records)
  • Version (the database version, single record)
  • LastPaywallAddressIndex (the index to be used while deriving the user paywall address, single record)

Old DB shape:

Record  User          Version          LastPaywallAddressIndex
Key     user email    "userversion"    "lastpaywallindex"
Type    User struct   uint32           uint64

New DB shape:

Record  User          Version          LastPaywallAddressIndex
Key     user ID       "userversion"    "lastpaywallindex"
Type    User struct   Version struct   LastPaywallAddressIndex struct

All information stored by the new DB implementation is encrypted before it is saved and decrypted after it is retrieved.
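To illustrate encrypting each value at rest before writing it and decrypting it after reading, here is a sketch that assumes AES-256-GCM from Go's standard library, with a random nonce prepended to each ciphertext. This document does not specify the cipher or key handling politeiawww actually uses, and `encrypt`/`decrypt` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

// encrypt seals plaintext with AES-GCM; the output is nonce||ciphertext||tag.
func encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and tag to the nonce slice.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt reverses encrypt and fails if the stored value was tampered with.
func decrypt(key, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		panic(err)
	}
	record := []byte(`{"id":"0b6a9e1c"}`)
	blob, err := encrypt(key, record)
	if err != nil {
		panic(err)
	}
	plain, err := decrypt(key, blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(plain, record)) // true
}
```

An authenticated mode such as GCM is used here so that a modified value is rejected on read rather than silently decrypted into garbage.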

What changed:

  • User records are stored by ID and no longer by email.
  • The database version is wrapped within a struct and is no longer a uint32.
  • The last paywall address index is wrapped within a struct and is no longer a uint64.
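Taken together, these changes amount to a pure transformation from the old shape to the new one. Below is a minimal sketch that assumes the old records have already been decoded into Go values. The struct and field names (`OldDB`, `NewDB`, `Index`, `migrate`) are hypothetical.

```go
package main

import "fmt"

// User is a hypothetical, trimmed-down user record.
type User struct {
	ID    string
	Email string
}

// Version and LastPaywallAddressIndex wrap the former bare integers.
type Version struct {
	Version uint32
}

type LastPaywallAddressIndex struct {
	Index uint64
}

// OldDB mirrors the old layout: users keyed by email, bare integers.
type OldDB struct {
	Users            map[string]User // key: email
	Version          uint32
	LastPaywallIndex uint64
}

// NewDB mirrors the new layout: users keyed by ID, wrapped values.
type NewDB struct {
	Users            map[string]User // key: user ID
	Version          Version
	LastPaywallIndex LastPaywallAddressIndex
}

// migrate re-keys users by ID and wraps the scalar records in structs.
// old.Version is not carried over: the result is stamped with newVersion.
func migrate(old OldDB, newVersion uint32) (NewDB, error) {
	out := NewDB{
		Users:            make(map[string]User, len(old.Users)),
		Version:          Version{Version: newVersion},
		LastPaywallIndex: LastPaywallAddressIndex{Index: old.LastPaywallIndex},
	}
	for email, u := range old.Users {
		if u.ID == "" {
			return NewDB{}, fmt.Errorf("user %q has no ID", email)
		}
		if _, dup := out.Users[u.ID]; dup {
			return NewDB{}, fmt.Errorf("duplicate user ID %q", u.ID)
		}
		out.Users[u.ID] = u
	}
	return out, nil
}

func main() {
	old := OldDB{
		Users:            map[string]User{"alice@example.com": {ID: "0b6a9e1c", Email: "alice@example.com"}},
		Version:          1,
		LastPaywallIndex: 41,
	}
	nu, err := migrate(old, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(nu.Users["0b6a9e1c"].Email, nu.Version.Version, nu.LastPaywallIndex.Index) // alice@example.com 2 41
}
```

Rejecting users without an ID, and duplicate IDs, keeps the re-keying from silently dropping records.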

Proposed solution:

This proposed solution should be implemented within the politeiawww_dbutil package and adds two new commands: dump and import.

  1. Add a configuration flag to the database that specifies whether data should be encrypted. This helps deal with the fact that the old DB is not encrypted, so we need a way to prevent the database from trying to decrypt it.

  2. The dump command: dumps the entire database into a .json file in the specified directory.

    Arguments:

    • OutputDir (string) The directory where the dump file will be created. (default is the www home dir)
    • Decrypt (bool) Whether to decrypt the data while retrieving it. (default is true)

    Example: politeiawww_dbutil dump ~/.politeiawww/dump false

  3. The import command: imports a .json file and rebuilds the entire database from it.

    Arguments:

    • DumpFile (string) The file to be imported
    • Encrypt (bool) Whether to encrypt the data before saving it into the database. (default is true)
    • FromUserKey (string, one of {email, ID}) Which key holds the user information in the imported file. (default is ID)
    • Migrate (bool) If true, the db version stored in the imported file is ignored and all records are migrated to comply with the latest version.
    • InferPaywallIndexFromUsers (bool) If true, the LastPaywallAddressIndex stored in the imported file is ignored and the new value is calculated by scanning the user records.
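The encryption flag and the Decrypt/Encrypt arguments can be sketched as a thin wrapper around the key-value store that calls the codec only when encryption is enabled. With encryption off, the unencrypted old LevelDB can be read as-is. `Store`, `Codec`, `db`, and `toyCodec` are hypothetical. `toyCodec` only tags values with a prefix and stands in for real encryption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Store is a hypothetical minimal key-value interface (LevelDB, CockroachDB, ...).
type Store interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// memStore is an in-memory Store used only for illustration.
type memStore map[string][]byte

func (m memStore) Put(key string, value []byte) error {
	m[key] = value
	return nil
}

func (m memStore) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("key %q not found", key)
	}
	return v, nil
}

// Codec is the encrypt/decrypt pair the database would use.
type Codec struct {
	Encrypt func([]byte) ([]byte, error)
	Decrypt func([]byte) ([]byte, error)
}

// toyCodec only tags values with a prefix; it is a placeholder, NOT encryption.
var toyCodec = Codec{
	Encrypt: func(b []byte) ([]byte, error) { return append([]byte("enc:"), b...), nil },
	Decrypt: func(b []byte) ([]byte, error) {
		if !bytes.HasPrefix(b, []byte("enc:")) {
			return nil, errors.New("value is not encrypted")
		}
		return b[len("enc:"):], nil
	},
}

// db applies the codec only when encrypt is true.
type db struct {
	store   Store
	codec   Codec
	encrypt bool
}

func (d *db) Put(key string, value []byte) error {
	if d.encrypt {
		var err error
		if value, err = d.codec.Encrypt(value); err != nil {
			return err
		}
	}
	return d.store.Put(key, value)
}

func (d *db) Get(key string) ([]byte, error) {
	value, err := d.store.Get(key)
	if err != nil || !d.encrypt {
		return value, err
	}
	return d.codec.Decrypt(value)
}

func main() {
	raw := memStore{"userversion": []byte("1")} // e.g. an old, unencrypted value
	oldDB := &db{store: raw, codec: toyCodec, encrypt: false}
	v, err := oldDB.Get("userversion")
	fmt.Println(string(v), err) // 1 <nil>

	newDB := &db{store: raw, codec: toyCodec, encrypt: true}
	if _, err := newDB.Get("userversion"); err != nil {
		fmt.Println("decrypting the old value fails:", err)
	}
}
```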
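Once the dump file has been decoded, the import arguments could interact as in the sketch below. It assumes each user record carries its own `PaywallAddressIndex` (a hypothetical field) and treats the last index as the highest index in use. `latestVersion`, `Dump`, `Result`, and `buildImport` are also hypothetical. Encryption (`Encrypt`) is left to whatever writes `Result` to the database.

```go
package main

import "fmt"

// ImportOptions mirrors the import arguments listed in the design.
type ImportOptions struct {
	Encrypt                    bool   // applied by the database writer, not here
	FromUserKey                string // "email" or "ID"
	Migrate                    bool
	InferPaywallIndexFromUsers bool
}

// User is a hypothetical decoded user record.
type User struct {
	ID                  string
	Email               string
	PaywallAddressIndex uint64
}

// Dump is a hypothetical decoded dump file.
type Dump struct {
	Version          uint32
	LastPaywallIndex uint64
	Users            map[string]User // keyed by email or ID, per FromUserKey
}

// Result is what would be written to the new database.
type Result struct {
	Version          uint32
	LastPaywallIndex uint64
	UsersByID        map[string]User
}

const latestVersion = 2 // hypothetical current database version

func buildImport(d Dump, opts ImportOptions) (Result, error) {
	if opts.FromUserKey != "email" && opts.FromUserKey != "ID" {
		return Result{}, fmt.Errorf("FromUserKey must be email or ID, got %q", opts.FromUserKey)
	}
	res := Result{
		Version:          d.Version,
		LastPaywallIndex: d.LastPaywallIndex,
		UsersByID:        make(map[string]User, len(d.Users)),
	}
	if opts.Migrate {
		res.Version = latestVersion // ignore the dump's version
	}
	for key, u := range d.Users {
		want := u.ID
		if opts.FromUserKey == "email" {
			want = u.Email
		}
		if key != want || u.ID == "" {
			return Result{}, fmt.Errorf("user key %q does not match FromUserKey=%s", key, opts.FromUserKey)
		}
		if _, dup := res.UsersByID[u.ID]; dup {
			return Result{}, fmt.Errorf("duplicate user ID %q", u.ID)
		}
		res.UsersByID[u.ID] = u
	}
	if opts.InferPaywallIndexFromUsers {
		var highest uint64
		for _, u := range res.UsersByID {
			if u.PaywallAddressIndex > highest {
				highest = u.PaywallAddressIndex
			}
		}
		res.LastPaywallIndex = highest
	}
	return res, nil
}

func main() {
	d := Dump{
		Version:          1,
		LastPaywallIndex: 3,
		Users: map[string]User{
			"alice@example.com": {ID: "0b6a9e1c", Email: "alice@example.com", PaywallAddressIndex: 4},
		},
	}
	res, err := buildImport(d, ImportOptions{FromUserKey: "email", Migrate: true, InferPaywallIndexFromUsers: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Version, res.LastPaywallIndex, len(res.UsersByID)) // 2 4 1
}
```

Validating that each map key matches the declared FromUserKey catches a dump imported with the wrong setting before any record is written.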
@lukebp

lukebp commented Feb 8, 2019

Every database record must have a field indicating the record type and the database version.

Why not just have a version table in the database that contains the current database version instead of adding the version to every record?

LastPaywallAddressIndex (the index to be used while deriving the user paywall address, single record)

Why is LastPaywallAddressIndex broken out into its own column and not just included in the user data blob?

Add a configuration flag to the database that specifies whether data should be encrypted. This helps deal with the fact that the old DB is not encrypted, so we need a way to prevent the database from trying to decrypt it.

Couldn't this be detected automatically? If it's using LevelDB, don't decrypt it. If it's using CockroachDB, decrypt it.

@fernandoabolafio
Author

fernandoabolafio commented Feb 8, 2019

Why not just have a version table in the database that contains the current database version instead of adding the version to every record?

That was a requirement from @moo31337. We want to be able to check that each record's version matches the database version.

Why is LastPaywallAddressIndex broken out into its own column and not just included in the user data blob?

Each user has its own paywall address. The LastPaywallAddressIndex is used to generate the paywall address for the next user to be created in the database. Maybe it is possible to use a random value instead of generating the paywall index sequentially? I don't know whether the current approach relies on any hidden assumption.

Couldn't this be detected automatically? If it's using LevelDB, don't decrypt it. If it's using CockroachDB, decrypt it.

The new LevelDB implementation also supports encryption. But now that you mention it, I wonder whether we need encryption for LevelDB, since it won't be distributed. I think we do: even if it is not distributed, the data should be encrypted at rest on an untrusted machine.

@lukebp

lukebp commented Feb 8, 2019

Ignore my first two comments. I misunderstood what you were saying. I thought that you were saying the user, version, and LastPaywallAddressIndex were all part of the same record. They're actually three separate records. That makes much more sense.
