achow101/wallet-storage.md Secret

## wallet-storage.md

      
Wallet Storage

Document how we currently do wallet storage and all of the objects and function calls. Then document how this can be abstracted to allow for other storage backends.
Current

Creating and loading a wallet go through the same steps. Some steps pass trivially if there's no wallet file.
Start with a WalletLocation which points to either a wallet file with a name, or a directory with the name. The directory may or may not exist. The directory may or may not contain a wallet.dat file. The name may be empty in which case the directory is the walletdir.
First call LoadWallet. This then calls CWallet::Verify.
CWallet::Verify checks whether, in order:

1. The path does not exist
2. The path is a directory
3. The path is a symlink to a directory
4. The path is a file
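The four path cases can be sketched as a small classifier. This is only an illustration: it uses std::filesystem rather than the filesystem wrapper in the actual codebase, the names `WalletPathKind` and `ClassifyWalletPath` are made up for the example, and the real CWallet::Verify may apply extra conditions that are not listed here.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical names, not part of the wallet code.
enum class WalletPathKind { NotFound, Directory, SymlinkToDirectory, File, Unsupported };

// Classify a wallet path, testing the cases in the same order as the list.
WalletPathKind ClassifyWalletPath(const fs::path& path)
{
    std::error_code ec;
    // symlink_status does not follow a symlink in the final path component.
    const fs::file_type type = fs::symlink_status(path, ec).type();
    if (type == fs::file_type::not_found) return WalletPathKind::NotFound;
    if (type == fs::file_type::directory) return WalletPathKind::Directory;
    // is_directory follows the symlink and inspects its target.
    if (type == fs::file_type::symlink && fs::is_directory(path, ec)) {
        return WalletPathKind::SymlinkToDirectory;
    }
    if (type == fs::file_type::regular) return WalletPathKind::File;
    return WalletPathKind::Unsupported;
}
```

Anything that lands in `Unsupported` (a socket, a symlink to a file, and so on) would be the case where verification rejects the path.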

Then CWallet::Verify creates a WalletDatabase which creates a BerkeleyEnvironment with GetWalletEnv. GetWalletEnv calls SplitWalletPath which:

1. Checks whether the path is a file
   a. If so, it returns env_directory as the directory containing the file and database_filename as the filename
2. Otherwise, returns env_directory as the given path and database_filename as wallet.dat.
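A minimal sketch of that split, assuming std::filesystem as a stand-in for the codebase's filesystem wrapper. The `SplitPath` struct and the `SplitWalletPathSketch` name are illustrative; the real function may return its results differently.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type for the example.
struct SplitPath {
    fs::path env_directory;
    std::string database_filename;
};

SplitPath SplitWalletPathSketch(const fs::path& wallet_path)
{
    std::error_code ec;
    if (fs::is_regular_file(wallet_path, ec)) {
        // A wallet file: the environment is its parent directory.
        return {wallet_path.parent_path(), wallet_path.filename().string()};
    }
    // A directory (existing or not): use the default filename inside it.
    return {wallet_path, "wallet.dat"};
}
```

Note that a path which does not exist yet falls into the second branch, so a new wallet always ends up as a directory containing wallet.dat.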

GetWalletEnv constructs a BerkeleyEnvironment for env_directory. This all goes into the WalletDatabase constructed by CWallet::Verify.
CWallet::Verify runs WalletBatch::VerifyEnvironment which passes through to BerkeleyBatch::VerifyEnvironment. VerifyEnvironment calls GetWalletEnv again but this just fetches the environment created earlier. It then does BerkeleyEnvironment::Open to open the database environment.
BerkeleyEnvironment::Open tries to create the env_directory. It then adds a .walletlock file to that directory if possible. It then creates a subdirectory env_directory/database. Finally it calls DbEnv::open (BDB's open function) to actually open the BDB environment.
If it fails to open, env_directory/database is renamed and the open is retried. Opening the environment successfully seems to be treated as sufficient verification, so BerkeleyBatch::VerifyEnvironment then returns.
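The create, open, rename, retry sequence could look roughly like this. It is a sketch under assumptions: the BDB open call is replaced by a caller-supplied callback (`open_env`, hypothetical), the .walletlock step is left out, and the backup directory name is chosen for the example rather than taken from the code.

```cpp
#include <chrono>
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// open_env stands in for the DbEnv::open call; it returns true on success.
bool OpenEnvironmentSketch(const fs::path& env_directory,
                           const std::function<bool(const fs::path&)>& open_env)
{
    std::error_code ec;
    fs::create_directories(env_directory, ec);
    if (ec) return false;

    const fs::path log_dir = env_directory / "database";
    for (int attempt = 0; attempt < 2; ++attempt) {
        fs::create_directories(log_dir, ec);
        if (ec) return false;
        if (open_env(env_directory)) return true;
        if (attempt == 0) {
            // First failure: move the database directory aside and retry once.
            const auto now = std::chrono::duration_cast<std::chrono::seconds>(
                std::chrono::system_clock::now().time_since_epoch()).count();
            // Backup name is an assumption made for this sketch.
            fs::rename(log_dir, env_directory / ("database." + std::to_string(now) + ".bak"), ec);
            if (ec) return false;
        }
    }
    return false;
}
```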
CWallet::Verify, after calling BerkeleyBatch::VerifyEnvironment, checks to see if -salvagewallet was requested. If so, it calls WalletBatch::Recover which passes through to BerkeleyBatch::Recover.
BerkeleyBatch::Recover fetches the previously created BerkeleyEnvironment with GetWalletEnv. Then it renames the database file with DbEnv::dbrename so that it is moved to a backup location. Next it calls BerkeleyEnvironment::Salvage.
BerkeleyEnvironment::Salvage creates a new Db. With this, it uses Db::verify with the renamed database filename to verify the integrity of the database and dump the key-value pairs to a string. That string is parsed for each key-value pair and the vector of key-value pairs is returned.
BerkeleyBatch::Recover creates a new Db at the original database filename. It goes through each key-value pair returned by BerkeleyEnvironment::Salvage and writes it into the new Db with Db::put. Once it is finished, BerkeleyBatch::Recover returns.
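To make the parsing step concrete, here is a sketch of turning a salvage dump into key-value pairs. It assumes the dump has the usual BDB dump layout: header lines ending in `HEADER=END`, then alternating hex-encoded key and value lines, terminated by `DATA=END`. The function names are made up for the example.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KeyValPair = std::pair<std::string, std::string>;

// Decode a hex string, skipping whitespace and stopping at the first invalid character.
static std::string DecodeHex(const std::string& hex)
{
    std::string out;
    int hi = -1;
    for (unsigned char c : hex) {
        if (std::isspace(c)) continue;
        if (!std::isxdigit(c)) break;
        const int v = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        if (hi < 0) {
            hi = v;
        } else {
            out.push_back(static_cast<char>(hi * 16 + v));
            hi = -1;
        }
    }
    return out;
}

std::vector<KeyValPair> ParseSalvageDump(const std::string& dump)
{
    std::vector<KeyValPair> result;
    std::istringstream in(dump);
    std::string line;
    // Skip everything up to and including the end-of-header marker.
    while (std::getline(in, line)) {
        if (line == "HEADER=END") break;
    }
    std::string key_hex, value_hex;
    while (std::getline(in, key_hex)) {
        if (key_hex == "DATA=END") break;
        // A key without a value means the dump was truncated; stop there.
        if (!std::getline(in, value_hex) || value_hex == "DATA=END") break;
        result.emplace_back(DecodeHex(key_hex), DecodeHex(value_hex));
    }
    return result;
}
```

Each returned pair is what would then be written into the fresh Db with Db::put.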
Lastly, CWallet::Verify calls WalletBatch::VerifyDatabaseFile which passes through to BerkeleyBatch::VerifyDatabaseFile. This then gets the previously created BerkeleyEnvironment with GetWalletEnv. If there is a wallet file, BerkeleyEnvironment::Verify is called.
BerkeleyEnvironment::Verify creates a Db and calls Db::verify just to check the integrity of the wallet file. If verification fails, WalletBatch::Recover is run to salvage the wallet.
When CWallet::Verify passes, CWallet::CreateWalletFromFile is called to either initialize or load a wallet file.
The wallet file itself is created during the construction of the first BerkeleyBatch object (which probably happened during CWallet::Verify). BerkeleyBatch::BerkeleyBatch is called by WalletBatch::WalletBatch when it instantiates the WalletBatch::m_batch member variable.
BerkeleyBatch::BerkeleyBatch fetches the database filename from its BerkeleyDatabase. It fetches a cached Db from BerkeleyDatabase. If one does not exist, it creates one, then calls Db::open to create the database file and open a sub-database named main.
If this database was newly created, a version record is added.
CWallet::CreateWalletFromFile creates a new CWallet instance with a constructor that takes the chain, WalletLocation, and a WalletDatabase. The WalletDatabase is created using WalletDatabase::Create which was described previously as part of CWallet::Verify.
If -zapwallettxes is set, CWallet::ZapWalletTx is called. This calls WalletBatch::ZapWalletTx which calls WalletBatch::FindWalletTx. This gets a database cursor from the BerkeleyBatch stored as m_batch. It iterates through the database with the cursor looking for tx records.
Any found tx records have their txid and CWalletTx added to some vectors which are returned by WalletBatch::FindWalletTx.
WalletBatch::ZapWalletTx iterates through every found tx record and calls WalletBatch::EraseTx. That calls BerkeleyBatch::Erase which calls Db::del to remove a record from the database file. At the end, the CWallet is closed and a new one opened.
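The find-then-erase shape of the zap can be sketched with an in-memory map standing in for the database file. Everything here is illustrative: `FakeDb` is not a real class, and the `tx:` key prefix is an assumption made for the example, not the actual record encoding.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for the database file: an ordered key-value store.
using FakeDb = std::map<std::string, std::string>;

// First pass: walk the whole store (as a cursor would) and collect tx keys.
std::vector<std::string> FindTxKeys(const FakeDb& db)
{
    std::vector<std::string> found;
    for (const auto& entry : db) {
        // Hypothetical encoding: tx records have keys starting with "tx:".
        if (entry.first.rfind("tx:", 0) == 0) found.push_back(entry.first);
    }
    return found;
}

// Second pass: erase every record found, returning how many were removed.
std::size_t ZapTxRecords(FakeDb& db)
{
    std::size_t erased = 0;
    for (const std::string& key : FindTxKeys(db)) {
        erased += db.erase(key);
    }
    return erased;
}
```

Collecting first and erasing afterwards keeps the iteration separate from the deletes, which mirrors the split between FindWalletTx and EraseTx.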
After -zapwallettxes (or skipping it), CWallet::LoadWallet is called. This calls WalletBatch::LoadWallet. This first uses BerkeleyBatch::Read to read the minversion record and load that into the CWallet. Then it gets a cursor and calls ReadKeyValue on each record given by the cursor.
ReadKeyValue interprets the type of record given by the cursor and loads it into the CWallet as appropriate or into WalletScanState.
After reading all the key-value pairs, WalletBatch::LoadWallet does some post processing on things aggregated by WalletScanState and loads the things that cannot be loaded by themselves.
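The dispatch-then-post-process pattern might be sketched as below. The record types, the `keymeta` deferral, and all struct names are hypothetical and chosen only to show why some records cannot be loaded by themselves: a record may refer to another record that the cursor has not reached yet.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, already-deserialized record.
struct Record { std::string type, id, payload; };

struct WalletSketch {
    std::map<std::string, std::string> keys;     // key id -> key data
    std::map<std::string, std::string> key_meta; // key id -> metadata, known keys only
    std::map<std::string, std::string> txs;      // txid -> transaction data
};

// Things that are aggregated during the scan and handled afterwards.
struct ScanStateSketch {
    std::vector<Record> pending_meta;
    unsigned int unknown_records = 0;
};

void ReadKeyValueSketch(const Record& rec, WalletSketch& wallet, ScanStateSketch& state)
{
    if (rec.type == "key") {
        wallet.keys[rec.id] = rec.payload;
    } else if (rec.type == "tx") {
        wallet.txs[rec.id] = rec.payload;
    } else if (rec.type == "keymeta") {
        // The key may appear later in cursor order, so defer this record.
        state.pending_meta.push_back(rec);
    } else {
        ++state.unknown_records;
    }
}

ScanStateSketch LoadWalletSketch(const std::vector<Record>& cursor_order, WalletSketch& wallet)
{
    ScanStateSketch state;
    for (const Record& rec : cursor_order) ReadKeyValueSketch(rec, wallet, state);
    // Post-processing: attach deferred metadata now that all keys are loaded.
    for (const Record& meta : state.pending_meta) {
        if (wallet.keys.count(meta.id)) wallet.key_meta[meta.id] = meta.payload;
    }
    return state;
}
```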
Now wallet loading and creation is finished. The rest is just application level stuff.
Some other notes

WalletBatch always opens a Db when it is constructed. When it goes out of scope or is Reset, it will flush and close that Db instance.
There's a periodic flush done by MaybeCompactWalletDB, which a scheduler runs every 500 milliseconds.
Moving Forward

This all needs to be cleaned up so we can more easily introduce new storage backends.
Instead of having 3 Berkeley* classes, we should only have one. A new interface DatabaseStorage can be defined. Then BerkeleyStorage can subclass it and be a replacement for BerkeleyEnvironment, BerkeleyDatabase, and BerkeleyBatch. We can keep WalletBatch as it's application level stuff.
DatabaseStorage classes will have Open, Close, Flush, Read, Write, Cursor*, Txn*, etc. functions. They will handle all of the database management and reading and writing to the database instead of having separate classes for each.
So CWallet::Verify can call Open to open the environment then open the database and do verifications. This should also no-op if no database file exists. The BDB specific environment and file verification should be moved into Open. Maybe CWallet::Verify could be removed entirely?
Then CWallet::CreateWalletFromFile can do Open with a create option to open a database and create it if it does not exist.
The result should be that all of the environment handling stuff is abstracted away and encapsulated solely within BerkeleyStorage. BerkeleyStorage will also have the persistent Db handle and have the Read, Write, etc. functions for that Db handle.
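As a rough idea of what the proposed interface could look like, here is a sketch with an in-memory backend standing in for BerkeleyStorage. All signatures are assumptions: the note only names the functions, so the parameter lists, the use of strings for keys and values, the `create_if_missing` flag, and the choice to return false from Open when there is nothing to open are all made up for illustration. Cursor* and Txn* are omitted.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical shape of the proposed interface.
class DatabaseStorage
{
public:
    virtual ~DatabaseStorage() = default;
    virtual bool Open(bool create_if_missing) = 0;
    virtual void Close() = 0;
    virtual void Flush() = 0;
    virtual std::optional<std::string> Read(const std::string& key) = 0;
    virtual bool Write(const std::string& key, const std::string& value, bool overwrite) = 0;
};

// An empty optional models "no database file exists".
using FakeFile = std::optional<std::map<std::string, std::string>>;

// In-memory backend, only to show how a subclass would slot in.
class MemoryStorage final : public DatabaseStorage
{
    FakeFile& m_file;
    bool m_open = false;

public:
    explicit MemoryStorage(FakeFile& file) : m_file(file) {}

    bool Open(bool create_if_missing) override
    {
        if (!m_file) {
            // Without the create option, opening a missing file does nothing.
            if (!create_if_missing) return false;
            m_file.emplace();
            (*m_file)["version"] = "1"; // newly created: add a version record
        }
        m_open = true;
        return true;
    }
    void Close() override { Flush(); m_open = false; }
    void Flush() override {} // nothing is buffered in this sketch
    std::optional<std::string> Read(const std::string& key) override
    {
        if (!m_open) return std::nullopt;
        const auto it = m_file->find(key);
        if (it == m_file->end()) return std::nullopt;
        return it->second;
    }
    bool Write(const std::string& key, const std::string& value, bool overwrite) override
    {
        if (!m_open) return false;
        if (!overwrite && m_file->count(key)) return false;
        (*m_file)[key] = value;
        return true;
    }
};
```

With this shape, the verify path would call `Open(false)` and the create-or-load path would call `Open(true)`, and callers would only ever hold a `DatabaseStorage`.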
What Needs To Be Done


1. Move salvagewallet and maybe some other wallet recovery stuff to wallettool.
2. Disallow having more than one database file open in an environment (this is only supported for backwards compatibility).
3. Introduce DatabaseStorage class.
4. Create a BerkeleyStorage class which passes through things to BerkeleyEnvironment, BerkeleyDatabase and BerkeleyBatch. Then use DatabaseStorage everywhere instead of BerkeleyDatabase.
5. Squash down the three existing Berkeley* classes into BerkeleyStorage, i.e. move the code out of those classes into BerkeleyStorage and then delete them.