ariard/refactor-rescan.md

## refactor-rescan.md

## Rescanning: how to efficiently serve multiple chain clients in parallel?

Currently, by default, if a wallet has fallen behind the Chainstate tip, it asks the Chain interface to send back blocks so it can verify the confirmation state of its transactions. This is very likely to happen after any node shutdown of about half an hour. A rescan may also occur when a privkey, pubkey or address has no timestamp and you need to rescan from 0 to be sure you don't miss any transactions.
Rescan logic relies heavily on short-lived LOCK(cs_main) sections for knowing the current height, estimating scanning progress, fetching the block, and deciding when to stop the rescan. Moving the locks inside the Chain interface doesn't solve the problem, and if this API is more exposed in the future, it just makes an easy way to disturb Chainstate operations. It's inefficient, especially with multiple wallets, where rescans are going to run concurrently with each other.
Ideally, you want to serve multiple clients in parallel without locking the chain.

### Implement a ChainServer as both an asynchronous block cache and a demultiplexer for multiple ChainClients

We craft a new data structure, ChainServer, with the following members:

- std::list<ChainClient> ChainClients, a list of clients to serve
- std::unordered_map<uint256, CBlock> CachedBlocks, a cache of recent blocks keyed by block hash
- ThreadPoolChainServer control, a control structure to coordinate client service
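
The three members above could be laid out roughly as follows. This is only a sketch under stated assumptions: `BlockHash` and `Block` are stand-ins for `uint256` and `CBlock` so the example compiles on its own, and the fields of `ChainClient` and `ThreadPoolChainServer`, as well as the `CacheBlock` and `RegisterClient` helpers, are hypothetical names that the note does not define.

```cpp
#include <list>
#include <string>
#include <unordered_map>

// Stand-ins for Bitcoin Core's uint256 and CBlock (assumption, for self-containment).
using BlockHash = std::string;
struct Block {
    BlockHash hash;
    BlockHash prev_hash;
    int height;
};

// Hypothetical per-client state: the last block this client has processed.
struct ChainClient {
    int id;
    BlockHash last_block_processed;
    int last_height;
};

// Placeholder for the coordination structure; its real contents are an open question.
struct ThreadPoolChainServer {
    unsigned int worker_count;
};

struct ChainServer {
    std::list<ChainClient> ChainClients;               // clients to serve
    std::unordered_map<BlockHash, Block> CachedBlocks; // recent blocks, keyed by hash
    ThreadPoolChainServer control{1};                  // coordinates client service

    // Cache a newly connected block (would be driven by BlockConnected).
    void CacheBlock(const Block& block) { CachedBlocks[block.hash] = block; }

    // Register a client together with its last processed block.
    void RegisterClient(const ChainClient& client) { ChainClients.push_back(client); }
};
```

Keying the cache by hash rather than by height is what lets blocks from competing forks coexist in the same map.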

ChainServer implements CValidationInterface and is the sole receiver of BlockConnected/BlockDisconnected events.
When a new block is received, it's cached in CachedBlocks. In case of reorgs, we keep track of forks in the cache too.
At initialization, a ChainClient registers itself with ChainServer using a reworked Chain::handleNotifications call, passing its m_last_block_processed. It may also pass a unix timestamp, with the conversion to height being done on the node side.
Our worker threads read the lowest tip among the ChainClients and start serving blocks to that client, reading from the cache or calling ReadBlockFromDisk if needed.
The thread may yield after serving X blocks to avoid long starvation on the ChainClient side, for example a client at Tip - 10 being blocked by others at height 0 and waiting forever for them to reach Tip.
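
One possible reading of the yield rule is a round-robin over clients with a bounded batch per turn. The sketch below is single-threaded and purely illustrative: `Client`, `ServeRoundRobin` and `max_batch` (the "X" above) are hypothetical names, and incrementing a height stands in for actually delivering a block.

```cpp
#include <deque>
#include <vector>

// Hypothetical client state: an id and the height it has been served up to.
struct Client {
    int id;
    int height;
};

// Serve every client up to tip_height, at most max_batch blocks per turn.
// Returns the client ids in the order in which they reach the tip.
std::vector<int> ServeRoundRobin(std::vector<Client> clients, int tip_height, int max_batch)
{
    if (max_batch < 1) max_batch = 1; // guard against a batch size that never progresses
    std::deque<Client> queue(clients.begin(), clients.end());
    std::vector<int> finished;
    while (!queue.empty()) {
        Client client = queue.front();
        queue.pop_front();
        // Stand-in for serving blocks from the cache or from disk.
        for (int served = 0; served < max_batch && client.height < tip_height; ++served) {
            ++client.height;
        }
        if (client.height >= tip_height) {
            finished.push_back(client.id);
        } else {
            queue.push_back(client); // yield: let the other clients make progress
        }
    }
    return finished;
}
```

With a tip at 100 and a batch of 20, a client at height 90 reaches the tip on its first turn instead of waiting for a client that starts from 0.
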
If a thread detects that a ChainClient's tip is on a fork, it should roll back the reorged blocks down to the common ancestor, then connect blocks forward up to the tip. If the hash of a reorged block isn't in CachedBlocks, the thread should return an error code so the wallet can flush the confirmation state of all its transactions and then restart the rescan from 0.
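
The rollback could be planned by walking both branches back through the cache until they meet. This is a sketch under assumptions, not the actual design: `BlockHash` and `Block` stand in for `uint256` and `CBlock`, and `ReorgPlan` and `PlanReorg` are hypothetical names. A `false` return models the error code for a block missing from the cache.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-ins for uint256 and CBlock (assumption, for self-containment).
using BlockHash = std::string;
struct Block {
    BlockHash hash;
    BlockHash prev_hash;
    int height;
};
using BlockCache = std::unordered_map<BlockHash, Block>;

struct ReorgPlan {
    std::vector<BlockHash> disconnect; // from the client tip down to just above the ancestor
    std::vector<BlockHash> connect;    // from just above the ancestor up to the chain tip
};

// Returns false if a needed block is not cached, in which case the caller
// would tell the wallet to flush its state and rescan from 0.
bool PlanReorg(const BlockCache& cache, const BlockHash& client_tip,
               const BlockHash& chain_tip, ReorgPlan& plan)
{
    plan = ReorgPlan{};
    auto a = cache.find(client_tip);
    auto b = cache.find(chain_tip);
    if (a == cache.end() || b == cache.end()) return false;
    while (a->second.hash != b->second.hash) {
        const int ha = a->second.height;
        const int hb = b->second.height;
        if (ha >= hb) { // client branch is not lower: step it back
            plan.disconnect.push_back(a->second.hash);
            a = cache.find(a->second.prev_hash);
        }
        if (hb >= ha) { // chain branch is not lower: step it back
            plan.connect.push_back(b->second.hash);
            b = cache.find(b->second.prev_hash);
        }
        if (a == cache.end() || b == cache.end()) return false;
    }
    std::reverse(plan.connect.begin(), plan.connect.end());
    return true;
}
```

When the client tip is simply an ancestor of the chain tip, `disconnect` stays empty and `connect` lists the blocks still to be served.
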
If a ChainClient is a new blank one, it should pass a SPECIAL_VALUE to do a "fast-forward" and avoid replaying old blocks for a wallet which doesn't care because it has no transactions.
When a privkey, pubkey or address is imported, the RPC call should trigger a new Chain::handleNotifications call with the height matching the timestamp if one is provided, or zero if not. A further optimization would be to pass a rescan range, but that needs better height tracking in descriptors or address formats, so it's out of scope.
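
The timestamp-to-height conversion on the node side might look like the sketch below. It assumes the node can supply block timestamps indexed by height. `HeightForTimestamp` is a hypothetical helper, not an existing API. Because block times are not strictly increasing, it compares against the running maximum, so every block before the returned height is older than the timestamp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// block_times[h] is the timestamp of the block at height h.
// Returns the lowest height at which the running maximum of block times
// reaches `timestamp`, or -1 if no block is recent enough.
int HeightForTimestamp(const std::vector<int64_t>& block_times, int64_t timestamp)
{
    int64_t max_time = std::numeric_limits<int64_t>::min();
    for (std::size_t h = 0; h < block_times.size(); ++h) {
        max_time = std::max(max_time, block_times[h]);
        if (max_time >= timestamp) return static_cast<int>(h);
    }
    return -1;
}
```

A linear scan keeps the sketch short; a real index would more likely store the running maximum per block and use a binary search.
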

### Open questions


- What's the best size for the block cache, and how can we be sure there is no gap with the blocks on disk?
- What's the best size for the threadpool? Are people using 10 wallets at the same time right now?
- How to turn the block cache into a headers chain to increase its size by an order of magnitude and avoid cache misses?
- Should we spawn/assign one thread per client, or can any worker serve any client?
- Should we split the lock on CachedBlocks into Read for workers and Write for BlockConnected, to allow concurrent reads?

### A rough schema

```
             interfaces::Chain
                                                           ________
   _________        |  broadcastTransaction(Tx)           |        |
  |         |-------------------------------------------->| ATMP() |
  | Wallet3 |       |  TransactionAddedToMempool()        |________|
  |_________|<-----------------------------------------------------------------------------------\
                    |                                                                             \
                                                                                                   \
                    |                                ChainServer                                    \
                                                    __________________________________________       \
                    |                              |   ________________________               |       \
                                                   |  |                        |              |        \
                    |                              |  | std::list<ChainClient> |              |         \
    _________                                      |  |________________________|              |          \
   |         | RegisterClientWithTip(hashBlock)    |   _____________________________________  |           \
   | Wallet2 |------------------------------------>|  |                                     | |            \
   |_________|      |                              |  | std::unordered_map<uint256, CBlock> | |<---------- Scheduler
                                                   |  |_____________________________________| |
                    |                              |   _______________________                |
    _________                                      |  |                       |               |
   |         |     BlockConnected/Disconnected     |  | ThreadPoolChainServer |               |
   | Wallet1 |<------------------------------------|  |_______________________|               |
   |_________|      |                              |__________________________________________|

                    |
```