Skip to content

Instantly share code, notes, and snippets.

@ariard
Last active July 29, 2019 21:04
Show Gist options
  • Save ariard/89f9bcc3a7ab9576fc6d15d251032cfa to your computer and use it in GitHub Desktop.
Save ariard/89f9bcc3a7ab9576fc6d15d251032cfa to your computer and use it in GitHub Desktop.

Rescanning : how to efficiently serve multiple chain clients in parallel ?

Currently, by default if a wallet is fallen-behind from Chainstate tip, it will ask Chain interface to send back blocks to verify the confirmation state of its transactions. This behavior is really likely to occur after every node shutdown for half an hour. Rescan logic may also occur when a privkey, pubkey or address is devoid of timestamp and you need to rescan from 0 to be sure you don't miss any transactions.

Rescan logic is heavily relying on short lived LOCK(cs_main) for knowing the current height, guessing scanning progress, fetching the block, deciding to stop the rescan. Moving locks inside the Chain interface, doesn't solve the problem and if this API is more exposed in the future, you just make an easy way to bother Chainstate operations. It's inefficient, specially in case of multiple wallets where rescans are going to be in concurrency with each other.

Ideally, you want to serve multiple clients in parallel without locking the chain.

Implement a ChainServer as both an asynchronous block cache/demultiplexer for multiple ChainClient

We craft a new data structure, ChainServer, with the following members :

  • std::list<ChainClient> ChainClients, a list of clients to serve
  • std::unordered_map<uint256, CBlock> CachedBlocks a tree of recent blocks
  • ThreadPoolChainServer control, a control structure to coordinate clients service

ChainServer is implementing the CValidationInterface and is the solely receiver of BlockConnected/BlockDisconnected events.

When a new block is received, it's cached in CachedBlocks. In case of reorgs, we keep track of forks in cache too.

At ChainClient initialization, it registers beside to ChainServer using a reworked Chain::handleNotifications call and passing its m_last_block_processed. They may also pass a unix timestamp with conversion to height being done on the node side.

Our worker threads, read the lowest tip of one of ChainClients and start to serve block to it, by reading the cache or ReadBlockFromDisk if needed.

The thread may yield after X blocks servicing to avoid long starvation on the ChainClient side, and one being a Tip - 10, being blocked by others being at height 0 and waiting for them reaching Tip forever.

If thread detects ChainClient Tip being on a fork, it should roll back reorged blocks until common ancestor and connect forward the block until tip. If hash of reorged block isn't in CachedBlocks, thread should return an error code to let wallet flush confirmation state of all its transaction then restart rescan from 0.

If ChainClient is a new blank one, it should be pass a SPECIAL_VALUE to do a "fast-forward" and avoid replaying old blocks for a wallet which doesn't care because it has no transactions.

When a privkey, pubkey or address is imported, RPC call should trigger a new Chain::handleNotifications call with height of timestamp is provided or zero if not. A further optimization would be to pass rescan range but needs better height tracking in descriptors or address format, so that's outside of scope.

Opens questions

  • what's the best size of block cache and being sure there is no gap with blocks being on disk ?
  • what's the best size of threadpool, are people using 10 wallets same time rn ?
  • how to turn block cache as a headers chain to increase is size by order of magnitude and avoid cache miss ?
  • should we spawn/assign a thread by client or any worker can server any clients ?
  • should we be able to split locks as Read for Worker and Write for BlockConnected on the CachedBlocks to allow concurrent read ?

A rough schema

             interfaces::Chain
                                                           ________
   _________        |  broadcastTransaction(Tx)           |        |
  |         |-------------------------------------------->| ATMP() |
  | Wallet3 |       |  TransactionAddedToMempool()        |________|
  |_________|<-----------------------------------------------------------------------------------\
                    |                                                                             \
                                                                                                   \
                    |                                ChainServer                                    \
                                                    __________________________________________       \
                    |                              |   ________________________               |       \
                                                   |  |                        |              |        \
                    |                              |  | std::list<ChainClient> |              |         \
    _________                                      |  |________________________|              |          \
   |         | RegisterClientWithTip(hashBlock)    |   _____________________________________  |           \
   | Wallet2 |------------------------------------>|  |                                     | |            \
   |_________|      |                              |  | std::unordered_map<uint256, CBlock> | |<---------- Scheduler
                                                   |  |_____________________________________| |
                    |                              |   _______________________                |
    _________                                      |  |                       |               |
   |         |     BlockConnected/Disconnected     |  | ThreadPoolChainServer |               |
   | Wallet1 |<------------------------------------|  |_______________________|               |
   |_________|      |                              |__________________________________________|

                    |
@ryanofsky
Copy link

Lots of feedback. Small things first:

  • Ideally, whatever solution we come up for wallet rescans should be reuseable by indexes as well. Indexes have their own block reading loop in BaseIndex::ThreadSync which operates out of sequence with the wallets. It'd be nice if new chain rescanning code we write for wallets could be compatible with indexes as well. For example, it might be nice to send a more general Rewind(old_tip, new_tip) type of notification instead of just BlockDisconnected(block).

  • I think most of the places this proposal is referring to ChainClient interface, it should be probably referring to the Chain::Notifications interface and NotificationsHandlerImpl instead. ChainClient is to meant to be responsible for starting and stopping components during init and shutdown. It doesn't really have to do with notifications or chain state. (It would probably make sense to rename ChainClient to NodeClient or something like that.) There is also not necessarily a 1:1 correspondence between ChainClient and Chain::Notifications objects. For example, the multiwallet implementation in #10102 uses individual ChainNotification objects for each wallet but just one shared ChainClient object for all wallets right now.

  • The proposed ChainServer class seems redundant with the existing Chain class. I understand a two way split between Chain / Chain::Notifications for bidirectional communication, but not a three way split between Chain / ChainClient / ChainServer. The CachedBlocks map also seems duplicative with g_blockman.m_block_index and the ThreadPoolChainServer class may be adding functionality that more properly belongs in MainSignalsInstance (the class responsible for asynchronously sending notifications from a queue).

More significantly, I think this proposal could be clearer about the exact problems it is solving, and try to take an more incremental approach to solving them to avoid rewriting too much code at once.

It seems like there are two problems this proposal solves, and the first is clear and straightforward, while the second is more open ended and messy.

  1. First problem is that when bitcoin starts, there are multiple rescan loops reading the same scan ranges repeatedly out of order, instead of just once in order.
  2. The second problem is that there isn't parellization between wallets during BlockConnected / BlockDisconnected notifications, so wallets are blocked when they could be doing work.

I'd suggest solving problem (1) by itself first because it's kind of a prerequisite for solving the problem (2), and it is simpler.

My proposal would be to first write some rescan code that can take multiple CBlockLocator scan requests and consolidate them into a single scan that sends notifications to all the relevant handler. I wrote some pseudocode below that ignores locking, but there is very similar real world code already written in BaseIndex::ThreadSync.

namespace node {
class Rescan
{
    std::map<interfaces::Chain::Notifications*, CBlockIndex*> m_request_start;

public:
    Rescan() {}

    //! Add rescan request
    void addRequest(interfaces::Chain::Notifications& callback, const CBlockLocator& locator) {
        m_request_start[&callback] = FindForkInGlobalIndex(::ChainActive(), locator);
    }

    //! Drop rescan request
    void removeRequest(interfaces::Chain::Notifications& callback) {
        m_request_start.erase(&callback);
    }

    //! Read blocks in sequence, consolidating rescan requests and send notifications in sequence.
    void sendNotifications() {
         // This method should basically be an adapted version of the existing
         // BaseIndex::ThreadSync() function. Main difference is instead
         // m_request_start contains a list of blocks compared to
         // m_best_block_index which is a pointer to an existing block.

         while (!m_request_start.empty()) {
             // Loop for earliest start positions.
             CBlockLocator* min_start = nullptr;
             for (auto& [callback, start] : m_request_start) {
                CBlockLocator* fork = ChainActive().FindFork(start);
                if (fork != start) {
                    callback->Rewind(start, fork);
                    start = fork;
                }
                if (!min_start || min_start->nHeight > start->nHeight) {
                    min_start = start;
                }
            }

            // Read next block and send notifications.
            CBlockLocator* next = ChainActive().Next(min_start);
            CBlock* block = ReadBlockFromDisk(next);
            for (auto& [callback, start] : m_request_start) {
                if (start == min_start) {
                    if (next) {
                        callback->BlockConnected(block);
                        start = next;
                    } else {
                        // Atomically register callbacks when caught up to tip.
                        // This is racy, real code would look more like #15719.
                        RegisterValidationInterface(callback);
                        m_request_start.erase(callback);
                    }
                }
            }
        }
    }
};

I'd would put this in src/node/rescan.h / src/node/rescan.cpp and use it to replace wallet and BaseIndex rescan code. Steps to use this would be to:

  • Add a node::Rescan m_rescan member to ChainImpl
  • Add a CBlockLocator locator argument to the Chain::handleNotifications() method and have it call m_rescan.addRequest() instead of RegisterValidationInterface() like currently.
  • Delete the rescan code from CreateWalletFromFile and have it just pass its locator to the existing Chain::handleNotifications call there
  • Add Chain::startNotifications method which starts a worker thread if one isn't one currently running, and to call m_rescan.sendNotifications() from that thread.
  • Add Chain::startNotifications calls where appropriate at the end of AppInitMain, and in loadwallet/createwallet RPC calls.

This would take care of the repeated rescan problem (1) above.

Beyond that we could:

  • Generalize this code to handle rescan importmulti, rescanblockchain, etc wallet RPC requests that use start and end blocks instead of locators.
  • Be smarter about queuing and threading and allow notifications to be processed in parallel.
  • Delete rescan code from the BaseIndex and use Chain::handleNotifications instead there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment