@Renkai
Last active November 30, 2020 02:37
PIP: Configurable data source priority for offloaded messages

Motivation

Currently, once data in Pulsar has been offloaded to the second storage layer, it can still exist in BookKeeper for a period of time, but reads are served directly from the second layer.

This may lead to several problems:

  • Reads from the second layer have different performance characteristics, which may lead users to wrong estimates if they do not know which layer they are reading from.
  • The second layer may be managed by a team other than the Pulsar management team (for example, an independent HDFS management team), which may apply its own quota or authorization policies to users.
  • The second-layer storage can be unbounded in theory, so if a user accidentally resets a cursor to the wrong time, the resulting reads can waste a lot of resources.

So it is better to make the data source configurable when the data exists in both layers.

The following options are probably enough:

  • BOOKKEEPER_ONLY
  • BOOKKEEPER_FIRST
  • OFFLOADED_ONLY
  • OFFLOADED_FIRST
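
The four options can be read as an ordered list of layers to try. The sketch below is only an illustration of that reading, not code from Pulsar: the StorageLayer enum and the readOrder and allowsFallback helpers are hypothetical names introduced here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical names for the two storage layers (not an existing Pulsar type). */
enum StorageLayer { BOOKKEEPER, OFFLOADED }

/** Sketch of the four proposed options and the read order each one implies. */
enum TieredReadPriority {
    BOOKKEEPER_ONLY(StorageLayer.BOOKKEEPER),
    BOOKKEEPER_FIRST(StorageLayer.BOOKKEEPER, StorageLayer.OFFLOADED),
    OFFLOADED_ONLY(StorageLayer.OFFLOADED),
    OFFLOADED_FIRST(StorageLayer.OFFLOADED, StorageLayer.BOOKKEEPER);

    private final List<StorageLayer> readOrder;

    TieredReadPriority(StorageLayer... layers) {
        this.readOrder = Collections.unmodifiableList(Arrays.asList(layers));
    }

    /** Layers to try, in order; a single-element list means no fallback. */
    List<StorageLayer> readOrder() {
        return readOrder;
    }

    /** True for the *_FIRST options, which may fall back to the other layer. */
    boolean allowsFallback() {
        return readOrder.size() > 1;
    }
}
```

With this reading, the *_ONLY options never touch the other layer, which is what the quota and authorization concerns above call for.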

Background

Currently, which layer the broker reads from is decided by org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#getLedgerHandle(long ledgerId), which takes a single parameter, ledgerId, and chooses the offloaded ledger handle as soon as the ledger has been offloaded. If the chosen handle fails, the whole getLedgerHandle call fails.

Implementation

The tiered read priority should be configurable per namespace or per topic. The command-line interface should look like:

pulsar-admin namespaces --set-tiered-read-priority tenant/namespace priority-policy

pulsar-admin topics --set-tiered-read-priority tenant/namespace/topic priority-policy

If not configured, OFFLOADED_FIRST should be used by default, which results in the same behavior as the current version.
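
A minimal sketch of how the effective priority might be resolved is shown below. It assumes a topic-level setting takes precedence over a namespace-level one, which this proposal does not state. The class name, the nested Priority enum, and the use of Optional are all illustrative.

```java
import java.util.Optional;

/** Hypothetical resolver; not part of the Pulsar code base. */
final class TieredReadPriorityResolver {
    /** Stand-in for the proposed TieredReadPriority options. */
    enum Priority { BOOKKEEPER_ONLY, BOOKKEEPER_FIRST, OFFLOADED_ONLY, OFFLOADED_FIRST }

    /** Default named in the proposal; it keeps the current behavior. */
    static final Priority DEFAULT = Priority.OFFLOADED_FIRST;

    private TieredReadPriorityResolver() {
    }

    /**
     * Assumed precedence: topic setting, then namespace setting, then the default.
     */
    static Priority resolve(Optional<Priority> topicLevel, Optional<Priority> namespaceLevel) {
        return topicLevel.orElseGet(() -> namespaceLevel.orElse(DEFAULT));
    }
}
```

For example, resolve(Optional.empty(), Optional.empty()) yields OFFLOADED_FIRST, so an unconfigured namespace behaves as it does today.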

The corresponding ManagedLedger then needs to know which priority option is in effect, so the signature of the getLedgerHandle method should be changed to

CompletableFuture<ReadHandle> getLedgerHandle(
        long ledgerId, TieredReadPriority priority);

For BOOKKEEPER_ONLY and OFFLOADED_ONLY, the ManagedLedger uses the corresponding ReadHandle directly. For BOOKKEEPER_FIRST and OFFLOADED_FIRST, the ManagedLedger falls back to the other layer whenever the preferred layer cannot serve the read, whether because the ledger does not exist there or because of a network, disk, or authorization failure in the preferred layer.
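
The fallback rule can be sketched with plain CompletableFuture composition. This is a simplified illustration, not the actual ManagedLedgerImpl change: the handle type is left generic instead of using BookKeeper's ReadHandle, the two suppliers stand in for whatever opens a handle in each layer, and the class and enum names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical helper showing the ONLY / FIRST semantics described above. */
final class TieredReadFallback {
    /** Stand-in for the proposed TieredReadPriority options. */
    enum Priority { BOOKKEEPER_ONLY, BOOKKEEPER_FIRST, OFFLOADED_ONLY, OFFLOADED_FIRST }

    private TieredReadFallback() {
    }

    static <H> CompletableFuture<H> open(
            Priority priority,
            Supplier<CompletableFuture<H>> bookkeeper,
            Supplier<CompletableFuture<H>> offloaded) {
        switch (priority) {
            case BOOKKEEPER_ONLY:
                return bookkeeper.get();
            case OFFLOADED_ONLY:
                return offloaded.get();
            case BOOKKEEPER_FIRST:
                return withFallback(bookkeeper, offloaded);
            case OFFLOADED_FIRST:
                return withFallback(offloaded, bookkeeper);
            default:
                throw new IllegalArgumentException("Unknown priority: " + priority);
        }
    }

    /** Tries the first layer; on any failure, opens the handle from the second layer. */
    private static <H> CompletableFuture<H> withFallback(
            Supplier<CompletableFuture<H>> first,
            Supplier<CompletableFuture<H>> second) {
        CompletableFuture<H> result = new CompletableFuture<>();
        first.get().whenComplete((handle, error) -> {
            if (error == null) {
                result.complete(handle);
                return;
            }
            // The second layer is only opened after the first one has failed.
            second.get().whenComplete((fallbackHandle, fallbackError) -> {
                if (fallbackError == null) {
                    result.complete(fallbackHandle);
                } else {
                    result.completeExceptionally(fallbackError);
                }
            });
        });
        return result;
    }
}
```

Passing suppliers instead of already-started futures means the second layer is not contacted unless the first one fails, which matters when the second layer has its own quota.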

@gaoran10

  1. Maybe we could add a new tieredReadPriority setting to broker.conf or standalone.conf as a broker-level default.
  2. We could add a tieredReadPriority field to LedgerInfo, so that the signature of getLedgerHandle could stay as it is.
