@sijie
Last active April 19, 2017 21:08
Proposal: Co-develop ManagedLedger and DistributedLog

Motivation

Both Yahoo Pulsar and Apache DistributedLog are built on Apache BookKeeper. They have different focuses but share many design principles and implementation details.

Pulsar is a full-fledged pub/sub messaging system that provides a very flexible messaging model, while DistributedLog focuses on building a replicated log store that offers the replicated log as a storage primitive for other applications and systems. In theory, Pulsar could build its messaging system on DistributedLog.

Internally, Pulsar built a library called 'ManagedLedger' for interacting with Apache BookKeeper. The ManagedLedger implementation has a lot in common with DistributedLog. The two are compared below:

| Aspect | ManagedLedger | DistributedLog |
|---|---|---|
| Read/Write Semantic | Single writer, single reader | Single writer, multiple readers |
| Tailing Read Semantic | No tailing read semantic | Supports tailing reads. Applications don't have to close a log to read data. |
| Layout | A ManagedLedger is comprised of a list of Ledgers. | A Log is comprised of a list of Segments. A Segment is the storage abstraction of a Ledger. |
| Cursor | A ManagedLedger also maintains a list of Cursors. Each Cursor represents the consume point of a consumer. The updates of a Cursor are stored in a Ledger. | DistributedLog doesn't maintain any Cursors. |
| Data Retention | The data written before the Cursors can be deleted or expired after the configured time. | The data can be deleted by explicit truncation or expired after a configured time. |

This proposal is to merge ManagedLedger into DistributedLog and co-develop the Replicated Log library for common usage.

Public Interface

The proposed interface for the merged ManagedLedger and DistributedLog will consist of two parts: the Log interface and the Cursor interface.

Log

The Log interface will be based on the DistributedLog Log interface.

It includes the following operations:

  • create log
  • delete log
  • open a writer to write records to the log
  • open a reader to read records from the log

Writer

  • be able to append records to the log synchronously or asynchronously
  • be able to truncate the log either explicitly or based on a configured time period

Reader

  • be able to read from a provided position in the log.
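The operations listed above can be pictured with a small in-memory sketch. Every name here (SketchNamespace, SketchLog, SketchWriter, SketchReader, truncateBefore) is hypothetical and chosen for illustration; none of it is the DistributedLog API. Positions are assumed to be plain sequence numbers, and time-based truncation is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical names throughout; this is not the DistributedLog API.
interface SketchWriter {
    long append(String record);                          // synchronous append; returns the record's position
    CompletableFuture<Long> appendAsync(String record);  // asynchronous append
    void truncateBefore(long position);                  // explicit truncation of records before position
}

interface SketchReader {
    String readNext();  // next record, or null when caught up with the writer
}

// In-memory stand-in for a single log.
final class SketchLog {
    private final List<String> records = new ArrayList<>();
    private long firstPosition = 0;  // position of records.get(0) after truncation

    SketchWriter openWriter() {
        return new SketchWriter() {
            @Override public long append(String record) {
                synchronized (SketchLog.this) {
                    records.add(record);
                    return firstPosition + records.size() - 1;
                }
            }
            @Override public CompletableFuture<Long> appendAsync(String record) {
                // Completes immediately here; a real writer would complete on acknowledgement.
                return CompletableFuture.completedFuture(append(record));
            }
            @Override public void truncateBefore(long position) {
                synchronized (SketchLog.this) {
                    while (firstPosition < position && !records.isEmpty()) {
                        records.remove(0);
                        firstPosition++;
                    }
                }
            }
        };
    }

    SketchReader openReader(long fromPosition) {
        return new SketchReader() {
            private long next = fromPosition;
            @Override public String readNext() {
                synchronized (SketchLog.this) {
                    if (next < firstPosition) next = firstPosition;  // skip truncated records
                    int index = (int) (next - firstPosition);
                    if (index >= records.size()) return null;
                    next++;
                    return records.get(index);
                }
            }
        };
    }
}

// In-memory stand-in for the create/delete/open operations.
final class SketchNamespace {
    private final Map<String, SketchLog> logs = new HashMap<>();
    synchronized void createLog(String name) { logs.putIfAbsent(name, new SketchLog()); }
    synchronized void deleteLog(String name) { logs.remove(name); }
    synchronized SketchLog openLog(String name) {
        SketchLog log = logs.get(name);
        if (log == null) throw new IllegalArgumentException("no such log: " + name);
        return log;
    }
}
```

In this sketch a reader that has caught up simply gets null and can call readNext again later, which stands in for the tailing-read behaviour: the log never has to be closed before it is read.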

Cursor

The Cursor interface will be based on the ManagedCursor interface. A cursor indicates the position of a reader that is reading from the log.

A Cursor is a reader with position/offset tracking. Several operations are supported on the cursor:

  • be able to read next records after this cursor
  • be able to seek and rewind the cursor
  • be able to mark deletion on a cursor
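A minimal sketch of these cursor operations follows, assuming positions are plain indexes into an in-memory list. The class CursorSketch and its method signatures are hypothetical and are not the ManagedCursor API. The sketch also assumes that rewinding returns the read position to the mark-delete position, and that the mark-delete position is the first position not yet consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor over an in-memory list of records; not the ManagedCursor API.
final class CursorSketch {
    private final List<String> log;        // stand-in for the underlying log
    private long readPosition = 0;         // next record to hand out
    private long markDeletePosition = 0;   // records before this position are consumed

    CursorSketch(List<String> log) { this.log = log; }

    // Read up to maxRecords records after the cursor and advance the read position.
    List<String> readNext(int maxRecords) {
        if (maxRecords < 0) throw new IllegalArgumentException("maxRecords must be >= 0");
        int from = (int) readPosition;
        int to = Math.min(log.size(), from + maxRecords);
        List<String> out = new ArrayList<>(log.subList(from, to));
        readPosition = to;
        return out;
    }

    // Move the read position anywhere between the mark-delete position and the end of the log.
    void seek(long position) {
        if (position < markDeletePosition || position > log.size()) {
            throw new IllegalArgumentException("position out of range: " + position);
        }
        readPosition = position;
    }

    // Assumption: rewind replays everything read but not yet mark-deleted.
    void rewind() { readPosition = markDeletePosition; }

    // Declare every record before position as consumed; it can only move forward.
    void markDelete(long position) {
        if (position < markDeletePosition || position > readPosition) {
            throw new IllegalArgumentException("position out of range: " + position);
        }
        markDeletePosition = position;
    }

    long getMarkDeletePosition() { return markDeletePosition; }
}
```

Keeping the read position and the mark-delete position separate is what lets a consumer re-read unacknowledged records after a restart while still giving the log a safe point before which data is no longer needed.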

Data Retention

The default Log retention policies (without any cursors) will remain the same as in DistributedLog:

  • Explicit Truncation
  • Time-Based Expiration

The cursor management will truncate the Log using Explicit Truncation to satisfy the cursor-based retention.
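One way to picture cursor-based retention on top of Explicit Truncation is to truncate the log up to the smallest mark-delete position across all cursors. The sketch below assumes exactly that policy. The class CursorRetentionSketch and its methods are hypothetical, and the actual truncation call is reduced to updating a field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical bookkeeping for cursor-based retention; not taken from either codebase.
final class CursorRetentionSketch {
    private final Map<String, Long> markDeletePositions = new HashMap<>();
    private long truncatedBefore = 0;  // records before this position have been truncated

    void updateCursor(String cursorName, long markDeletePosition) {
        markDeletePositions.put(cursorName, markDeletePosition);
    }

    void removeCursor(String cursorName) {
        markDeletePositions.remove(cursorName);
    }

    // Smallest mark-delete position across cursors; empty when there are no cursors.
    OptionalLong safeTruncationPoint() {
        return markDeletePositions.values().stream().mapToLong(Long::longValue).min();
    }

    // Advance the truncation point if every cursor has moved past it; returns the current point.
    long truncateIfPossible() {
        OptionalLong safe = safeTruncationPoint();
        if (safe.isPresent() && safe.getAsLong() > truncatedBefore) {
            // A real implementation would call the log's explicit truncation here.
            truncatedBefore = safe.getAsLong();
        }
        return truncatedBefore;
    }
}
```

With no cursors registered the sketch never truncates, which leaves the default policies (Explicit Truncation and Time-Based Expiration) in charge.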

Proposed Changes

The proposed changes will be:

  1. Import the ManagedLedger code into DistributedLog.

  2. Add the Cursor interface to the existing DistributedLog library and improve the Log interface. The new set of APIs should support both the ML and DLog feature sets.

  3. Include the two current implementations of the Log and Cursor interfaces: the current ML implementation and the current DLog library implementation.

  4. Release DLog with the new Cursor and Log API.

  5. Pulsar will use the DLog API and existing ML implementation.

  6. Eventually merge the two implementations into one and provide a seamless upgrade for users of both implementations.

  7. Pulsar can then leverage features like tailing reads to support read-only brokers, live topic migration, and so on.

Compatibility, Deprecation, and Migration Plan

  • Through step 5, no real migration is needed.
  • At step 6, a backward-compatible upgrade will be applied.