@markcode
Last active June 29, 2020 23:55
Scaling Event Logs and Hash Partitioning Function

Setup

A COLLECTION is created in a CLUSTER (of BROKERS) with a fixed, predefined number of potential PARTITIONS. For example: 12 data log files (the PARTITIONS), each containing 256 VIRTUAL-PARTITIONS (VPARTS), for a total of 12 × 256 = 3072 potential PARTITIONS.

An ERA is an incrementing, cluster-wide generation of the PARTITIONS, declared synchronously on every BROKER. The starting ERA is 1, the next is 2, and so on.

Each BROKER is a member of a CLUSTER. Whenever a BROKER joins or leaves the CLUSTER, a new ERA is declared across all BROKERS in the CLUSTER. The old ERA's PARTITIONS are closed to writes and new PARTITIONS are created for writing.
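
The rule that every membership change declares a new ERA can be sketched as a small in-memory model. This is only an illustration: the `Cluster` class and its method names are hypothetical, and a real CLUSTER would have to agree on the ERA across all BROKERS rather than bump a local counter.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of CLUSTER membership and the current ERA."""

    brokers: list[str] = field(default_factory=list)
    era: int = 0  # assumption: no ERA exists until the first BROKER joins

    def join(self, broker: str) -> int:
        self.brokers.append(broker)
        return self._declare_era()

    def leave(self, broker: str) -> int:
        self.brokers.remove(broker)
        return self._declare_era()

    def _declare_era(self) -> int:
        # Every membership change closes the old ERA to writes
        # and opens the next ERA for writing.
        self.era += 1
        return self.era


cluster = Cluster()
cluster.join("BROKER-1")   # declares ERA 1
cluster.join("BROKER-2")   # declares ERA 2
cluster.leave("BROKER-2")  # declares ERA 3
```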

For example

A new COLLECTION is created in a CLUSTER containing one BROKER. This BROKER creates 12 PARTITIONS, labeled PARTITION-1[1:256] to PARTITION-12[1:256], in ERA-1. The number 1 to 12 is the PART, and [1:256] is the VPART range.

A second BROKER joins this CLUSTER, so there are now two.

A new ERA is declared. BROKER-1 closes its ERA-1 PARTITIONS and creates 6 new PARTITIONS, labeled PARTITION-1[1:256] to PARTITION-6[1:256], in ERA-2. BROKER-2 also creates 6 new PARTITIONS, labeled PARTITION-7[1:256] to PARTITION-12[1:256], in ERA-2.

Optionally, a background thread can move PARTITION-7[1:256] to PARTITION-12[1:256] of ERA-1 from BROKER-1 over to BROKER-2, so that every ERA is balanced across the BROKERS.

If one of these BROKERS leaves the CLUSTER, the actions are reversed and the new PARTITIONS are created in ERA-3.

A MASTER is used to coordinate assignment of PARTITIONS per BROKER for each ERA.
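
As a rough sketch of what the MASTER might compute for each ERA, the function below hands every BROKER a contiguous, equal-sized range of PART numbers. The function name and the choice to reject broker counts that do not divide the PARTITION count are assumptions made for illustration, not part of the design above.

```python
def assign_partitions(brokers: list[str], num_parts: int = 12) -> dict[str, list[int]]:
    """Give each BROKER a contiguous, equal-sized range of 1-based PART numbers."""
    if not brokers or num_parts % len(brokers) != 0:
        # Assumption: only even distributions are allowed.
        raise ValueError("broker count must divide the number of PARTITIONS evenly")
    share = num_parts // len(brokers)
    return {
        broker: list(range(i * share + 1, (i + 1) * share + 1))
        for i, broker in enumerate(brokers)
    }


era_1 = assign_partitions(["BROKER-1"])
era_2 = assign_partitions(["BROKER-1", "BROKER-2"])
# era_2: BROKER-1 holds PARTs 1..6, BROKER-2 holds PARTs 7..12
```

With two BROKERS this reproduces the ERA-2 layout from the example: PARTITION-1 to PARTITION-6 on BROKER-1 and PARTITION-7 to PARTITION-12 on BROKER-2.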

Hash Partitioning Function

A PRODUCER creates a new RECORD containing a KEY and a VALUE, the VALUE being the payload of the RECORD. The RECORD is sent to the BROKER that holds the PARTITION it is assigned to, and is written into that PARTITION. The Hash Partitioning Function in pseudocode:

  HashU32 = hashFunction(RECORD.KEY).be_u32 
  PART = HashU32 % 12 
  VPART = hashFunction(RECORD.KEY)[0] 
  PARTITION = {PART, VPART} 
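  • 12 is the canonical number of PARTITIONS in the first ERA.
  • As written, PART falls in 0..11 and VPART (the first byte of the hash) in 0..255; add 1 to each to match the 1-based labels used in the example above.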
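
The pseudocode above might look like this in Python. The text does not name a hash function, so SHA-256 is used here purely as a stand-in, and reading `be_u32` as "the first four bytes of the hash as a big-endian unsigned integer" is also an assumption. The function name `partition_for` is hypothetical.

```python
import hashlib

CANONICAL_PARTS = 12  # the canonical number of PARTITIONS in the first ERA


def partition_for(key: bytes) -> tuple[int, int]:
    """Map a RECORD KEY to a zero-based (PART, VPART) pair."""
    # Assumption: SHA-256 stands in for the unspecified hashFunction.
    digest = hashlib.sha256(key).digest()
    # Assumption: be_u32 means the first four bytes, read big-endian.
    hash_u32 = int.from_bytes(digest[:4], "big")
    part = hash_u32 % CANONICAL_PARTS  # 0..11
    vpart = digest[0]                  # first byte of the hash, 0..255
    return part, vpart


part, vpart = partition_for(b"user-42")
```

Because the modulus is always the canonical 12 and not the current broker count, a KEY keeps the same {PART, VPART} pair in every ERA; only the BROKER that owns that PART changes.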

Definitions

COLLECTION: A logical topic or table of related RECORDS.

RECORD: A series of bytes representing an event, containing a KEY and a VALUE.

BROKER: A server that writes RECORDS into, and reads them from, a COLLECTION, serving them to PRODUCERS and CONSUMERS.

MASTER: A service that coordinates the even distribution of PARTITIONS.

ERA: An incrementing logical unit of time.

PRODUCER: A client application that creates RECORDS, which are written into PARTITIONS by BROKERS.

markcode commented Jun 6, 2020

This model lets a new broker joining a cluster immediately begin accepting new records for its assigned partitions.

markcode commented Jun 6, 2020

If 12 is chosen as the canonical number of initial partitions, then the partitions can be distributed evenly among 1, 2, 3, 4, 6, or 12 brokers (the divisors of 12).
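
The same check works for any candidate partition count. In this short sketch (the function name is hypothetical), the broker counts that allow an even split are simply the divisors of the canonical number:

```python
def even_broker_counts(num_parts: int) -> list[int]:
    """Broker counts that split num_parts partitions evenly (its divisors)."""
    return [n for n in range(1, num_parts + 1) if num_parts % n == 0]


print(even_broker_counts(12))  # [1, 2, 3, 4, 6, 12]
```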

@markcode

PARTITIONS can be split beyond the canonical number started with. For example, starting with 12 whole PARTITIONS, each of the 12 can be divided along the VPARTS within it, say 256 / 2 = 128 VPARTS per half. Now 24 PARTITIONS exist: {PARTITION-1 = PART-1, VPART-[1:128]}, {PARTITION-2 = PART-1, VPART-[129:256]}, {PARTITION-3 = PART-2, VPART-[1:128]}, {PARTITION-4 = PART-2, VPART-[129:256]}, etc.
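
One way to enumerate such a split is sketched below. The function name, the `factor` parameter, and the requirement that `factor` divides the VPART count evenly are assumptions made for illustration.

```python
def split_partitions(
    num_parts: int = 12, vparts: int = 256, factor: int = 2
) -> dict[int, tuple[int, int, int]]:
    """Map each new PARTITION number to (PART, first VPART, last VPART), all 1-based."""
    if vparts % factor != 0:
        # Assumption: only splits into equal VPART ranges are allowed.
        raise ValueError("factor must divide the number of VPARTS evenly")
    width = vparts // factor
    result = {}
    number = 1
    for part in range(1, num_parts + 1):
        for piece in range(factor):
            result[number] = (part, piece * width + 1, (piece + 1) * width)
            number += 1
    return result


split = split_partitions()
# split[1] == (1, 1, 128), split[2] == (1, 129, 256), split[3] == (2, 1, 128)
```

With the defaults this yields the 24 PARTITIONS listed above; a `factor` of 4 would yield 48 PARTITIONS of 64 VPARTS each.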
