A COLLECTION is created in a CLUSTER (of BROKERS) with a defined and fixed number of potential PARTITIONS. For example, 12 data log files (AKA PARTITIONS), each containing 256 VIRTUAL-PARTITIONS, give 12 x 256 = 3072 potential PARTITIONS in total.
An ERA is an incrementing, synchronized generation of the PARTITIONS, shared by every BROKER in the CLUSTER. The starting ERA is 1, the next is 2, and so on.
Each BROKER is a member of a CLUSTER. Whenever a BROKER joins or leaves a CLUSTER, a new ERA is declared across all BROKERS within the CLUSTER. The old ERA’s PARTITIONS are closed to writes and new PARTITIONS are created for writing.
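The ERA rule above can be sketched as a small model. This is only an illustration: the Cluster class and its join/leave methods are hypothetical names, not part of the design, and the sketch assumes the CLUSTER starts with one BROKER in ERA 1.

```python
class Cluster:
    """Hypothetical model: every membership change declares a new ERA."""

    def __init__(self, first_broker: str) -> None:
        self.brokers = [first_broker]
        self.era = 1  # the starting ERA is 1

    def join(self, broker: str) -> int:
        """Add a BROKER and declare the next ERA."""
        if broker in self.brokers:
            raise ValueError(f"{broker} is already a member")
        self.brokers.append(broker)
        self.era += 1
        return self.era

    def leave(self, broker: str) -> int:
        """Remove a BROKER and declare the next ERA."""
        self.brokers.remove(broker)  # raises ValueError if not a member
        self.era += 1
        return self.era


cluster = Cluster("BROKER-1")   # ERA 1
cluster.join("BROKER-2")        # ERA 2
cluster.leave("BROKER-2")       # ERA 3
```

Closing the old ERA's PARTITIONS and opening new ones would hang off the same two methods; that part is left out here.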
As a worked example, a new COLLECTION is created in a CLUSTER containing one BROKER. This BROKER creates 12 PARTITIONS, labeled PARTITION-1[1:256] to PARTITION-12[1:256], in ERA-1. The numbers 1 to 12 are the PART, while [1:256] is the VPART range.
A second BROKER joins this CLUSTER, so there are now two.
A new ERA is declared. BROKER-1 closes the PARTITIONS for ERA-1 and creates 6 new PARTITIONS, labeled PARTITION-1[1:256] to PARTITION-6[1:256], in ERA-2. BROKER-2 also creates 6 new PARTITIONS, labeled PARTITION-7[1:256] to PARTITION-12[1:256], in ERA-2.
Optionally, a background thread can move BROKER-1's ERA-1 PARTITION-7[1:256] to PARTITION-12[1:256] over to BROKER-2, so that every ERA is balanced across the BROKERS.
If one of these BROKERS leaves the CLUSTER, the actions are reversed and the new PARTITIONS are created in ERA-3.
A MASTER is used to coordinate assignment of PARTITIONS per BROKER for each ERA.
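The assignment in the example (PART 1 to 12 on one BROKER, then 1 to 6 and 7 to 12 on two) can be sketched as a function a MASTER might run for each ERA. This is a minimal sketch, not the MASTER's actual algorithm: assign_parts is a hypothetical name, and it assumes contiguous ranges and a BROKER count that divides the number of PARTS evenly.

```python
from typing import Dict, List


def assign_parts(brokers: List[str], total_parts: int = 12) -> Dict[str, List[int]]:
    """Split PART numbers 1..total_parts into equal contiguous ranges per BROKER."""
    if not brokers or total_parts % len(brokers) != 0:
        raise ValueError("number of BROKERS must evenly divide the number of PARTS")
    share = total_parts // len(brokers)
    return {
        broker: list(range(i * share + 1, (i + 1) * share + 1))
        for i, broker in enumerate(brokers)
    }


era_1 = assign_parts(["BROKER-1"])
# {'BROKER-1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]}
era_2 = assign_parts(["BROKER-1", "BROKER-2"])
# {'BROKER-1': [1, 2, 3, 4, 5, 6], 'BROKER-2': [7, 8, 9, 10, 11, 12]}
```

Each PART still carries its full VPART range; only the PART numbers are divided among BROKERS.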
A PRODUCER creates a new RECORD, which contains a KEY and a VALUE (the payload of the RECORD). The RECORD is sent to the BROKER that is assigned the PARTITION it will be written into. The hash partitioning function in pseudocode:
HashU32 = hashFunction(RECORD.KEY).be_u32
PART = HashU32 % 12
VPART = hashFunction(RECORD.KEY)[0]
PARTITION = {PART, VPART}
- 12 being the canonical number of partitions in the first ERA.
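The pseudocode above can be sketched in Python. Several details are assumptions: the hash function is not named here, so SHA-256 stands in for hashFunction; be_u32 is read as the first four bytes of the digest interpreted as a big-endian unsigned 32-bit integer; and partition_for is a hypothetical name.

```python
import hashlib
from typing import Tuple

NUM_PARTS = 12  # canonical number of PARTS in the first ERA


def partition_for(key: bytes, num_parts: int = NUM_PARTS) -> Tuple[int, int]:
    """Map a RECORD KEY to a (PART, VPART) pair."""
    digest = hashlib.sha256(key).digest()  # assumption: SHA-256 as hashFunction
    hash_u32 = int.from_bytes(digest[:4], "big")  # be_u32 of the digest
    part = hash_u32 % num_parts  # 0 .. num_parts - 1
    vpart = digest[0]            # first digest byte, 0 .. 255
    return part, vpart


part, vpart = partition_for(b"customer-1001")
```

The results are zero-based (PART 0 to 11, VPART 0 to 255), while the labels in the example are one-based (1 to 12, 1 to 256), so an implementation would need to settle on one convention, for instance by adding 1 to each value when labeling.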
COLLECTION: A logical topic or table of related RECORDS.
RECORD: A series of bytes representing an event, containing a KEY and a VALUE.
BROKER: A server that writes RECORDS into, and reads RECORDS from, a COLLECTION, serving PRODUCERS and CONSUMERS.
MASTER: A service coordinating the even distribution of PARTITIONS across BROKERS for each ERA.
ERA: An incrementing logical unit of time; a new ERA begins whenever a BROKER joins or leaves the CLUSTER.
PRODUCER: A client application creating RECORDS, which are written into PARTITIONS by BROKERS.
If 12 is chosen as the canonical number of initial PARTITIONS, then the PARTITIONS can be divided equally among 1, 2, 3, 4, 6 or 12 BROKERS, because these are the divisors of 12.
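The BROKER counts that allow an equal split are simply the divisors of the PARTITION count, which a short check confirms. The function name even_broker_counts is hypothetical.

```python
from typing import List


def even_broker_counts(num_parts: int) -> List[int]:
    """Return every BROKER count that divides num_parts with no remainder."""
    return [n for n in range(1, num_parts + 1) if num_parts % n == 0]


print(even_broker_counts(12))  # [1, 2, 3, 4, 6, 12]
```

With 5 or 7 BROKERS, for example, 12 PARTITIONS cannot be shared out equally, so some BROKERS would hold more than others.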