A COLLECTION is created in a CLUSTER (of BROKERS) with a defined and fixed number of potential PARTITIONS. For example: 12 data log files (AKA PARTITIONS), each containing 256 VIRTUAL-PARTITIONS (VPARTS), for a total of 12 x 256 = 3072 potential PARTITIONS.
An ERA is an incrementing generation of the PARTITIONS, declared synchronously across the CLUSTER. The starting ERA is 1, the next is 2, and so on.
Each BROKER is a member of a CLUSTER. Whenever a BROKER joins or leaves a CLUSTER, a new ERA is declared across all BROKERS within the CLUSTER: the old ERA’s PARTITIONS are closed to writes and new PARTITIONS are created for writing.
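The ERA lifecycle described above can be sketched as follows. This is only an illustration: the Cluster class, its field names, and the use of ERA 0 to mean "no BROKER has joined yet" are assumptions made for the example, not part of the design.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model: tracks membership and the current ERA only."""
    brokers: list = field(default_factory=list)
    era: int = 0  # assumption: 0 means no ERA exists yet
    writable: dict = field(default_factory=dict)  # ERA number -> open for writes?

    def _declare_era(self):
        # Close the previous ERA's PARTITIONS to writes, then open a new ERA.
        if self.era in self.writable:
            self.writable[self.era] = False
        self.era += 1
        self.writable[self.era] = True

    def join(self, broker):
        self.brokers.append(broker)
        self._declare_era()

    def leave(self, broker):
        self.brokers.remove(broker)
        self._declare_era()


cluster = Cluster()
cluster.join("BROKER-1")   # declares ERA-1
cluster.join("BROKER-2")   # declares ERA-2; ERA-1 is now read-only
cluster.leave("BROKER-2")  # declares ERA-3; ERA-2 is now read-only
```

Only the newest ERA accepts writes in this sketch; older ERAS stay readable.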
A new COLLECTION is created in a CLUSTER containing one BROKER. This BROKER creates 12 PARTITIONS, labeled PARTITION-1[1:256] to PARTITION-12[1:256], in ERA-1. The numbers 1 to 12 identify the PART, while [1:256] is the VPART range.
A second BROKER joins this CLUSTER, so there are now two.
A new ERA is declared; BROKER-1 closes the PARTITIONS for ERA-1 and creates 6 new PARTITIONS labeling them as PARTITION-1[1:256] to 6[1:256] in ERA-2. BROKER-2 also creates 6 new PARTITIONS labeling them as PARTITION-7[1:256] to 12[1:256] in ERA-2.
Optionally, a background thread can move PARTITION-7[1:256] to PARTITION-12[1:256] of ERA-1 from BROKER-1 over to BROKER-2, balancing all ERAS across the BROKERS.
If one of these BROKERS leaves the CLUSTER, the actions are reversed and the new PARTITIONS are created in ERA-3.
A MASTER is used to coordinate assignment of PARTITIONS per BROKER for each ERA.
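One way a MASTER might compute the per-ERA assignment is to hand each BROKER a contiguous, near-equal range of PART numbers. The sketch below reproduces the one-BROKER and two-BROKER cases from the walkthrough; the function name assign_parts and the rule for spreading a remainder (earlier BROKERS take one extra PART) are assumptions, since the text does not say how uneven splits are handled.

```python
def assign_parts(brokers, num_parts=12):
    """Split PART numbers 1..num_parts into contiguous, near-equal ranges."""
    if not brokers:
        raise ValueError("at least one BROKER is required")
    base, extra = divmod(num_parts, len(brokers))
    assignment, start = {}, 1
    for i, broker in enumerate(brokers):
        # Assumption: the first `extra` BROKERS each take one more PART.
        size = base + (1 if i < extra else 0)
        assignment[broker] = list(range(start, start + size))
        start += size
    return assignment


era_1 = assign_parts(["BROKER-1"])              # BROKER-1 holds PART 1 to 12
era_2 = assign_parts(["BROKER-1", "BROKER-2"])  # PART 1 to 6 and PART 7 to 12
```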
A PRODUCER creates a new RECORD, which contains a KEY and a VALUE (the payload of the RECORD). The RECORD is sent to the BROKER assigned the PARTITION it is to be written into. The hash partitioning function in pseudocode:
HashU32 = hashFunction(RECORD.KEY).be_u32
PART = HashU32 % 12
VPART = hashFunction(RECORD.KEY)[0]
PARTITION = {PART, VPART}
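- 12 being the canonical number of PARTITIONS in the first ERA. As written, both results are zero-based (PART 0 to 11, VPART 0 to 255); add 1 to each to match the 1-based labels used above.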
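A runnable sketch of the pseudocode above, under stated assumptions: the text does not name the hash function, so SHA-256 from Python's hashlib stands in for hashFunction; be_u32 is taken to mean the first four digest bytes read as a big-endian unsigned integer; and 1 is added to both results so they line up with the PARTITION-1[1:256] style labels. The name partition_for is hypothetical.

```python
import hashlib

NUM_PARTS = 12  # canonical number of PARTITIONS in the first ERA


def partition_for(key: bytes, num_parts: int = NUM_PARTS):
    """Map a RECORD KEY to a (PART, VPART) pair, both 1-based."""
    digest = hashlib.sha256(key).digest()          # assumed hash function
    hash_u32 = int.from_bytes(digest[:4], "big")   # first 4 bytes as big-endian u32
    part = hash_u32 % num_parts + 1                # 1..num_parts (+1 is an assumption)
    vpart = digest[0] + 1                          # first digest byte, shifted to 1..256
    return part, vpart


part, vpart = partition_for(b"customer-42")
```

The same KEY always maps to the same pair, so a PRODUCER and every BROKER can compute the target PARTITION independently.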
COLLECTION: A logical topic or table of related RECORDS.
RECORD: A series of bytes representing an event, containing a KEY and a VALUE.
BROKER: A server that writes RECORDS into and reads RECORDS from a COLLECTION, serving PRODUCERS and CONSUMERS.
MASTER: A service that coordinates the even distribution of PARTITIONS across BROKERS.
ERA: An incrementing logical unit of time.
PRODUCER: A client application that creates RECORDS, which are written into PARTITIONS by BROKERS.
A PARTITION can be split beyond the canonical number started with. For example, starting with 12 whole PARTITIONS, each of the 12 can be divided along the VPARTS within it. Halving gives 256 / 2 = 128 VPARTS per PARTITION, so 24 PARTITIONS now exist: {PARTITION-1 = PART-1, VPART-[1:128]}, {PARTITION-2 = PART-1, VPART-[129:256]}, {PARTITION-3 = PART-2, VPART-[1:128]}, {PARTITION-4 = PART-2, VPART-[129:256]}, etc.
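The split can be sketched as below. The function name split_partitions, the factor parameter, and the requirement that the factor divide the VPART count evenly are assumptions made for the example; the numbering follows the pattern shown in the text.

```python
def split_partitions(num_parts=12, vparts=256, factor=2):
    """Divide each PART's VPART range into `factor` equal sub-ranges.

    Returns {partition_number: (part, first_vpart, last_vpart)}, all 1-based.
    """
    if vparts % factor != 0:
        raise ValueError("factor must divide the VPART count evenly")
    width = vparts // factor
    layout, partition_id = {}, 1
    for part in range(1, num_parts + 1):
        for lo in range(1, vparts + 1, width):
            layout[partition_id] = (part, lo, lo + width - 1)
            partition_id += 1
    return layout


layout = split_partitions()
# layout[1] is (1, 1, 128), layout[2] is (1, 129, 256), layout[3] is (2, 1, 128)
```

With factor=256 every VPART becomes its own PARTITION, which gives the 3072 potential PARTITIONS.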