Skip to content

Instantly share code, notes, and snippets.

@iamaleksey
Last active December 17, 2015 21:29
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save iamaleksey/fa36552409dc2aa70bee to your computer and use it in GitHub Desktop.
Save iamaleksey/fa36552409dc2aa70bee to your computer and use it in GitHub Desktop.
Counters 2.0 draft
## writes
- each update is kept in a separate cell
- each update comes with an explicit client-provided timeuuid, i.e.
UPDATE counter = counter + 1 USING ID 528c58da-c51b-41b4-92a9-b4bce020d4f2 WHERE key = 'whatever'
- for one single version, allow omitting it to enable migration from counters1.0 (and generate the timeuuid server-side)
or maybe allow it indefinitely for those who don't care about retries
- the timeuuid is stored in the cell name, as a composite component (a-la ColumnToCollectionType), with DESC clustering order
(latest writes first)
- add a new cf-level setting: counter_write_window. If time(update-timeuuid) < now() - counter_write_window, the update is rejected
- other than this little staleness check, the counter write path becomes the same as for any regular write
## reads, without merges introduced (yet)
- issue a regular SliceFromReadCommand, sum the deltas, return the result to the client
## merges, general principles #1
- cells older than (now() - counter_write_window - some_safety_margin) can be safely merged together
- merging isn't a local problem unless RF=1
- can be performed on reads, async like read-repair, randomly or when certain conditions are met
- can be performed during compaction, in 1 or 2 steps
- requires quorum reads if we force quorum writes, or reads at CL.ALL if we don't
- the merged value must live in the same cf, for atomicity sake
we will need a way to differentiate between a regular cell and a merge cell (explained later)
so, we add anoter component to the composite - type (e.g. 'm' for merge, 'r' for regular)
this component follows the timeuuid, not the other way around
it is important that merge ('m') cells sort first, so 'm' and 'r' fit the criteria
## merge steps, when triggered during read at quorum-ish CL
- pick all the cells older than (now() - counter_write_window - some_safety_margin)
- of the filtered cells, pick and remember the largest uuid (the first one, given DESC clustering order)
- sum all the picked cells, then write a new one (with the summed value), using the saved uuid, but 'm' type instead of 'r'
- do not touch the now-obsolete merged values just yet
## reads, with merges introduced
- issue a regular SliceFromReadCommand
- on each node, read until we meet the first merge ('m') cell, ignore and don't return the rest (their sum is in this 'm' cell)
- on the coordinator, sum all the cells until (but including) the first 'm' cell
- return the result to the client
## compaction
- just like during reads, all the cells after the first 'm' cell can be discarded safely
- if a row fits the counter-merge criteria (lots of cells old enough to be merged), then either:
a. trigger the merge in-place (will require QUORUM reads) if RF=1, can merge immediately
b. build a list of merge-candidates and merge them later, in async fashion
## merges, general principles #2
- does require a QUORUM read (if we force QUORUM writes) or ALL otherwise, but does NOT require synchronization (neither local, nor between nodes), because
1. no new updates older than the cutoff period will be accepted
2. it IS safe for two nodes to do a merge at the same time, independently from each other, because of the way we sum the cells, even with different starting cutoff cells
## deletes
- a single regular range tombstone is all that's needed
## extra functionality
- nothing is stopping us from keeping (min,max,avg,sum-sq) along with sum in merge-cells, so we can support those functions on counter-cfs as well
- we can allow counter column in regular cfs
- we can allow counter updates in batches (at least in unlogged batches, bc timeuuid validation)
## backwards-compat
- doable, but trickyish
## merge illustration
a row, pre-merge (timeuuid DESC, type ASC ordered), only the relevant part of the timeuuid kept:
key:visits:adc82418:r (+1)
key:visits:7d871160:r (+1)
key:visits:acceb308:r (-1)
key:visits:727b3618:r (+1)
key:visits:2d6dd7d8:r (+1)
key:visits:f30da118:r (+2)
key:visits:b36b0690:r (-3)
key:visits:8065cac8:r (+1) -- the first cell to be older than counter_write_window - its uuid will be used for the merge-cell
key:visits:0658c820:r (+2)
key:visits:796580f8:r (+1)
sum: 6
same row, post-merge:
key:visits:adc82418:r (+1)
key:visits:7d871160:r (+1)
key:visits:acceb308:r (-1)
key:visits:727b3618:r (+1)
key:visits:2d6dd7d8:r (+1)
key:visits:f30da118:r (+2)
key:visits:b36b0690:r (-3)
key:visits:8065cac8:m (+4) -- the resulting merge-cell
key:visits:8065cac8:r (+1) -- the cells below and including this one will be ignored when slicing and discarded during compaction
key:visits:0658c820:r (+2)
key:visits:796580f8:r (+1)
sum: 6
same row, some time in the future, post-compaction, pre-new-merge:
key:visits:9569b458:r (+1)
key:visits:65c8e8e0:r (+1)
key:visits:30d0d210:r (+1)
key:visits:adc82418:r (+1)
key:visits:7d871160:r (+1)
key:visits:acceb308:r (-1)
key:visits:727b3618:r (+1) -- the new cutoff-cell
key:visits:2d6dd7d8:r (+1)
key:visits:f30da118:r (+2)
key:visits:b36b0690:r (-3)
key:visits:8065cac8:m (+4)
sum: 9
same row, post merge:
key:visits:9569b458:r (+1)
key:visits:65c8e8e0:r (+1)
key:visits:30d0d210:r (+1)
key:visits:adc82418:r (+1)
key:visits:7d871160:r (+1)
key:visits:acceb308:r (-1)
key:visits:727b3618:m (+5) -- the new merge-cell
key:visits:727b3618:r (+1) -- the cells below and including this one will be ignored when slicing and discarded during compaction
key:visits:2d6dd7d8:r (+1)
key:visits:f30da118:r (+2)
key:visits:b36b0690:r (-3)
key:visits:8065cac8:m (+4)
sum: 9
same row, after no updates within counter_write_window and after a compaction:
key:visits:9569b458:m (+9)
sum: 9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment