BoundedReadCompactionStrategy

This compaction strategy aims to take the place of all three existing Cassandra compaction strategies (STCS, LCS, and TWCS).

It works by first consolidating sparse data (sorted runs that span the entire token range but are small) until it reaches a configurable target amount of data (by default 16 * target size = 8 GiB). Then it promotes these sorted runs to be "dense" sorted runs.

Dense runs are considered for compaction via density tiering rather than size tiering. The node's range is broken up into splits of the target work size and then compactions are performed "across" the levels. This stage uses a different tiering factor because you often want control over how young versus old data compacts with itself.
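
To make the two phases concrete, here is a minimal Python sketch of the promotion rule, based only on the description above. SortedRun, consolidate_sparse, and the exact promotion check are illustrative assumptions, not the strategy's actual Java implementation.

from dataclasses import dataclass
from typing import List

TARGET_SSTABLE_SIZE_MB = 512
TARGET_WORK_UNIT_MB = 16 * TARGET_SSTABLE_SIZE_MB  # 8192 MiB = 8 GiB by default

@dataclass
class SortedRun:
    size_in_mb: float
    dense: bool = False

def consolidate_sparse(runs: List[SortedRun]) -> SortedRun:
    """Merge sparse runs; promote the result to "dense" once it reaches
    the target work unit (the promotion check here is a guess)."""
    merged = SortedRun(size_in_mb=sum(r.size_in_mb for r in runs))
    if merged.size_in_mb >= TARGET_WORK_UNIT_MB:
        merged.dense = True
    return merged

# Four 2.5 GiB sparse runs merge into a 10 GiB run, which gets promoted to dense.
print(consolidate_sparse([SortedRun(2560) for _ in range(4)]))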

Phase 1: Flushes

  • Flush products are very small and very sparse. They overlap with everything and are generally a bummer.
  • Output: Attempt to split the flush products into split_range sized sorted runs (this will probably not succeed).

Phase 2: Sparse Run Consolidation

  • Compact across the entire token range.
  • More or less straight up size tiering, with some automatic min_threshold adjustment based on flush rate (see the sketch after this list).
  • Output: Split the sparse products into split_range sized sorted runs.
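
As a rough illustration of this step, the Python sketch below size-tiers whole sorted runs. The 1.5x bucket bound and the helper names are assumptions for illustration, not the strategy's real grouping logic.

MIN_THRESHOLD = 4
MAX_THRESHOLD = 32

def bucket_by_size(sizes_mb, bucket_high=1.5):
    """Group ascending sizes: a run joins the current bucket while it stays
    within bucket_high times the bucket's smallest member."""
    buckets = []
    for size in sorted(sizes_mb):
        if buckets and size <= bucket_high * buckets[-1][0]:
            buckets[-1].append(size)
        else:
            buckets.append([size])
    return buckets

def pick_sparse_compaction(sizes_mb, min_threshold=MIN_THRESHOLD, max_threshold=MAX_THRESHOLD):
    """Return the first bucket with at least min_threshold similarly sized runs."""
    for bucket in bucket_by_size(sizes_mb):
        if len(bucket) >= min_threshold:
            return bucket[:max_threshold]
    return []

# Four ~50-60 MiB flush products tier together; the 900 MiB run is left alone.
print(pick_sparse_compaction([60, 55, 50, 58, 900]))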

Phase 3: Dense Run De-overlapping

  • Compact across pieces of sorted runs split evenly into work_size splits of the node's data.
  • Sorted runs are tiered with a factor of min_threshold_levels (4), subject to the constraint that the overlaps remain under max_read_per_read (10). If we do exceed max_read_per_read, the youngest min_threshold_levels levels compact (see the sketch after this list).
  • Major compaction across dense sorted runs occurs in two cases. Any full compaction is spread out over a target_rewrite_interval_in_seconds window of time (4 hours) and involves target_work_unit_in_mb of data (8 GiB) at a time. This allows us to make progress even with only target_work_unit_in_mb of free disk space.
  • Major Compaction Case #1: Old dense runs are found (by default 10 days old). This ensures that tombstones find their data within 10 days. This can be disabled by setting max_level_age_in_seconds to zero.
  • Major Compaction Case #2: A user has triggered a full compaction. Sparse runs are compacted immediately and dense runs will proceed along the interval (target_rewrite_interval_in_seconds).
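
The sketch below (plain Python, with an assumed density metric and grouping rule, not the real implementation) shows the shape of the dense-phase decision for a single split: fall back to compacting the youngest runs when max_read_per_read is exceeded, otherwise look for min_threshold_levels runs of similar density.

MIN_THRESHOLD_LEVELS = 4
MAX_READ_PER_READ = 10
LEVEL_BUCKET_LOW = 0.75
LEVEL_BUCKET_HIGH = 1.25

def pick_dense_compaction(runs):
    """runs: list of (density_mb, age_seconds) tuples for one work-sized split."""
    if len(runs) > MAX_READ_PER_READ:
        # Overlap bound exceeded: compact the youngest min_threshold_levels runs.
        return sorted(runs, key=lambda r: r[1])[:MIN_THRESHOLD_LEVELS]
    # Otherwise look for min_threshold_levels runs of similar density.
    by_density = sorted(runs, key=lambda r: r[0])
    for i in range(len(by_density) - MIN_THRESHOLD_LEVELS + 1):
        group = by_density[i:i + MIN_THRESHOLD_LEVELS]
        lo, hi = group[0][0], group[-1][0]
        avg = (lo + hi) / 2
        if lo >= LEVEL_BUCKET_LOW * avg and hi <= LEVEL_BUCKET_HIGH * avg:
            return group
    return []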

Note: The defaults are fine in almost all cases. You only need to tune these things if you want to disable some behavior or really want to optimize. Changes in write throughput are handled gracefully, e.g. for bulk loading (once we observe 4 rapid flushes we adjust the min threshold automatically).
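
For example, the flush-rate adaptation could be tracked along these lines. This is a sketch only: the window length and the doubling rule are assumptions; only the "4 rapid flushes" trigger comes from the description above.

import time

RAPID_FLUSHES = 4  # the trigger mentioned above

class FlushRateTracker:
    """Track recent flushes and raise the effective min_threshold during bulk loads."""

    def __init__(self, window_seconds=600, base_min_threshold=4, max_threshold=32):
        self.window_seconds = window_seconds
        self.base_min_threshold = base_min_threshold
        self.max_threshold = max_threshold
        self.flush_times = []

    def record_flush(self, now=None):
        now = time.time() if now is None else now
        # Keep only flushes inside the sliding window.
        self.flush_times = [t for t in self.flush_times if now - t < self.window_seconds]
        self.flush_times.append(now)

    def effective_min_threshold(self):
        # Assumed adjustment: double the threshold while flushes are rapid.
        if len(self.flush_times) >= RAPID_FLUSHES:
            return min(self.max_threshold, self.base_min_threshold * 2)
        return self.base_min_threshold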

min_threshold (4) : Consolidate this number of sparse runs of similar size.
max_threshold (32) : Do not involve more than this number of sorted runs in a consolidation. This is also the maximum overlap allowed in the sparse runs.
target_work_unit_in_mb (16x target) : How much data to consolidate before promoting to the "levels". Also the unit of full compaction.
min_sstable_size_in_mb (50 MiB) : The minimum size sstables can split into during consolidation.
target_sstable_size_in_mb (512 MiB) : Target chunk size for sorted runs.
target_consolidate_interval_in_seconds (1 hour) : How long before consolidating sparse runs regardless of overlap.
target_rewrite_interval_in_seconds (4 hours) : When doing full compactions, how long to spread them out over.
max_read_per_read (10) : How many dense runs to allow in a range slice before compacting.
max_sstable_count (2000) : Max number of sstables to aim for on a node. The target sstable size may be increased automatically to meet this goal (see the worked example after this list).
min_threshold_levels (4) : How many similarly dense sorted runs in a work unit must overlap before compacting.
max_level_age_in_seconds (10 days) : If a sorted run has lived longer than this duration, fully compact across the run (doing roughly target_work_unit_in_mb of work at a time).
level_bucket_low (0.75) / level_bucket_high (1.25) : How similar in density sorted runs have to be to compact together.
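
A small worked example of how these defaults relate (illustrative Python arithmetic, not the strategy's exact adjustment code): target_work_unit_in_mb defaults to 16x the target sstable size, and the target size is assumed to grow if the node would otherwise exceed max_sstable_count.

TARGET_SSTABLE_SIZE_MB = 512
TARGET_WORK_UNIT_MB = 16 * TARGET_SSTABLE_SIZE_MB  # 8192 MiB = 8 GiB
MAX_SSTABLE_COUNT = 2000

def effective_target_size_mb(node_data_mb,
                             target_mb=TARGET_SSTABLE_SIZE_MB,
                             max_count=MAX_SSTABLE_COUNT):
    """Assumed adjustment: grow the target sstable size until the node
    stays under max_sstable_count sstables."""
    return max(target_mb, node_data_mb / max_count)

# A 4 TiB node: 4 * 1024 * 1024 MiB / 2000 ~= 2097 MiB per sstable.
print(effective_target_size_mb(4 * 1024 * 1024))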

# We want a lower read amplification bound and are willing to pay more write
# amplification to achieve it.
max_read_per_read = 6
min_threshold_levels = 2

# Disable density tiering in the levels. Rely on either max_read_per_read compactions or
# tombstone compactions to clean up data.
min_threshold_levels = 0
max_read_per_read = 64
# Note that if we hit the max_read_per_read (which will happen if we have 64 * 16GiB = 1TiB of data)
# we will consolidate min(4, min_threshold_levels) youngest runs. So in this case if we go above 1 TiB
# of data we'll collect 4 young dense runs together which moves to a 4 hour drop window from a 1 hour
# drop window. If you want to drop fully expired sstables, generally speaking max_read_per_read should be
# 2 * (node dataset size / target_work_unit_in_mb)

# Work towards 16 GiB "dense" runs, consolidating every hour
target_work_unit_in_mb = 16384
target_consolidate_interval_in_seconds = 3600 (1 hour)
# Still do density tiering, but require a high tiering factor so that we do very few consolidations of
# the dense data, reducing our write amplification
min_threshold_levels = 32
max_read_per_read = 64
# Optional, if you want to turn off the default 10 day full compactions
# For most long TTL use cases you don't have to do this because 10 days is
# much lower than the TTL.
# max_level_age_in_seconds = 0

# Work towards 16 GiB "dense" runs, consolidating at least once every day
target_work_unit_in_mb = 16384
target_consolidate_interval_in_seconds = 86400 (1 day)
# Raise our read amplification bounds to reduce write amplification.
max_read_per_read = 16
min_threshold_levels = 8

# Work towards larger work units since we're writing heavily
target_work_unit_in_mb = 16384
target_consolidate_interval_in_seconds = 14400 (4 hours)