@jbowens
Created April 29, 2020 16:06

Ideally, we could evaluate heuristics on:

  • Write amplification
    • We have existing data in Metrics.
  • Space amplification
    • At the end of a test, we could manually Compact(_, _) the entire key space and measure the change in the total size of all sstables.
    • Measuring the size of the database's active files at consistent points in the same workload gives us a relative measure of space amplification (e.g., this heuristic amplified by x additional bytes), but not an amplification factor.
  • Read amplification
    • Point read amplification
      • How many files are in L0?
    • Range read amplification
      • The presence of tombstones may increase this.
      • We can measure per-operation latency if we include reads as part of the workload. Otherwise, we need to rely on roachtests.
      • We could measure block cache retrievals, but that would include reads from queries and compactions.
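As a sketch of the space amplification measurement described above: after manually compacting the entire key space, the ratio of pre- to post-compaction sstable bytes gives an amplification factor. The function below is illustrative; in practice the byte totals would come from summing table sizes reported by Pebble's Metrics.

```go
package main

import "fmt"

// spaceAmp estimates a space amplification factor by comparing the total
// sstable size before and after a full manual compaction. The name and
// signature are hypothetical; the inputs would be derived from db.Metrics().
func spaceAmp(liveBytes, compactedBytes uint64) float64 {
	if compactedBytes == 0 {
		return 1.0
	}
	return float64(liveBytes) / float64(compactedBytes)
}

func main() {
	// E.g., 12 GiB of sstables that compact down to 8 GiB is a factor of 1.5.
	fmt.Println(spaceAmp(12<<30, 8<<30))
}
```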

Trace / replay whole workloads

We could add facilities for recording / tracing storage-level operations to a log for replaying later. We could collect these logs from a store running various representative Cockroach workloads, and replay them against Pebble DBs with various heuristics.

The benefit of this approach is the comprehensiveness of the workload. Capturing realistic read queries would allow us to evaluate read amplification directly at the Pebble layer. The trace could include the relative timing of operations, letting us replay workloads at a realistic pacing. It would also give a compaction heuristic that needs to track data (like dropped keys) starting in the memtable an opportunity to do so. And because it captures a comprehensive view of Pebble activity, the trace might be generally useful for debugging or testing.

The cost of this comprehensiveness is that collecting all that data is expensive in both time and space. Writing all operations to a second, adjacent log would slow down the workload being traced, although it's not clear to me just how significantly; unlike the WAL, this log can be flushed asynchronously. It's also expensive in space, since it includes all operations, including reads and writes that never make it to L0 (keys dropped while still in the memtable).

Here's a rough sketch of what it might look like in code: 284fa521

Replay deleted files

We could use a pebble.Cleaner implementation to capture obsolete files from a store running various representative Cockroach workloads. We could capture either log files or L0 files. We might be able to reconstruct pacing information by also installing a custom EventListener to listen for flushes and infer when files were introduced.

This approach requires less space since it doesn't capture reads and would have a lot less impact on the process running the original workload. It still captures a representative write workload, which will allow us to evaluate write and space amplification. It's also less invasive in Pebble's codebase.

Since no information on representative reads is collected, we would need to rely on roachtests for evaluating read amplification. If we collect L0 files, there's no opportunity for heuristics to capture additional statistics (like dropped keys) in the memtable, which might be useful for making compaction decisions later.

Generate synthetic workloads

We could build off the pebble bench command (or build something analogous) to generate more workloads with new distributions.

This approach avoids any large space requirements since workloads are completely synthetic. Because writes go through the normal write path, it gives heuristics an opportunity to populate L0 sstables with any statistics they might need for compaction picking (e.g., dropped keys).

The workloads we generate will likely be dissimilar to real-world Cockroach workloads. We'd probably need to focus more on specific pathological cases, and rely on roachtests for evaluating heuristics in the context of Cockroach.
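A synthetic generator targeting a pathological case might, for instance, draw keys from a Zipf distribution so that a few hot keys absorb most of the writes, producing heavy overwrite churn for compaction to reclaim. The parameters below (s=1.2 over roughly a million keys) are illustrative, not tuned to any real Cockroach workload:

```go
package main

import (
	"fmt"
	"math/rand"
)

// hotKeyCounts draws n keys from a Zipf distribution and tallies how often
// each key is chosen, simulating a skewed, overwrite-heavy write workload.
func hotKeyCounts(n int, seed int64) map[uint64]int {
	r := rand.New(rand.NewSource(seed)) // fixed seed for reproducible runs
	zipf := rand.NewZipf(r, 1.2, 1, 1<<20)
	counts := make(map[uint64]int)
	for i := 0; i < n; i++ {
		k := zipf.Uint64()
		counts[k]++
		// In a real generator we'd format and write the key to the DB, e.g.:
		_ = fmt.Sprintf("key-%08d", k)
	}
	return counts
}

func main() {
	counts := hotKeyCounts(10000, 1)
	// With this skew, the hottest key absorbs a large share of all writes.
	fmt.Println("writes to hottest key:", counts[0])
}
```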
