A Proposed Modification to Accumulo's Compaction Algorithm

By default, compactions in Accumulo are driven by a configurable compaction ratio using the following algorithm.

  1. If LF * CR < SUM, then compact this set of files. LF is the size of the largest file in the set, CR is the compaction ratio, and SUM is the total size of all files in the set.
  2. Remove the largest file from the set.
  3. If the set is empty, then compact no files.
  4. Go to 1.
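For illustration, here is a minimal, hypothetical sketch of that loop in Java (not Accumulo's actual code). It assumes file sizes are available as longs and returns the set to compact, or an empty list if no set meets the ratio.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TopDownSelector {

  /** Sketch of the default selection loop described above. */
  static List<Long> select(List<Long> fileSizes, double compactionRatio) {
    List<Long> set = new ArrayList<>(fileSizes);
    set.sort(Collections.reverseOrder()); // largest file first

    while (!set.isEmpty()) {
      long largestFile = set.get(0);
      long sum = set.stream().mapToLong(Long::longValue).sum();

      // 1. If LF * CR < SUM, then compact this set of files.
      if (largestFile * compactionRatio < sum) {
        return set;
      }

      // 2. Remove the largest file from the set and go back to step 1.
      set.remove(0);
    }

    // 3. The set is empty, so compact no files.
    return set;
  }
}
```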

While working on Compaculation in service of #564 and trying to figure out a sensible way to support parallel compactions, I came up with a modified file selection algorithm that finds the largest set of small files to compact. See this code in Compaculation for the algorithm.

The motivation for this new algorithm was sets of files with sizes like [101M,102M,103M,104M,4M,3M,3M,3M,3M]. There is a huge disparity in sizes between the files. Accumulo's current algorithm would compact all of these files in a single compaction. The large files may take a lot of time to compact, while the small files could compact very quickly. This long compaction has two disadvantages. First, many more small files could arrive while the long compaction is running. Second, scans always benefit from fewer files, so quickly reducing the number of files is advantageous for scans.

The new algorithm would first compact [4M,3M,3M,3M,3M] and then compact [101M,102M,103M,104M,16M], where the 16M file is the result of the first compaction. This algorithm does a little more work, but it reduces the number of files more quickly. Although it may do more work, the total rewriting is still logarithmic because every set is chosen using the compaction ratio. As a proof of this, consider a key that starts off in a 1M file. With a compaction ratio of 3, this key would only be rewritten into a file of at least 3M. After that, the key would only be rewritten into files of at least 9M, 27M, etc. Since the requirements for rewrite are exponential, the amount of rewriting is logarithmic.
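The authoritative version of the new selection is the linked Compaculation code. As a rough, hypothetical illustration of the idea, a sketch might look like the following: it scans files from smallest to largest and stops at the first set of small files that both meets the compaction ratio and is smaller in total than the next larger file (the "sum check" explained later in this document), falling back to the largest ratio-meeting set otherwise.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SmallFileSetSelector {

  /**
   * Hypothetical sketch: scan prefixes of the files sorted from smallest to
   * largest. A prefix is compactable when largest * ratio < sum of the prefix.
   * A compactable prefix is a safe stopping point when its total size is
   * smaller than the next larger file, so compacting it cannot break the
   * compaction ratio of a larger set.
   */
  static List<Long> select(List<Long> fileSizes, double ratio) {
    List<Long> sorted = new ArrayList<>(fileSizes);
    sorted.sort(Comparator.naturalOrder());

    List<Long> fallback = List.of(); // largest prefix meeting the ratio
    long sum = 0;

    for (int i = 0; i < sorted.size(); i++) {
      long largest = sorted.get(i);
      sum += largest;

      if (largest * ratio < sum) {
        List<Long> prefix = new ArrayList<>(sorted.subList(0, i + 1));

        boolean lastFile = i == sorted.size() - 1;
        // Sum check: safe to stop here if the compacted output would be
        // smaller than the next larger file.
        if (lastFile || sum < sorted.get(i + 1)) {
          return prefix;
        }

        fallback = prefix;
      }
    }

    return fallback;
  }
}
```

On the example above with a ratio of 3, this sketch selects the 4M and 3M files first, leaving the larger compaction for a later round.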

The new algorithm works really well when a tablet can have multiple concurrent compactions, with the caveat that while a compaction is running, files larger than those being compacted should not be considered. See the file count parity section below for an explanation.

Testing

To test the new algorithm, Compaculation commit c44017c was used. The simulation ran 100,000 ticks with 100 tablets and 5M files added to 4 random tablets each tick. The average number of files per tablet is plotted below for a compaction ratio of 3. Green is the new algorithm (with concurrent compactions per tablet) and purple is the old algorithm (without concurrent compactions).

[Plot: CR3]

Below is a plot for a compaction ratio of 2.

[Plot: CR2]

Below is a plot for a compaction ratio of 1.5.

[Plot: CR1.5]

The results are much better, assuming the simulation is somewhat correct. It still needs to be determined how much of the improvement comes from parallelism versus the new algorithm itself. The new algorithm did rewrite more data. For a compaction ratio of 3, it rewrote 22% more data. For a ratio of 2, it rewrote 13% more data. For a ratio of 1.5, it rewrote 25% more data.

When running the new algorithm, the simulation could schedule multiple concurrent compactions per tablet. There were 4 executors: one for compactions < 10M, one for < 100M, one for < 500M, and one for everything else.
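As a sketch of that setup (the class name, executor names, and thread pool sizes here are made up, not the simulator's actual configuration), routing a compaction to an executor by its total input size might look like this:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompactionExecutors {

  private static final long MB = 1024 * 1024;

  // Hypothetical executors, one per size tier used in the simulation.
  private final ExecutorService small = Executors.newFixedThreadPool(2);  // < 10M
  private final ExecutorService medium = Executors.newFixedThreadPool(2); // < 100M
  private final ExecutorService large = Executors.newFixedThreadPool(2);  // < 500M
  private final ExecutorService huge = Executors.newFixedThreadPool(2);   // everything else

  /** Route a planned compaction to an executor based on its total input size in bytes. */
  void submit(List<Long> inputSizes, Runnable compactionJob) {
    long totalSize = inputSizes.stream().mapToLong(Long::longValue).sum();

    if (totalSize < 10 * MB) {
      small.execute(compactionJob);
    } else if (totalSize < 100 * MB) {
      medium.execute(compactionJob);
    } else if (totalSize < 500 * MB) {
      large.execute(compactionJob);
    } else {
      huge.execute(compactionJob);
    }
  }
}
```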

Conclusion

This new approach may have the advantage of maintaining a smaller set of files for a system that is constantly adding files. This is done with logarithmic cost and once the system settles (files stop being added) it will compact down to the same number of files per tablet as the old approach.

Ensuring bottom up ends up with the same file count as top down

When doing bottom up concurrent compactions, the order matters. If sets of files are not compacted in a certain order, then we may not end up with the minimum number of files. This will be shown through examples using the following set of files.

File  Size
B1    100M
B2    100M
B3    100M
B4    100M
M1    10M
M2    10M
M3    10M
M4    10M
S1    1M
S2    1M
S3    1M
S4    1M

For a ratio of 3, the top down algorithm would select all files in the table above, producing a single output file of size 444M. So ideally, the bottom up algorithm would also produce a single file. Assume the bottom up algorithm schedules the following three concurrent compactions.

  • Compact B1,B2,B3,B4 into a new file B5 of size 400M
  • Compact M1,M2,M3,M4 into a new file M5 of size 40M
  • Compact S1,S2,S3,S4 into a new file S5 of size 4M

After these all complete, the resulting files B5, M5, and S5 no longer meet the compaction ratio (400M * 3 = 1200M, which is greater than the 444M total) and would not compact to a single file. A simple way to avoid this is to not schedule a concurrent compaction of files larger than anything currently running. For example, assume a compaction of M1,M2,M3,M4 is running; then it is OK to schedule a concurrent compaction of S1,S2,S3,S4 but not B1,B2,B3,B4. To see this, assume the following two compactions run concurrently.

  • Compact M1,M2,M3,M4 into a new file M5 of size 40M
  • Compact S1,S2,S3,S4 into a new file S5 of size 4M

After these compactions complete, the files B1,B2,B3,B4,M5,S5 meet the compaction ratio (100M * 3 = 300M, which is less than the 444M total) and will compact to a single file. This is the reason for the check on line 37 in the algorithm.
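The actual check is the one on line 37 of the linked code. As a standalone, hypothetical version of that kind of guard (sizes in MB for readability, and exact tie handling for equal sizes is a detail this write-up does not specify), it might look like this:

```java
import java.util.Collection;
import java.util.List;

public class ConcurrencyGuard {

  /**
   * Hypothetical check: only schedule a new concurrent compaction if its
   * largest input file is no larger than the smallest file in any
   * compaction already running on the tablet.
   */
  static boolean okToSchedule(Collection<Long> candidateSizes, Collection<Long> runningSizes) {
    if (runningSizes.isEmpty()) {
      return true; // nothing running, anything may be scheduled
    }
    long largestCandidate = candidateSizes.stream().mapToLong(Long::longValue).max().orElse(0);
    long smallestRunning = runningSizes.stream().mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
    return largestCandidate <= smallestRunning;
  }

  public static void main(String[] args) {
    List<Long> running = List.of(10L, 10L, 10L, 10L);                        // M1..M4 compacting
    System.out.println(okToSchedule(List.of(1L, 1L, 1L, 1L), running));       // S1..S4 -> true
    System.out.println(okToSchedule(List.of(100L, 100L, 100L, 100L), running)); // B1..B4 -> false
  }
}
```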

Sum check

For the following set of files, the files F3,F4,F5,F6 meet the compaction ratio (for a ratio of 3). However, compacting these files produces a 132M file, and the resulting 132M, 100M, and 99M files do not meet the compaction ratio. The top down algorithm would compact all of these files to one, whereas a bottom up algorithm that only considers the compaction ratio would produce three.

File  Size
F1    100M
F2    99M
F3    33M
F4    33M
F5    33M
F6    33M

To avoid this, the sum of the file sizes for F3,F4,F5,F6 is checked to see if the compaction would produce a file smaller than the next larger file, 99M. If the sum were less than 99M, compacting the set would preserve the compaction ratio of any larger set. In this case, since the sum is not less than 99M and a larger set of files meets the compaction ratio, the new algorithm will compact the larger set. This is the reason for the sum < nextSize check on line 107 of the algorithm.
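As a quick check of this behavior, using the hypothetical SmallFileSetSelector sketch from earlier in this document (sizes in MB for readability):

```java
import java.util.List;

public class SumCheckExample {
  public static void main(String[] args) {
    // F1=100M, F2=99M, F3..F6=33M
    List<Long> sizes = List.of(100L, 99L, 33L, 33L, 33L, 33L);

    // F3..F6 meet the ratio (33 * 3 = 99 < 132), but their 132M sum is not
    // less than the next larger file (99M), so the scan keeps going and the
    // whole set is selected instead: [33, 33, 33, 33, 99, 100]
    System.out.println(SmallFileSetSelector.select(sizes, 3.0));
  }
}
```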
