keith-turner/compaction-algorithm.md

## compaction-algorithm.md

      
    Raw
  

              compaction-algorithm.md
            
          
    A Proposed Modification to Accumulo's Compaction Algorithm

By default, compactions in Accumulo are driven by a configurable compaction ratio using the following algorithm.

If LF * CR < SUM then compact this set of files.  LF is the largest file in the set.  CR is the compaction ratio.  SUM is the size of all files in the set.
Remove largest file from set.
If set is empty, then compact no files.
Go to 1.

While working on compaculation in service of #564 and trying to figure out a sensible way to support parallel compactions, I came up with a modified file selection algorithm that finds the largest set of small files to compact.    See this code in Compaculation for the algorithm.
The motivation for this new algorithm were sets of files like with sizes like [101M,102M,103M,104M,4M,3M,3M,3M,3M].
There is a huge disparity in sizes between the files. Accumulo's current algorithm would compact all files in a single compaction. The large files may take a lot of time to compact, while the small files could compact very quickly.  This long compaction has two disadvantages, first a lot more small files could arrive while the long compaction is running.  Second for scans, we always want less files therefore quickly reducing the number of files is advantageous for scans.
The new algorithm would first compact [4M,3M,3M,3M,3M] and then compact [101M,102M,103M,104M,16M] where the 16M file is the result of the first compaction.  This algorithm does a little more work, but it reduces the number of files
more quickly. Although it may do more work, its still logarithmic because every set is chosen using the compaction ratio. As a proof of this consider a key that starts off in a 1M file.  With a compaction ratio of 3, this key would only be rewritten into a file of at least 3M.  After that, the key would only be rewritten into files of at least 9M, 27M, etc. Since the requirements for rewrite are exponential, the amount of rewriting is logarithmic.
The new algorithm works really well in the case when a tablet can have multiple concurrent compactions, with the caveat that when a compaction is running all files larger than those being compacted should not be considered.  See file count parity for an explanation.
Testing

To test the new algorithm, Compaculation commit c44017c was used. This simulated 100,000 ticks, 100 tablets, and 5M files added to 4 random tablets each tick.  The average number of files per tablet are plotted below for compaction ratio of 3.  Green is the new algorithm (with concurrent compactions per tablet) and purple is the old algorithm (w/o concurrent compactions).

Below is a plot for a compaction ratio of 2.

Below is a plot for a compaction ratio of 1.5.

The results are much better, assuming the simulation is somewhat correct.  Need to determine how much of the improvement is from parallelism vs the new algorithm. The new algorithm did rewrite more data.  For a compaction ratio of 3, it rewrote 22% more data.  For 2 it rewrote 13% more data.  For 1.5 it rewrote 25% more data.
When running the new algorithm it could schedule multiple concurrent compactions per tablet.  There were 4 executors, one for compactions < 10M, one for < 100M, one for < 500M, and one for everything else.
Conclusion

This new approach may have the advantage of maintaining a smaller set of files for a system that is constantly adding files.  This is done with logarithmic cost and once the system settles (files stop being added) it will compact down to the same number of files per tablet as the old approach.

  
## file-count-parity.md

      
    Raw
  

              file-count-parity.md
            
          
    Ensuring bottom ends up with same file count as top down.

When doing bottom up concurrent compactions, the order matters.  If sets of files are not compacted in certain order then
we may not end up with the minmum number of files.  This will be shown through examples using the following set of files.


File
Size


B1
100M


B2
100M


B3
100M


B4
100M


M1
10M


M2
10M


M3
10M


M4
10M


S1
1M


S2
1M


S3
1M


S4
1M


For a ratio of 3, the top down algorithm would select all files in the table above producing a single output file of size 444M.  So ideally using the bottom up algorithm would also produce a single file.  Assume the bottom up algorithm schedules the following three concurrent compactions.

Compact B1,B2,B3,B4 into a new file B5 of size 400M
Compact M1,M2,M3,M4 into a new file M5 of size 40M
Compact S1,S2,S3,S4 into a new file S5 of size 4M

After these all complete, the resulting files B5,M5,and S5 no longer meet the compation ratio and would not compact to
a single file.  A simple way to avoid this is to not schedule a concurrent compaction of files larger than anything running.  For example assume a compaction of M1,M2,M3,M4 is running, then its ok to schedule a concurrent compaction of S1,S2,S3,S4 but not B1,B2,B3,B4.  To see this assume the following two compactions run concurrently.

Compact M1,M2,M3,M4 into a new file M5 of size 40M
Compact S1,S2,S3,S4 into a new file S5 of size 4M

After these compactions complete the files B1,B2,B3,B4,M5,S5 meet the compaction ration and will compact to a single file.  This is the reason for the check on line 37 in the algorithm.
Sum check

For the following set of files, the files F3,F4,F5,F6 meet the compaction ratio (for a ratio of 3).  However compacting these files produces a 132M file and the 132M,100, and 99M do not meet the compaction ratio. The top down algorithm would compact these to one file where a bottom up algorithm that only considers compaction ration would produce three.


File
Size


F1
100M


F2
99M


F3
33M


F4
33M


F5
33M


F6
33M


To avoid this, the sum of the file sizes for F3,F4,F5,F6 is checked to see if a compaction would produce a file less than 99M.  If the size is less than 99M, then this would ensure the compaction ratio of a larger set is maintained.  In this case since the sum is not less than 99M and a larger set of files meet the compaction ratio, the new algorithm will compact the larger set.  This is the reason for the check sum < nextSize on line 107 of the algorithm.
File	Size
B1	100M
B2	100M
B3	100M
B4	100M
M1	10M
M2	10M
M3	10M
M4	10M
S1	1M
S2	1M
S3	1M
S4	1M