@mjstevens777
Last active March 20, 2024 22:03
BuildKit cache expiration behavior

Setup

Let's say we have a Dockerfile with 3 basic stages:

  • 3 base layers of an Ubuntu image. These basically never change.
  • 5 intermediate layers of installing apt/python dependencies. These only change when we want to update our packages.
  • 2 final layers of copying and building our source code. These change every build.

Each time we change our packages, we create a new group of intermediate layers. Suppose the history of package changes looks like this:

group number  number of builds  description
0             15                no longer in use
1             11                no longer in use
2             9                 the current production image
3             2                 try installing a new set of packages, doesn't work, abandoned
4             2                 try installing a new set of packages, doesn't work, abandoned
5             1                 try installing a new set of packages, current working branch

Desired behavior:

  • Keep all base layers
  • Keep intermediate layers in group 2 (production)
  • Keep intermediate layers in group 5 (active development)
  • Remove group 0 first
  • Groups 1, 3, and 4 are a toss-up
  • Final layers can be removed since they are only cached once

Outcome

All at once

We can simulate the cache's behavior using the support code below and see the order in which layers get deleted.

If we remove the layers in a single operation, the order is:

[image: removal order of the layer groups, single operation]

Looking at our desired behavior:

  • Keep all base layers
  • Keep intermediate layers in group 2 (production)
  • Keep intermediate layers in group 5 (active development)
  • Remove group 0 first

We are prone to removing layers under active development.

One at a time

If we remove the layers one at a time, the order becomes more chaotic:

[image: removal order of the layer groups, one layer at a time]

Looking at our desired behavior:

  • Keep all base layers
  • Keep intermediate layers in group 2 (production)
  • Keep intermediate layers in group 5 (active development)
  • Remove group 0 first

The fact that the behavior changes based on how often we run the function is quite confusing.
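
A tiny hypothetical cache makes the difference concrete. The sketch below is not BuildKit code: it assumes the same scoring as the support code (rank among distinct values divided by the number of distinct values, summed over both metrics), and the seven layers and their values are invented for illustration. It compares one prune that evicts five layers with five prunes that evict one layer each.

```python
def removal_order(layers):
    """Sort (label, last_used, usage_count) tuples, first-to-evict first."""
    # Rank each metric among its distinct values, as the simulation does.
    lu_rank = {v: i for i, v in enumerate(sorted({layer[1] for layer in layers}))}
    uc_rank = {v: i for i, v in enumerate(sorted({layer[2] for layer in layers}))}

    def score(layer):
        _, last_used, usage_count = layer
        return (lu_rank[last_used] / len(lu_rank)
                + uc_rank[usage_count] / len(uc_rank))

    return sorted(layers, key=score)


def prune_at_once(layers, n):
    """Rank once, then evict the first n layers."""
    return [label for label, _, _ in removal_order(layers)[:n]]


def prune_one_at_a_time(layers, n):
    """Re-rank the survivors before every single eviction."""
    remaining = list(layers)
    evicted = []
    for _ in range(n):
        victim = removal_order(remaining)[0]
        remaining.remove(victim)
        evicted.append(victim[0])
    return evicted


# Hypothetical cache: (label, last_used, usage_count)
layers = [
    ("final_0", 0, 1), ("final_1", 1, 1), ("final_2", 2, 1), ("final_3", 3, 1),
    ("old_dep_0", 4, 15), ("old_dep_1", 5, 15),
    ("new_dep", 6, 1),
]

at_once = prune_at_once(layers, 5)
stepwise = prune_one_at_a_time(layers, 5)
print("single prune:", at_once)
print("five prunes: ", stepwise)
```

In this toy data the single prune evicts new_dep and keeps both old dependency layers, while the five small prunes evict old_dep_0 and keep new_dep. The scores are relative to whatever is still in the cache, so every eviction shifts the ranks of the survivors.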

Support Code

TODO: This currently does not model the fact that deleting a parent layer also deletes its child layers.

import dataclasses
from copy import copy
from tabulate import tabulate
import matplotlib.pyplot as plt
from matplotlib.colors import hsv_to_rgb


@dataclasses.dataclass
class Layer:
    """Simulate a DeleteRecord.
    
    Modeled after
    https://github.com/moby/buildkit/blob/da0131c1c753b88cd29bd0c6ab93e1d2de46785e/cache/manager.go#L1612
    """
    label: str
    last_used: float
    usage_count: int
    last_used_index: int = 0
    max_last_used_index: int = 0
    usage_count_index: int = 0
    max_usage_count_index: int = 0
        
    @property
    def last_used_score(self):
        return self.last_used_index / self.max_last_used_index
    
    @property
    def usage_count_score(self):
        return self.usage_count_index / self.max_usage_count_index
    
    @property
    def combined_score(self):
        return self.last_used_score + self.usage_count_score
    

def sort_for_removal(layers):
    """Simulate sortDeleteRecords.
    
    Modeled after
    https://github.com/moby/buildkit/blob/da0131c1c753b88cd29bd0c6ab93e1d2de46785e/cache/manager.go#L1621
    """
    _sort_for_removal_key(layers, 'usage_count')
    _sort_for_removal_key(layers, 'last_used')
        
    layers.sort(key=lambda l: l.combined_score)
    

def _sort_for_removal_key(layers, key):
    # Sort
    layers.sort(key=lambda l: getattr(l, key))
    prev = None
    idx = -1
    # Group equal values into ranks
    for layer in layers:
        current = getattr(layer, key)
        if current != prev:
            idx += 1
            prev = current
        idx_key = key + '_index'
        setattr(layer, idx_key, idx)
    # Make relative score by dividing by max rank
    max_key = 'max_' + key + '_index'
    max_idx = idx + 1
    for layer in layers:
        setattr(layer, max_key, max_idx)
    
    

def simulate_layers(build_groups, num_base_layers=3, num_intermediate_layers=5, num_final_layers=2):
    """Construct some simulated layers based on a realistic Dockerfile scenario.
    
    base layers never change (e.g. OS-level)
    intermediate layers sometimes change (e.g. installed dependencies/packages)
    final layers change on every build (e.g. application code)
    
    build_groups represents the number of times we create a build with a given
    set of intermediate layers
    i.e. how many builds we do in between changing the list of packages to install
    
    """
    max_layers = num_base_layers + num_intermediate_layers + num_final_layers
    total_builds = sum(build_groups)
    
    layers = []
    # base layers
    for layer_num in range(num_base_layers):
        # expected behavior
        usage_count = total_builds
        # actual behavior
#         if layer_num == num_base_layers - 1:
#             usage_count = len(build_groups)
#         else:
#             usage_count = 1
        layers.append(Layer(
            label=f'base_layer{layer_num}',
            last_used=total_builds - 1 + layer_num / max_layers,
            usage_count=usage_count))


    # intermediate layers
    group_start = 0
    for group_idx, group_size in enumerate(build_groups):
        group_end = group_start + group_size
        for i in range(num_intermediate_layers):
            # expected behavior
            usage_count = group_size
            # actual behavior
#             if i == num_intermediate_layers - 1:
#                 usage_count = group_size
#             else:
#                 usage_count = 1
            layer_num = i + num_base_layers
            layers.append(Layer(
                label=f'intermed_group{group_idx}_layer{i}',
                last_used=group_end - 1 + layer_num / max_layers,
                usage_count=usage_count))
        # final layers
        for build_num in range(group_start, group_end):
            for i in range(num_final_layers):
                layer_num = i + num_base_layers + num_intermediate_layers
                layers.append(Layer(
                    label=f'final_group{group_idx}_build{build_num}_layer{i}',
                    last_used=build_num + layer_num / max_layers,
                    usage_count=1))
            
        group_start = group_end

    return layers


def pretty_print(layers):
    """Print out layers as a table."""
    headers = ['label', 'last_used', 'usage_count']
    rows = []
    for layer in layers:
        if layer.max_last_used_index == 0:
            last_used = str(layer.last_used)
            usage_count = str(layer.usage_count)
        else:
            last_used = f'{layer.last_used} ({layer.last_used_index}/{layer.max_last_used_index})'
            usage_count = f'{layer.usage_count} ({layer.usage_count_index}/{layer.max_usage_count_index})'
        rows.append([layer.label, last_used, usage_count])
    print(tabulate(rows, headers=headers, tablefmt="pipe"))
    

def plot_groups(layers):
    groups = {}
    for i, layer in enumerate(layers):
        if 'base' in layer.label:
            group = 'base'
        else:
            key, num = layer.label.split('_')[:2]
            num = num[-1]
            group = f'{key}\n{num}'

        if group not in groups:
            groups[group] = {'name': group, 'indices': []}
        groups[group]['indices'].append(i)
        
    groups = list(groups.values())
    for group in groups:
        group['start'] = min(group['indices'])
        group['end'] = max(group['indices']) + 1
        
    for i, group in enumerate(groups):
        y = 0
        while True:
            for prev_group in groups[:i]:
                if prev_group['y'] != y:
                    continue
                if prev_group['end'] <= group['start']:
                    continue
                y += 1
                break
            else:
                break
        group['y'] = y

    _, ax = plt.subplots()
    xs = []
    ys = []
    for group in groups:
        for idx in group['indices']:
            xs.append(idx)
            ys.append(group['y'] + 0.1)
    plt.scatter(xs, ys, color='gray')
    for i, group in enumerate(groups):
        h = (i * 0.381966) % 1
        color = hsv_to_rgb((h, 0.2, 1))
        start = min(group['indices'])
        end = max(group['indices']) + 1
        p = ax.bar(
            x=(start + end) / 2, width=end - start,
            height=0.8, bottom=group['y'] + 0.2,
            color=color)
        ax.bar_label(p, labels=[group['name']], label_type='center')
    plt.gcf().set_size_inches(14, 4)
    plt.yticks([])
    plt.show()
    

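def remove_one_at_a_time(layers):
    """Re-sort the remaining layers before each removal.

    NOTE: this helper is called below but missing from the listing. This is a
    reconstruction inferred from the "Removed one layer at a time" table, where
    the rank denominators shrink by one per row.
    """
    # Work on copies so the caller's layers keep their all-at-once ranks.
    remaining = [copy(layer) for layer in layers]
    removed = []
    while remaining:
        sort_for_removal(remaining)
        # The popped layer keeps the ranks it had at the moment of removal.
        removed.append(remaining.pop(0))
    return removed


# Main code
layers = simulate_layers([20, 10, 10, 2, 2, 1])
layers.sort(key=lambda l: l.last_used)
pretty_print(layers)

sort_for_removal(layers)
plot_groups(layers)
pretty_print(layers)

layers_one_at_a_time = remove_one_at_a_time(layers)
# Plot the one-at-a-time order, not the all-at-once order again.
plot_groups(layers_one_at_a_time)
pretty_print(layers_one_at_a_time)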

Raw Data

All layers, by last used date

label last_used usage_count
final_group0_build0_layer0 0.8 1
final_group0_build0_layer1 0.9 1
final_group0_build1_layer0 1.8 1
final_group0_build1_layer1 1.9 1
final_group0_build2_layer0 2.8 1
final_group0_build2_layer1 2.9 1
final_group0_build3_layer0 3.8 1
final_group0_build3_layer1 3.9 1
final_group0_build4_layer0 4.8 1
final_group0_build4_layer1 4.9 1
final_group0_build5_layer0 5.8 1
final_group0_build5_layer1 5.9 1
final_group0_build6_layer0 6.8 1
final_group0_build6_layer1 6.9 1
final_group0_build7_layer0 7.8 1
final_group0_build7_layer1 7.9 1
final_group0_build8_layer0 8.8 1
final_group0_build8_layer1 8.9 1
final_group0_build9_layer0 9.8 1
final_group0_build9_layer1 9.9 1
final_group0_build10_layer0 10.8 1
final_group0_build10_layer1 10.9 1
final_group0_build11_layer0 11.8 1
final_group0_build11_layer1 11.9 1
final_group0_build12_layer0 12.8 1
final_group0_build12_layer1 12.9 1
final_group0_build13_layer0 13.8 1
final_group0_build13_layer1 13.9 1
intermed_group0_layer0 14.3 15
intermed_group0_layer1 14.4 15
intermed_group0_layer2 14.5 15
intermed_group0_layer3 14.6 15
intermed_group0_layer4 14.7 15
final_group0_build14_layer0 14.8 1
final_group0_build14_layer1 14.9 1
final_group1_build15_layer0 15.8 1
final_group1_build15_layer1 15.9 1
final_group1_build16_layer0 16.8 1
final_group1_build16_layer1 16.9 1
final_group1_build17_layer0 17.8 1
final_group1_build17_layer1 17.9 1
final_group1_build18_layer0 18.8 1
final_group1_build18_layer1 18.9 1
final_group1_build19_layer0 19.8 1
final_group1_build19_layer1 19.9 1
final_group1_build20_layer0 20.8 1
final_group1_build20_layer1 20.9 1
final_group1_build21_layer0 21.8 1
final_group1_build21_layer1 21.9 1
final_group1_build22_layer0 22.8 1
final_group1_build22_layer1 22.9 1
final_group1_build23_layer0 23.8 1
final_group1_build23_layer1 23.9 1
final_group1_build24_layer0 24.8 1
final_group1_build24_layer1 24.9 1
intermed_group1_layer0 25.3 11
intermed_group1_layer1 25.4 11
intermed_group1_layer2 25.5 11
intermed_group1_layer3 25.6 11
intermed_group1_layer4 25.7 11
final_group1_build25_layer0 25.8 1
final_group1_build25_layer1 25.9 1
final_group2_build26_layer0 26.8 1
final_group2_build26_layer1 26.9 1
final_group2_build27_layer0 27.8 1
final_group2_build27_layer1 27.9 1
final_group2_build28_layer0 28.8 1
final_group2_build28_layer1 28.9 1
final_group2_build29_layer0 29.8 1
final_group2_build29_layer1 29.9 1
final_group2_build30_layer0 30.8 1
final_group2_build30_layer1 30.9 1
final_group2_build31_layer0 31.8 1
final_group2_build31_layer1 31.9 1
final_group2_build32_layer0 32.8 1
final_group2_build32_layer1 32.9 1
final_group2_build33_layer0 33.8 1
final_group2_build33_layer1 33.9 1
intermed_group2_layer0 34.3 9
intermed_group2_layer1 34.4 9
intermed_group2_layer2 34.5 9
intermed_group2_layer3 34.6 9
intermed_group2_layer4 34.7 9
final_group2_build34_layer0 34.8 1
final_group2_build34_layer1 34.9 1
final_group3_build35_layer0 35.8 1
final_group3_build35_layer1 35.9 1
intermed_group3_layer0 36.3 2
intermed_group3_layer1 36.4 2
intermed_group3_layer2 36.5 2
intermed_group3_layer3 36.6 2
intermed_group3_layer4 36.7 2
final_group3_build36_layer0 36.8 1
final_group3_build36_layer1 36.9 1
final_group4_build37_layer0 37.8 1
final_group4_build37_layer1 37.9 1
intermed_group4_layer0 38.3 2
intermed_group4_layer1 38.4 2
intermed_group4_layer2 38.5 2
intermed_group4_layer3 38.6 2
intermed_group4_layer4 38.7 2
final_group4_build38_layer0 38.8 1
final_group4_build38_layer1 38.9 1
base_layer0 39 40
base_layer1 39.1 40
base_layer2 39.2 40
intermed_group5_layer0 39.3 1
intermed_group5_layer1 39.4 1
intermed_group5_layer2 39.5 1
intermed_group5_layer3 39.6 1
intermed_group5_layer4 39.7 1
final_group5_build39_layer0 39.8 1
final_group5_build39_layer1 39.9 1

Removed in one operation

label last_used usage_count
final_group0_build0_layer0 0.8 (0/113) 1 (0/5)
final_group0_build0_layer1 0.9 (1/113) 1 (0/5)
final_group0_build1_layer0 1.8 (2/113) 1 (0/5)
final_group0_build1_layer1 1.9 (3/113) 1 (0/5)
final_group0_build2_layer0 2.8 (4/113) 1 (0/5)
final_group0_build2_layer1 2.9 (5/113) 1 (0/5)
final_group0_build3_layer0 3.8 (6/113) 1 (0/5)
final_group0_build3_layer1 3.9 (7/113) 1 (0/5)
final_group0_build4_layer0 4.8 (8/113) 1 (0/5)
final_group0_build4_layer1 4.9 (9/113) 1 (0/5)
final_group0_build5_layer0 5.8 (10/113) 1 (0/5)
final_group0_build5_layer1 5.9 (11/113) 1 (0/5)
final_group0_build6_layer0 6.8 (12/113) 1 (0/5)
final_group0_build6_layer1 6.9 (13/113) 1 (0/5)
final_group0_build7_layer0 7.8 (14/113) 1 (0/5)
final_group0_build7_layer1 7.9 (15/113) 1 (0/5)
final_group0_build8_layer0 8.8 (16/113) 1 (0/5)
final_group0_build8_layer1 8.9 (17/113) 1 (0/5)
final_group0_build9_layer0 9.8 (18/113) 1 (0/5)
final_group0_build9_layer1 9.9 (19/113) 1 (0/5)
final_group0_build10_layer0 10.8 (20/113) 1 (0/5)
final_group0_build10_layer1 10.9 (21/113) 1 (0/5)
final_group0_build11_layer0 11.8 (22/113) 1 (0/5)
final_group0_build11_layer1 11.9 (23/113) 1 (0/5)
final_group0_build12_layer0 12.8 (24/113) 1 (0/5)
final_group0_build12_layer1 12.9 (25/113) 1 (0/5)
final_group0_build13_layer0 13.8 (26/113) 1 (0/5)
final_group0_build13_layer1 13.9 (27/113) 1 (0/5)
final_group0_build14_layer0 14.8 (33/113) 1 (0/5)
final_group0_build14_layer1 14.9 (34/113) 1 (0/5)
final_group1_build15_layer0 15.8 (35/113) 1 (0/5)
final_group1_build15_layer1 15.9 (36/113) 1 (0/5)
final_group1_build16_layer0 16.8 (37/113) 1 (0/5)
final_group1_build16_layer1 16.9 (38/113) 1 (0/5)
final_group1_build17_layer0 17.8 (39/113) 1 (0/5)
final_group1_build17_layer1 17.9 (40/113) 1 (0/5)
final_group1_build18_layer0 18.8 (41/113) 1 (0/5)
final_group1_build18_layer1 18.9 (42/113) 1 (0/5)
final_group1_build19_layer0 19.8 (43/113) 1 (0/5)
final_group1_build19_layer1 19.9 (44/113) 1 (0/5)
final_group1_build20_layer0 20.8 (45/113) 1 (0/5)
final_group1_build20_layer1 20.9 (46/113) 1 (0/5)
final_group1_build21_layer0 21.8 (47/113) 1 (0/5)
final_group1_build21_layer1 21.9 (48/113) 1 (0/5)
final_group1_build22_layer0 22.8 (49/113) 1 (0/5)
final_group1_build22_layer1 22.9 (50/113) 1 (0/5)
final_group1_build23_layer0 23.8 (51/113) 1 (0/5)
final_group1_build23_layer1 23.9 (52/113) 1 (0/5)
final_group1_build24_layer0 24.8 (58/113) 1 (0/5)
final_group1_build24_layer1 24.9 (59/113) 1 (0/5)
final_group2_build25_layer0 25.8 (60/113) 1 (0/5)
final_group2_build25_layer1 25.9 (61/113) 1 (0/5)
final_group2_build26_layer0 26.8 (62/113) 1 (0/5)
final_group2_build26_layer1 26.9 (63/113) 1 (0/5)
final_group2_build27_layer0 27.8 (64/113) 1 (0/5)
final_group2_build27_layer1 27.9 (65/113) 1 (0/5)
final_group2_build28_layer0 28.8 (66/113) 1 (0/5)
final_group2_build28_layer1 28.9 (67/113) 1 (0/5)
final_group2_build29_layer0 29.8 (68/113) 1 (0/5)
final_group2_build29_layer1 29.9 (69/113) 1 (0/5)
final_group2_build30_layer0 30.8 (70/113) 1 (0/5)
final_group2_build30_layer1 30.9 (71/113) 1 (0/5)
final_group2_build31_layer0 31.8 (72/113) 1 (0/5)
final_group2_build31_layer1 31.9 (73/113) 1 (0/5)
final_group2_build32_layer0 32.8 (74/113) 1 (0/5)
final_group2_build32_layer1 32.9 (75/113) 1 (0/5)
final_group2_build33_layer0 33.8 (76/113) 1 (0/5)
final_group2_build33_layer1 33.9 (77/113) 1 (0/5)
final_group2_build34_layer0 34.8 (83/113) 1 (0/5)
final_group2_build34_layer1 34.9 (84/113) 1 (0/5)
final_group3_build35_layer0 35.8 (85/113) 1 (0/5)
final_group3_build35_layer1 35.9 (86/113) 1 (0/5)
final_group3_build36_layer0 36.8 (92/113) 1 (0/5)
final_group3_build36_layer1 36.9 (93/113) 1 (0/5)
final_group4_build37_layer0 37.8 (94/113) 1 (0/5)
final_group4_build37_layer1 37.9 (95/113) 1 (0/5)
intermed_group0_layer0 14.3 (28/113) 15 (3/5)
intermed_group0_layer1 14.4 (29/113) 15 (3/5)
intermed_group0_layer2 14.5 (30/113) 15 (3/5)
intermed_group1_layer0 24.3 (53/113) 10 (2/5)
intermed_group0_layer3 14.6 (31/113) 15 (3/5)
intermed_group1_layer1 24.4 (54/113) 10 (2/5)
intermed_group0_layer4 14.7 (32/113) 15 (3/5)
intermed_group1_layer2 24.5 (55/113) 10 (2/5)
final_group4_build38_layer0 38.8 (101/113) 1 (0/5)
intermed_group1_layer3 24.6 (56/113) 10 (2/5)
final_group4_build38_layer1 38.9 (102/113) 1 (0/5)
intermed_group1_layer4 24.7 (57/113) 10 (2/5)
intermed_group5_layer0 39.3 (106/113) 1 (0/5)
intermed_group5_layer1 39.4 (107/113) 1 (0/5)
intermed_group5_layer2 39.5 (108/113) 1 (0/5)
intermed_group5_layer3 39.6 (109/113) 1 (0/5)
intermed_group3_layer0 36.3 (87/113) 2 (1/5)
intermed_group5_layer4 39.7 (110/113) 1 (0/5)
intermed_group3_layer1 36.4 (88/113) 2 (1/5)
final_group5_build39_layer0 39.8 (111/113) 1 (0/5)
intermed_group3_layer2 36.5 (89/113) 2 (1/5)
final_group5_build39_layer1 39.9 (112/113) 1 (0/5)
intermed_group3_layer3 36.6 (90/113) 2 (1/5)
intermed_group3_layer4 36.7 (91/113) 2 (1/5)
intermed_group4_layer0 38.3 (96/113) 2 (1/5)
intermed_group4_layer1 38.4 (97/113) 2 (1/5)
intermed_group4_layer2 38.5 (98/113) 2 (1/5)
intermed_group4_layer3 38.6 (99/113) 2 (1/5)
intermed_group4_layer4 38.7 (100/113) 2 (1/5)
intermed_group2_layer0 34.3 (78/113) 10 (2/5)
intermed_group2_layer1 34.4 (79/113) 10 (2/5)
intermed_group2_layer2 34.5 (80/113) 10 (2/5)
intermed_group2_layer3 34.6 (81/113) 10 (2/5)
intermed_group2_layer4 34.7 (82/113) 10 (2/5)
base_layer0 39.0 (103/113) 40 (4/5)
base_layer1 39.1 (104/113) 40 (4/5)
base_layer2 39.2 (105/113) 40 (4/5)

Removed one layer at a time

label last_used usage_count
final_group0_build0_layer0 0.8 (0/123) 20 (3/5)
final_group0_build0_layer1 0.9 (0/122) 20 (3/5)
final_group0_build1_layer0 1.8 (0/121) 20 (3/5)
final_group0_build1_layer1 1.9 (0/120) 20 (3/5)
final_group0_build2_layer0 2.8 (0/119) 20 (3/5)
final_group0_build2_layer1 2.9 (0/118) 20 (3/5)
final_group0_build3_layer0 3.8 (0/117) 20 (3/5)
final_group0_build3_layer1 3.9 (0/116) 20 (3/5)
final_group0_build4_layer0 4.8 (0/115) 20 (3/5)
final_group0_build4_layer1 4.9 (0/114) 20 (3/5)
final_group0_build5_layer0 5.8 (0/113) 20 (3/5)
final_group0_build5_layer1 5.9 (0/112) 20 (3/5)
final_group0_build6_layer0 6.8 (0/111) 20 (3/5)
final_group0_build6_layer1 6.9 (0/110) 20 (3/5)
final_group0_build7_layer0 7.8 (0/109) 20 (3/5)
final_group0_build7_layer1 7.9 (0/108) 20 (3/5)
final_group0_build8_layer0 8.8 (0/107) 20 (3/5)
final_group0_build8_layer1 8.9 (0/106) 20 (3/5)
final_group0_build9_layer0 9.8 (0/105) 20 (3/5)
final_group0_build9_layer1 9.9 (0/104) 20 (3/5)
final_group0_build10_layer0 10.8 (0/103) 20 (3/5)
final_group0_build10_layer1 10.9 (0/102) 20 (3/5)
final_group0_build11_layer0 11.8 (0/101) 20 (3/5)
final_group0_build11_layer1 11.9 (0/100) 20 (3/5)
final_group0_build12_layer0 12.8 (0/99) 20 (3/5)
final_group0_build12_layer1 12.9 (0/98) 20 (3/5)
final_group1_build20_layer0 20.8 (19/97) 10 (2/5)
final_group1_build20_layer1 20.9 (19/96) 10 (2/5)
final_group0_build13_layer0 13.8 (0/95) 20 (3/5)
final_group1_build21_layer0 21.8 (18/94) 10 (2/5)
final_group1_build21_layer1 21.9 (18/93) 10 (2/5)
final_group1_build22_layer0 22.8 (18/92) 10 (2/5)
final_group1_build22_layer1 22.9 (18/91) 10 (2/5)
final_group0_build13_layer1 13.9 (0/90) 20 (3/5)
final_group1_build23_layer0 23.8 (17/89) 10 (2/5)
final_group1_build23_layer1 23.9 (17/88) 10 (2/5)
final_group1_build24_layer0 24.8 (17/87) 10 (2/5)
final_group1_build24_layer1 24.9 (17/86) 10 (2/5)
final_group0_build14_layer0 14.8 (0/85) 20 (3/5)
final_group1_build25_layer0 25.8 (16/84) 10 (2/5)
final_group1_build25_layer1 25.9 (16/83) 10 (2/5)
final_group1_build26_layer0 26.8 (16/82) 10 (2/5)
final_group1_build26_layer1 26.9 (16/81) 10 (2/5)
final_group0_build14_layer1 14.9 (0/80) 20 (3/5)
final_group1_build27_layer0 27.8 (15/79) 10 (2/5)
final_group1_build27_layer1 27.9 (15/78) 10 (2/5)
final_group1_build28_layer0 28.8 (15/77) 10 (2/5)
final_group1_build28_layer1 28.9 (15/76) 10 (2/5)
final_group0_build15_layer0 15.8 (0/75) 20 (3/5)
intermed_group1_layer0 29.3 (14/74) 10 (2/5)
intermed_group1_layer1 29.4 (14/73) 10 (2/5)
intermed_group1_layer2 29.5 (14/72) 10 (2/5)
intermed_group1_layer3 29.6 (14/71) 10 (2/5)
final_group0_build15_layer1 15.9 (0/70) 20 (3/5)
intermed_group1_layer4 29.7 (13/69) 10 (2/5)
final_group1_build29_layer0 29.8 (13/68) 10 (2/5)
final_group1_build29_layer1 29.9 (13/67) 10 (2/5)
final_group2_build30_layer0 30.8 (13/66) 10 (2/5)
final_group0_build16_layer0 16.8 (0/65) 20 (3/5)
final_group2_build30_layer1 30.9 (12/64) 10 (2/5)
final_group2_build31_layer0 31.8 (12/63) 10 (2/5)
final_group2_build31_layer1 31.9 (12/62) 10 (2/5)
final_group2_build32_layer0 32.8 (12/61) 10 (2/5)
final_group0_build16_layer1 16.9 (0/60) 20 (3/5)
final_group2_build32_layer1 32.9 (11/59) 10 (2/5)
final_group2_build33_layer0 33.8 (11/58) 10 (2/5)
final_group2_build33_layer1 33.9 (11/57) 10 (2/5)
final_group2_build34_layer0 34.8 (11/56) 10 (2/5)
final_group0_build17_layer0 17.8 (0/55) 20 (3/5)
final_group2_build34_layer1 34.9 (10/54) 10 (2/5)
final_group2_build35_layer0 35.8 (10/53) 10 (2/5)
final_group2_build35_layer1 35.9 (10/52) 10 (2/5)
final_group2_build36_layer0 36.8 (10/51) 10 (2/5)
final_group0_build17_layer1 17.9 (0/50) 20 (3/5)
final_group2_build36_layer1 36.9 (9/49) 10 (2/5)
final_group2_build37_layer0 37.8 (9/48) 10 (2/5)
final_group2_build37_layer1 37.9 (9/47) 10 (2/5)
final_group3_build40_layer0 40.8 (18/46) 2 (1/5)
final_group0_build18_layer0 18.8 (0/45) 20 (3/5)
final_group2_build38_layer0 38.8 (8/44) 10 (2/5)
final_group3_build40_layer1 40.9 (16/43) 2 (1/5)
intermed_group3_layer0 41.3 (16/42) 2 (1/5)
intermed_group3_layer1 41.4 (16/41) 2 (1/5)
final_group0_build18_layer1 18.9 (0/40) 20 (3/5)
final_group2_build38_layer1 38.9 (7/39) 10 (2/5)
intermed_group3_layer2 41.5 (14/38) 2 (1/5)
intermed_group3_layer3 41.6 (14/37) 2 (1/5)
intermed_group3_layer4 41.7 (14/36) 2 (1/5)
intermed_group0_layer0 19.3 (0/35) 20 (3/5)
intermed_group2_layer0 39.3 (6/34) 10 (2/5)
final_group3_build41_layer0 41.8 (12/33) 2 (1/5)
final_group3_build41_layer1 41.9 (12/32) 2 (1/5)
final_group4_build42_layer0 42.8 (12/31) 2 (1/5)
intermed_group0_layer1 19.4 (0/30) 20 (3/5)
intermed_group2_layer1 39.4 (5/29) 10 (2/5)
final_group4_build42_layer1 42.9 (10/28) 2 (1/5)
intermed_group4_layer0 43.3 (10/27) 2 (1/5)
intermed_group4_layer1 43.4 (10/26) 2 (1/5)
intermed_group0_layer2 19.5 (0/25) 20 (3/5)
intermed_group2_layer2 39.5 (4/24) 10 (2/5)
intermed_group4_layer2 43.5 (8/23) 2 (1/5)
intermed_group4_layer3 43.6 (8/22) 2 (1/5)
intermed_group4_layer4 43.7 (8/21) 2 (1/5)
intermed_group0_layer3 19.6 (0/20) 20 (3/5)
intermed_group2_layer3 39.6 (3/19) 10 (2/5)
final_group4_build43_layer0 43.8 (6/18) 2 (1/5)
final_group4_build43_layer1 43.9 (6/17) 2 (1/5)
intermed_group2_layer4 39.7 (3/16) 10 (1/4)
final_group2_build39_layer0 39.8 (3/15) 10 (1/4)
final_group2_build39_layer1 39.9 (3/14) 10 (1/4)
intermed_group0_layer4 19.7 (0/13) 20 (1/3)
final_group0_build19_layer0 19.8 (0/12) 20 (1/3)
final_group0_build19_layer1 19.9 (0/11) 20 (1/3)
intermed_group5_layer0 44.3 (3/10) 1 (0/2)
intermed_group5_layer1 44.4 (3/9) 1 (0/2)
intermed_group5_layer2 44.5 (3/8) 1 (0/2)
intermed_group5_layer3 44.6 (3/7) 1 (0/2)
base_layer0 44.0 (0/6) 45 (1/2)
intermed_group5_layer4 44.7 (2/5) 1 (0/2)
base_layer1 44.1 (0/4) 45 (1/2)
final_group5_build44_layer0 44.8 (1/3) 1 (0/2)
base_layer2 44.2 (0/2) 45 (1/2)
final_group5_build44_layer1 44.9 (0/1) 1 (0/1)

Issue

In our setup, docker buildx prune --keep-storage <bytes> is called regularly to keep the cache size below the desired target. The problem is that newly built layers are being deleted from the build cache, while old layers that I no longer need are staying in the cache. This causes unnecessary churn in our docker image builds.

Code Analysis

See #551

New layers are deleted because the function sortDeleteRecords uses a combination of lastUsedAt and usageCount to decide which records to remove. The usageCount metric favors keeping old layers, and the lastUsedAt metric favors keeping new layers, so the two metrics fight with each other.

In addition, the function normalizes the ranks of the unique values of each metric to the range 0 to 1. There are only a few unique values of usageCount, while there are many unique values of lastUsedAt. A single step in usageCount therefore moves the combined score much further than a single step in lastUsedAt, so usageCount has the stronger effect on which layers get deleted. This is why brand new layers with a low usageCount are getting deleted.
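
As a rough worked example of that imbalance, the sketch below plugs in rank counts and indices copied from the "Removed in one operation" table of the simulation (113 distinct lastUsedAt ranks, 5 distinct usageCount ranks). The constant and function names are made up for this illustration, and the numbers come from the simulation, not from a real BuildKit cache.

```python
LAST_USED_RANKS = 113   # distinct lastUsedAt values in the simulated table
USAGE_COUNT_RANKS = 5   # distinct usageCount values in the simulated table


def combined_score(last_used_index, usage_count_index):
    """Normalized rank sum, as in the simulation's combined_score."""
    return (last_used_index / LAST_USED_RANKS
            + usage_count_index / USAGE_COUNT_RANKS)


# One step up in usageCount rank is worth this many steps in lastUsedAt rank.
ranks_per_usage_step = LAST_USED_RANKS / USAGE_COUNT_RANKS

# intermed_group5_layer0: current working branch, built once
active = combined_score(106, 0)
# intermed_group3_layer0: abandoned experiment, built twice
abandoned = combined_score(87, 1)

print(f"one usageCount rank = {ranks_per_usage_step:.1f} lastUsedAt ranks")
print(f"active={active:.3f} abandoned={abandoned:.3f}")
```

The abandoned layer is 19 lastUsedAt ranks older than the active one, but its single extra usageCount rank outweighs that gap, so the lower-scoring active layer is evicted first.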

If brand new layers get evicted from the cache before they can build up a higher usageCount, then this problem becomes self-perpetuating with the build continually resetting to an earlier layer.

Reproducing

Unfortunately, I cannot reproduce this issue because I am using a hosted provider, depot.dev. They are backed by buildkit, so I don't have full visibility into what is going on, but I am pretty sure this is what is happening under the hood.

When I attempted to reproduce locally, lastUsedAt and usageCount were not being updated for cached layers. lastUsedAt stayed at the layer's original creation time, and usageCount stayed at 1. The values were only updated for a parent layer when a child layer was rebuilt. This appears to be a bug as well, because layers that are cached when building images are the most valuable ones to keep in the cache.

Proposed Solution

This key line chooses to delete images based on lastUsedAtIndex / maxLastUsedIndex + usageCountIndex / maxUsageCountIndex:

sort.Slice(toDelete, func(i, j int) bool {
    return float64(toDelete[i].lastUsedAtIndex)/float64(maxLastUsedIndex)+
        float64(toDelete[i].usageCountIndex)/float64(maxUsageCountIndex) <
        float64(toDelete[j].lastUsedAtIndex)/float64(maxLastUsedIndex)+
            float64(toDelete[j].usageCountIndex)/float64(maxUsageCountIndex)
})

I believe that switching to just lastUsedAt or lastUsedAtIndex for the cache eviction order would solve the problem. Images that are used often are also likely to be used recently, so relying on usageCount is a little redundant.

Another alternative is to use lastUsedAtIndex + usageCountIndex.
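
To compare the three orderings side by side, here is a small sketch in Python rather than Go. It reuses (lastUsedAtIndex, usageCountIndex) pairs copied from the simulated "Removed in one operation" table, one layer per group. The key functions are illustrative stand-ins for the comparison inside sort.Slice, not a patch to BuildKit.

```python
MAX_LAST_USED_INDEX = 113
MAX_USAGE_COUNT_INDEX = 5

# label -> (lastUsedAtIndex, usageCountIndex), from the simulated table
records = {
    "intermed_group0": (28, 3),   # no longer in use
    "intermed_group1": (53, 2),   # no longer in use
    "intermed_group2": (78, 2),   # production
    "intermed_group3": (87, 1),   # abandoned
    "intermed_group4": (96, 1),   # abandoned
    "intermed_group5": (106, 0),  # current working branch
    "base": (103, 4),
}


def current_key(rec):
    """Current behavior: sum of the two normalized ranks."""
    last_used_index, usage_count_index = rec
    return (last_used_index / MAX_LAST_USED_INDEX
            + usage_count_index / MAX_USAGE_COUNT_INDEX)


def last_used_only_key(rec):
    """Proposed: order by lastUsedAtIndex alone."""
    return rec[0]


def index_sum_key(rec):
    """Alternative: lastUsedAtIndex + usageCountIndex, not normalized."""
    return rec[0] + rec[1]


def eviction_order(key):
    """Labels sorted first-to-evict first under the given key."""
    return sorted(records, key=lambda label: key(records[label]))


for name, key in [("current", current_key),
                  ("last used only", last_used_only_key),
                  ("index sum", index_sum_key)]:
    print(f"{name:>14}: {eviction_order(key)}")
```

With these simulated values, the current score evicts intermed_group5 third, ahead of both abandoned groups, while either alternative evicts it after every other intermediate group.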
