@philsnow
Created June 1, 2024 06:41
MacOS plist oddity

Disclaimer/preface: I don't know all that much about MacOS.

Why are my MacOS backups so large?

I was investigating why my Time Machine (and later, borgbackup'ed tmutil localsnapshots) incremental backups were larger than I expected -- around 1-7 GB worth of changes per day.

I found this out after I had set up Time Machine backups to a TrueNAS CORE machine and had set up regular snapshots of the Time Machine ZFS dataset:

# zfs list -t snapshot -o space -r /mnt/vault/delorean/philsnow | head
NAME                                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
vault/delorean/philsnow@auto-2024-05-04_04-00      -  6.44G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-05_04-00      -  1.55G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-06_04-00      -  2.09G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-07_04-00      -  3.48G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-08_04-00      -  3.05G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-09_04-00      -  2.80G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-10_04-00      -  2.80G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-11_04-00      -  3.12G         -       -              -          -
vault/delorean/philsnow@auto-2024-05-12_04-00      -  1.86G         -       -              -          -

This shows the space used by the daily 4am snapshots taken on the TrueNAS machine. During at least one of those days, the only machine backing up to this Time Machine share was pretty much quiescent, just sitting there locked... so why were there so many bytes worth of changes?
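To put a single number on the listing above, the per-snapshot figures can be totalled. This is a minimal sketch, assuming the input comes from `zfs list -Hp` (`-H` drops the header, `-p` prints exact byte counts); the function name `sum_snapshot_used` is made up for illustration. Keep in mind that a snapshot's USED only counts space unique to that snapshot, so the total is a lower bound on what the snapshots hold.

```shell
# Sum the USED column of a dataset's snapshots.
# Expects tab-separated "name<TAB>bytes" lines on stdin, e.g. from:
#   zfs list -Hp -t snapshot -o name,used -r vault/delorean/philsnow
# sum_snapshot_used is a hypothetical helper name, not part of ZFS.
sum_snapshot_used() {
  awk -F'\t' '
    { total += $2; printf "%-50s %8.2f GiB\n", $1, $2 / 1073741824 }
    END { printf "%-50s %8.2f GiB\n", "total", total / 1073741824 }
  '
}
```

Usage would look like `zfs list -Hp -t snapshot -o name,used -r vault/delorean/philsnow | sum_snapshot_used`.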

Can't even mount snapshots tho

It took me several hours just to figure out how to mount local snapshots (did I mention I don't really know MacOS?). There are a lot of articles about how to do this, but they were mostly written/published between around 2017 and 2021, so after APFS was introduced but before the Signed System Volume (SSV) was widespread. (System Integrity Protection (SIP) predates both; it shipped with El Capitan in 2015.)

tl;dr:

  1. Whatever terminal app you're using needs to have Full Disk Access enabled.
  2. The blog posts from before SSV said mount_apfs -s <snapshot_name> / /tmp/snapshot, but that gives either "resource busy" or "mount: /tmp/snapshot failed with 77" or "[...] failed with 67". After much guesswork I found that the / argument has to be /System/Volumes/Data instead (I think because <snapshot_name> is a snapshot of the APFS volume that gets mounted at /System/Volumes/Data, whereas / is a different APFS volume and <snapshot_name> is not a snapshot of it).

So, as of 2024 and Sonoma, for the internet fossil record, the incantation seems to be:

# mountpoint=/tmp/snapshot
# mkdir -p "$mountpoint"
# tmutil localsnapshot
# snap=$(tmutil listlocalsnapshots / | grep TimeMachine | sort | tail -1)
# mount_apfs -s "$snap" -o ro /System/Volumes/Data "$mountpoint"
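Comparing snapshots needs two of them mounted at once. A minimal sketch of how that might be scripted, reusing the same mount_apfs invocation; the helper names `newest_two_snapshots` and `mount_snapshot` are hypothetical, and it assumes snapshot names sort chronologically (they embed a timestamp).

```shell
# Print the two newest Time Machine snapshot names, oldest first.
# Reads `tmutil listlocalsnapshots /` output on stdin.
# (hypothetical helper, not part of tmutil)
newest_two_snapshots() {
  grep TimeMachine | sort | tail -n 2
}

# Mount one snapshot of the Data volume read-only at the given mountpoint.
# (hypothetical helper wrapping the mount_apfs call shown above)
mount_snapshot() {
  snap=$1 mountpoint=$2
  mkdir -p "$mountpoint" &&
    mount_apfs -s "$snap" -o ro /System/Volumes/Data "$mountpoint"
}

# Usage (as root, from a terminal with Full Disk Access):
#   set -- $(tmutil listlocalsnapshots / | newest_two_snapshots)
#   mount_snapshot "$1" /tmp/snapshot_a
#   mount_snapshot "$2" /tmp/snapshot_b
```

Unmounting afterwards with `umount /tmp/snapshot_a /tmp/snapshot_b` should release them.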

Actually look at some diffs

Hokay, now I can mount a couple consecutive, "normal Time Machine" snapshots under /tmp/snapshot_{a,b}. Now what are the differences between them?

# rsync -HPrl --itemize-changes --dry-run /tmp/snapshot_{a,b}/ 2>&1 | tee /tmp/rsync-diff

This command runs rsync's comparison recursively between the two snapshots in dry-run mode and outputs an itemization of the changes, one line per path, with flags showing what differs between "A" and "B" (size, timestamp, that kind of thing) or that nothing does.
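Each itemized line is a change-flag field, a space, then the path. The flag field is 9 characters wide in older rsync releases and 11 in rsync 3.x, so counting columns to get at the path is fragile. A minimal sketch that splits on the first space instead (the helper name `itemized_paths` is made up for illustration):

```shell
# Reduce `rsync --itemize-changes` output to bare paths.
# Keeps only lines that start with an update type and a file type,
# then strips the flag field, whatever its width.
# (itemized_paths is a hypothetical helper name)
itemized_paths() {
  grep -E '^[<>ch.][fdLDS]' | sed -E 's/^[^ ]+ +//'
}
```

Something like `itemized_paths </tmp/rsync-diff | grep 'plist$'` would then list the changed plist files. Status lines such as "sending incremental file list" are dropped by the first filter.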

While looking at the output, one thing that jumped out at me was that hundreds of .plist files from all over the filesystem were showing up as changed between consecutive snapshots. I picked one at random and looked at a before/after: it had the same contents, but with keys in a different order.

Running plistutil --sort --print (plistutil comes from libplist) on a .plist file prints its contents in human-readable form, and --sort puts the dictionary keys in a canonical order. I did that on the two versions of the .plist file I had picked, and the results were identical.

Semantically-equivalent plists

Maybe that was a fluke, maybe I happened on an odd case... Let's do science. This takes the list of changed files, filters out the ones where the only apparent change was some timestamp, keeps only plist files, then for each plist file runs a diff between the canonically-ordered printouts from both snapshots, echoing "same" if they are semantically unchanged and "different" otherwise:

$ cat /tmp/rsync-diff |
    grep -v -F 'f..T......' |    # filter out changes to files where only some timestamp changed
    grep -v -F 'd..T......' |    # same with directories
    grep -v -F 'Operation timed out' |
    grep 'plist$' |
    cut -c 11- |                 # only print the path to the plist file (offset depends on rsync's itemize width)
    while IFS= read -r f; do
      printf '%s: ' "$f"
      # diff -q exits 0 if same, nonzero if different
      diff -q \
        <(plistutil --sort --print "/tmp/snapshot_a/${f}") \
        <(plistutil --sort --print "/tmp/snapshot_b/${f}") \
        >/dev/null 2>&1 \
        && echo "same" || echo "different"
    done > /tmp/same-diff
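One caveat with the loop above: if plistutil fails on both versions of a file (missing path, unreadable file, not actually a plist), diff sees two empty streams and the file is counted as "same". A minimal sketch of a stricter comparison, assuming the same plistutil flags; the helper name `plist_compare` and the third "error" outcome are additions for illustration, not something the original run used.

```shell
# Compare two plist files after canonicalising both with plistutil.
# Prints "same", "different", or "error" (when either file cannot be parsed).
# (plist_compare is a hypothetical helper name)
plist_compare() {
  a_out=$(mktemp) || return 2
  b_out=$(mktemp) || { rm -f "$a_out"; return 2; }
  if plistutil --sort --print "$1" >"$a_out" 2>/dev/null &&
     plistutil --sort --print "$2" >"$b_out" 2>/dev/null; then
    # Both parsed: compare the canonical printouts byte for byte.
    if cmp -s "$a_out" "$b_out"; then echo same; else echo different; fi
  else
    echo error
  fi
  rm -f "$a_out" "$b_out"
}
```

Inside the loop this would be called as `plist_compare "/tmp/snapshot_a/${f}" "/tmp/snapshot_b/${f}"`. It uses temporary files rather than process substitution so it also runs under plain sh.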

So now /tmp/same-diff has a bunch of lines "<path/to/file.plist>: same" or "<path/to/file.plist>: different". What's the breakdown?

# grep same\$ /tmp/same-diff | wc -l
87
# grep different\$ /tmp/same-diff | wc -l
67

About 56% of them (87/(67+87)) were semantically the same but bytewise different. Add up all the bytes for the semantically-the-same files and:
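The two counts and the ratio can also come out of a single pass over the report. A minimal sketch, assuming lines of the form "<path>: same" or "<path>: different" as in /tmp/same-diff; the function name `same_ratio` is made up for illustration.

```shell
# Summarise a same/different report read from stdin or from the files given.
# (same_ratio is a hypothetical helper name)
same_ratio() {
  awk '
    / same$/      { same++ }
    / different$/ { diff++ }
    END {
      total = same + diff
      if (total == 0) { print "no entries"; exit }
      printf "same=%d different=%d (%.1f%% same)\n", same, diff, 100 * same / total
    }
  ' "$@"
}
```

Run as `same_ratio /tmp/same-diff`; with 87 same and 67 different the percentage works out to 56.5.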

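$ </tmp/same-diff awk -F: '/same$/{print $1}' | \
    while IFS= read -r f; do
      # du -b is a GNU coreutils flag; paths are relative to the snapshot root
      du -sb "$f" 2>/dev/null
    done | \
    awk '{s += $1} END {print s + 0}'   # sum in awk: a shell variable set inside a bash pipeline is lost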
5898552

That's about 5.9 MB (5.6 MiB) worth of meaningless changes that get backed up regularly.
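The byte count above depends on GNU du, since the du that ships with MacOS has no -b flag. A minimal portable sketch that gets file sizes from wc -c instead; the function name `sum_file_bytes` and its root-directory argument are assumptions for illustration.

```shell
# Sum the sizes in bytes of the files named on stdin, one path per line.
# Relative paths are resolved against the directory given as $1 (default: .).
# Paths that are not regular files are skipped.
# (sum_file_bytes is a hypothetical helper name)
sum_file_bytes() {
  root=${1:-.}
  total=0
  while IFS= read -r f; do
    [ -f "$root/$f" ] || continue
    size=$(wc -c <"$root/$f")
    total=$((total + size))
  done
  echo "$total"
}
```

For example, `awk -F: '/same$/{print $1}' /tmp/same-diff | sum_file_bytes /tmp/snapshot_b` would total the semantically-unchanged files as they appear in the second snapshot.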

Is there a knob somewhere in MacOS that controls some framework, telling it to spend the extra CPU to use sorted property list representations and to always emit sorted plist files?
