Forked from tvwerkhoven/
Created May 18, 2020 09:34
De-duplicate using APFS clonefile(2) and jdupes in zsh
#!/usr/bin/env zsh
# # About
# Since APFS supports block-level de-duplication through file clones, it can
# be useful to manually de-duplicate your files if you've migrated/upgraded to
# APFS rather than doing a fresh install.
# I've written this simple script with the aim to:
# - Be simple, easy to read and understand (for users to check)
# - Use native cp -c for de-duplication
# - Use non-hashing file comparison to prevent collisions
# - Use jdupes for speed
# - Preserve file metadata
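# Illustration only (the script below does not call this): "non-hashing
# comparison" means confirming duplicates byte by byte instead of trusting a
# matching hash; jdupes does this confirmation by default. A minimal
# stand-alone check with cmp(1); the helper name same_bytes is hypothetical.
same_bytes() {
  # cmp -s prints nothing and exits 0 only if both files are byte-identical
  cmp -s -- "$1" "$2"
}
# Example: same_bytes ./a ./b && echo "identical"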
# # Known bugs
# - Does not preserve target directory timestamps
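# A possible workaround, as a sketch the script below does not use: save a
# directory's timestamps onto a scratch file with touch -r, run the command,
# then copy the timestamps back. The helper name keep_dir_mtime is
# hypothetical.
keep_dir_mtime() {
  local dir=$1 ref rc
  shift
  ref=$(mktemp) || return 1
  # Save the directory's timestamps onto the scratch file
  touch -r "$dir" "$ref"
  # Run the wrapped command and remember its exit status
  "$@"
  rc=$?
  # Restore the saved timestamps on the directory, then clean up
  touch -r "$ref" "$dir"
  rm -f "$ref"
  return $rc
}
# Example: keep_dir_mtime ./some/dir touch ./some/dir/newfile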
# # Background info
# # Alternatives
# Python, uses hashes (collision risk):
# Python, uses hashes (collision risk, does not preserve metadata?):
# Does not preserve metadata:
# Paid:
# Paid:
### Init: identify files and programs
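# File to hold duplicate file data (location is an assumption, adjust as needed)
DUPEFILE=${TMPDIR:-/tmp}/jdupes-$$.txt
# File to temporarily store old file for metadata (assumed name; kept in the
# current directory so that mv is a rename on the same volume)
TEMPFILE=./.dedup-tempfile
# Critical programs to use (the PMV and PJDUPES paths are assumptions; the
# /opt/local prefix matches a MacPorts install)
PCP=/bin/cp # Should be Mac native cp supporting clonefile(2)!
PMV=/bin/mv
PGCP=/opt/local/bin/gcp # Not to be confused with the alias for git cherry-pick
PJDUPES=/opt/local/bin/jdupes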
test ! -x "${PCP}" && echo "Error: path to cp wrong" && exit 1
test ! -x "${PMV}" && echo "Error: path to mv wrong" && exit 1
test ! -x "${PGCP}" && echo "Error: path to gnu-cp wrong" && exit 1
test ! -x "${PJDUPES}" && echo "Error: path to jdupes wrong" && exit 1
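# The four checks above can also be written as one loop. A sketch with a
# hypothetical helper name that returns non-zero on the first bad path:
require_exe() {
  local p
  for p in "$@"; do
    # -x is false for empty, missing, or non-executable paths
    if [ ! -x "$p" ]; then
      echo "Error: '$p' is not an executable path" >&2
      return 1
    fi
  done
}
# Example: require_exe "${PCP}" "${PMV}" "${PGCP}" "${PJDUPES}" || exit 1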
### Optional: check how much data can be saved
${PJDUPES} --recurse --omitfirst ./ | tee ${DUPEFILE}
# Loop over lines, if line is not empty, check size, sum in awk
cat ${DUPEFILE} | while IFS= read -r thisfile; do
test ! -z $thisfile && du -k "$thisfile"
done | awk '{i+=$1} END {print i" kb"}'
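# The estimate above reads newline-separated names, so a filename containing a
# newline would be miscounted. A NUL-safe variant, sketched with a
# hypothetical function name, reads NUL-separated names on stdin and skips the
# empty records that end each set. du reports each file's own allocation, so
# files that are already clones may still be counted.
sum_kb_nul() {
  local f kb total=0
  while IFS= read -r -d $'\0' f; do
    # Skip the empty record that terminates each set
    [ -n "$f" ] || continue
    # du -k prints "<size in KiB><TAB><name>"; keep the first field
    kb=$(du -k -- "$f" | cut -f1)
    total=$((total + kb))
  done
  echo "${total} kb"
}
# Example, assuming your jdupes accepts --printnull together with --omitfirst:
# ${PJDUPES} --printnull --omitfirst --recurse ./ | sum_kb_nul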
### Find duplicates
# Find duplicates, use NUL character to separate to allow for newlines in
# filenames (rare but possible).
${PJDUPES} --printnull --recurse ./ | tee ${DUPEFILE}
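# Check number of sets of duplicates by counting occurrences of two
# consecutive NUL characters, i.e. empty records. A NUL byte cannot be passed
# in a command-line pattern and '\x00' is not a portable grep -E escape, so
# translate NULs to newlines and count the empty lines instead.
NPAIRS=$(tr '\0' '\n' < ${DUPEFILE} | grep -c '^$')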
echo "Found ${NPAIRS} sets of duplicates"
### Start de-duplication
# Loop over files separated by NUL characters, using the first file of each
# set as the source for all other files in that set, e.g.
# file1\x00
# file2\x00
# file3\x00\x00
# will cause file2 and file3 to be overwritten by clones of file1
# - If the file name is empty, a new set begins and we unset SOURCEFILE.
#   SOURCEFILE also starts out unset, so the first set is handled the same way
# - If SOURCEFILE is unset, use the current file as SOURCEFILE
# - If the file name is not empty AND SOURCEFILE is set, make a clone:
# -- Move the target file to a temporary location
# -- Clone the source file over the target file
# -- Copy attributes from the original target (now the temporary file) to
#    the new clone
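# Before running the destructive loop below, the same set-splitting logic can
# be run as a dry run that only prints what would be cloned. A sketch with a
# hypothetical function name; it reads the NUL-separated jdupes output on stdin.
print_sets() {
  local f src=""
  while IFS= read -r -d $'\0' f; do
    if [ -z "$f" ]; then
      # Empty record (second NUL of a pair): the current set ends
      src=""
    elif [ -z "$src" ]; then
      # First file of a set becomes the clone source
      src=$f
    else
      printf '%s -> %s\n' "$src" "$f"
    fi
  done
}
# Example: print_sets < ${DUPEFILE}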
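cat ${DUPEFILE} | while IFS= read -r -d $'\0' FILE; do
  if [[ -z $FILE ]]; then
    # Empty record: the current set ends, the next name starts a new set
    unset SOURCEFILE
  elif [[ -z $SOURCEFILE ]]; then
    # First file of a set becomes the clone source
    SOURCEFILE=$FILE
  else
    # Preserve original file for metadata
    # Test that the move was successful via its exit status (checking that
    # TEMPFILE exists would also pass on a leftover from an earlier iteration)
    if ! ${PMV} "${FILE}" "${TEMPFILE}"; then
      echo "Error: move failed, aborting."; break
    fi
    # Use cp -c to use APFS clonefile(2)
    # Use cp -a to preserve metadata, recurse, and not follow symlinks
    # Test that copy was successful (protect against e.g. empty $PCP string);
    # on failure, move the original back before aborting
    if ! ${PCP} -ca "${SOURCEFILE}" "${FILE}"; then
      echo "Error: copy failed, restoring ${FILE} and aborting."
      ${PMV} "${TEMPFILE}" "${FILE}"; break
    fi
    # Use gnu copy to copy over all attributes of the original file
    # Poorer alternative:
    ${PGCP} --preserve=all --attributes-only "${TEMPFILE}" "${FILE}"
    # The temporary file is no longer needed once its attributes are copied
    rm -f "${TEMPFILE}"
  fi
done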
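## Using fdupes - bash (not tested)
# Get matches
# DUPEFILE=fdupes-20200101a
# fdupes --sameline --recurse ./ | tee ${DUPEFILE}
# cat ${DUPEFILE} | while read SOURCEFILE DESTFILES; do
# # Split the remaining names on spaces into an array (this breaks on
# # filenames that contain spaces)
# read -r -a DESTFILESARR <<< "${DESTFILES}"
# # Dry run: every command is echoed, including the mv, so no file is moved
# # away while the cp and gcp steps are only printed
# for DEST in "${DESTFILESARR[@]}"; do
# echo mv "${DEST}" tmp
# echo cp -ca "${SOURCEFILE}" "${DEST}";
# echo gcp --preserve=all --attributes-only tmp "${DEST}"
# done
# done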