

@andris9
Created March 5, 2012 13:15
git-cache-meta
#!/bin/sh -e
#git-cache-meta -- simple file meta data caching and applying.
#Simpler than etckeeper, metastore, setgitperms, etc.
#from http://www.kerneltrap.org/mailarchive/git/2009/1/9/4654694
#modified by n1k
# - save all files metadata not only from other users
# - save numeric uid and gid
# 2012-03-05 - added filetime, andris9
: ${GIT_CACHE_META_FILE=.git_cache_meta}
case $@ in
    --store|--stdout)
    case $1 in --store) exec > $GIT_CACHE_META_FILE; esac
    find $(git ls-files)\
        \( -printf 'chown %U %p\n' \) \
        \( -printf 'chgrp %G %p\n' \) \
        \( -printf 'touch -c -d "%AY-%Am-%Ad %AH:%AM:%AS" %p\n' \) \
        \( -printf 'chmod %#m %p\n' \) ;;
    --apply) sh -e $GIT_CACHE_META_FILE;;
    *) 1>&2 echo "Usage: $0 --store|--stdout|--apply"; exit 1;;
esac

source:

git-cache-meta --store

destination:

git-cache-meta --apply
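To make the -printf actions in the script above concrete, here is a minimal sketch (assuming GNU find, since BSD/macOS find has no -printf) that prints the chown/chgrp/chmod lines --store would write for a single path. The function name meta_lines is hypothetical, and the touch line is left out for brevity.

```sh
#!/bin/sh
# Sketch assuming GNU find: print the restore commands for ONE path, using the
# same -printf directives as git-cache-meta (touch line omitted for brevity).
# Each -printf action always succeeds, so all three lines are printed.
# Note that %p is NOT quoted here, exactly as in the original script.
meta_lines() {
    find "$1" -maxdepth 0 \
        -printf 'chown %U %p\n' \
        -printf 'chgrp %G %p\n' \
        -printf 'chmod %#m %p\n'
}

# Usage:
#   f=$(mktemp); chmod 640 "$f"; meta_lines "$f"
```

The %#m form adds a leading zero (e.g. 0640), which chmod accepts as an octal mode.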

Download jgit.sh

Config

cat > ~/.jgit
accesskey: aws access key
secretkey: aws secret access key
<Ctrl-D>

Setup repo

git remote add origin amazon-s3://.jgit@bucket.name/repo-name.git

Push

jgit push origin master

Clone

jgit clone amazon-s3://.jgit@bucket.name/repo-name.git

Pull

jgit fetch
git merge origin/master
@JPT77

JPT77 commented Nov 30, 2014

Well, it's getting complex, and for me it did not really work.
If anyone has got trouble too, I recommend using: https://github.com/przemoc/metastore
Metastore seems to have some plans for the future.

By the way, it would be nice to put this script into a repo.

@ubaldot

ubaldot commented Feb 4, 2015

I also don't have the -z option available in sed (I am using MinGW on Windows) :-(
Any workaround?

@PeterMosmans

@Barzi2001,
Yes: use MSYS2 with its up-to-date version of mingw(64) and sed (currently at 4.2.2, so -z is supported)

Hope that helps...

@danny0838

Here's a quick fix for the-mars' version:

  • Fix time zone detection. Use date +%z as a fallback if find -printf %Tz gives an unset (empty) result.
  • Add ./ prefix for file names to prevent a leading-dash-name issue (rare, though; just in case).
  • Add -maxdepth 0 to avoid a deeper find if a file/directory happens not to exist.
  • Use awk post-replace to improve performance (by reducing mass ls calls).
  • Use git ls-tree to list all Git-versioned directories, to improve performance and to avoid potential errors ("some.file" is added as a directory; "aaa/bbb/ddd.txt" doesn't cause "aaa" to be added). This also eliminates the "No sed -z in MsysGit" issue, since we no longer use sed -z.
  • Merge short options.
#!/bin/sh -e

#git-cache-meta -- simple file meta data caching and applying.
#Simpler than etckeeper, metastore, setgitperms, etc.
#from http://www.kerneltrap.org/mailarchive/git/2009/1/9/4654694
#modified by n1k
#modified by the-mars
# - save all files metadata not only from other users
# - save numeric uid and gid
#2012-03-05 - added filetime, andris9
#2012-05-22 - added fix for non ASCII characters and list size, merge chgrp into chown command
#2014-03-18 - the-mars: store properties for dirs too
#2015-04-17 - time zone offset fallback; fix leading-dash-name error; avoid deeper find;
#              better quote file names; better directory listing; merge short opts; by Danny Lin

: ${GIT_CACHE_META_FILE=.git_cache_meta}
: ${Tz:=$(find -prune -printf '%Tz')}
: ${Tz:=$(date +%z)}
if ! [ "$Tz" ]; then
    echo "%z not supported in 'strftime' in C library." >&2
    exit 1
fi

case $@ in
    --store|--stdout)
    case $1 in --store) exec > $GIT_CACHE_META_FILE; esac
    { git ls-tree --name-only -rdz $(git write-tree) | xargs -0 -I NAME find ./NAME -maxdepth 0 \
        \( -printf 'chown -h %U:%G \0%p\n' \) , \
        \( \! -type l -printf 'chmod %#m \0%p\n' \) , \
        \( -printf 'touch -hcmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
        \( -printf 'touch -hcad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
      git ls-files -z | xargs -0 -I NAME find ./NAME -maxdepth 0 \
        \( -printf 'chown -h %U:%G \0%p\n' \) , \
        \( \! -type l -printf 'chmod %#m \0%p\n' \) , \
        \( -printf 'touch -hcmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
        \( -printf 'touch -hcad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
    } | awk 'BEGIN {FS="\0"}; {print $1 "'\''" gensub(/'\''/, "'\''\\\\'\'''\''", "g", $2) "'\''" }' ;;
    --apply) sh -e $GIT_CACHE_META_FILE;;
    *) 1>&2 echo "Usage: $0 --store|--stdout|--apply"; exit 1;;
esac
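The awk stage above turns each NUL-separated "command \0path" record into "command 'path'", replacing every embedded ' with '\''. A rough POSIX-sh equivalent for a single name, using a hypothetical shell_quote helper built on sed instead of gawk's gensub(), might look like this:

```sh
#!/bin/sh
# Hypothetical POSIX-sh helper (not part of the script above): wrap a string in
# single quotes so that `sh` reads it back verbatim. Every embedded ' becomes
# '\'' (close quote, escaped quote, reopen quote), the same rewrite the gawk
# gensub() call performs, but without requiring gawk.
# Limitation: $(...) strips trailing newlines, so a name ending in a newline
# would not round-trip.
shell_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Usage: emit a restore command for an awkward file name.
#   printf 'chmod 0644 %s\n' "$(shell_quote "it's -a file")"
#   # prints: chmod 0644 'it'\''s -a file'
```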

MsysGit (1.9.5) doesn't seem to support chown, chgrp, or touch -h; just remove them for compatibility, e.g.:

@@ -25,15 +25,13 @@
     --store|--stdout)
     case $1 in --store) exec > $GIT_CACHE_META_FILE; esac
     { git ls-tree --name-only -rdz $(git write-tree) | xargs -0 -I NAME find ./NAME -maxdepth 0 \
-        \( -printf 'chown -h %U:%G \0%p\n' \) , \
         \( \! -type l -printf 'chmod %#m \0%p\n' \) , \
-        \( -printf 'touch -hcmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
-        \( -printf 'touch -hcad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
+        \( -printf 'touch -cmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
+        \( -printf 'touch -cad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
       git ls-files -z | xargs -0 -I NAME find ./NAME -maxdepth 0 \
-        \( -printf 'chown -h %U:%G \0%p\n' \) , \
         \( \! -type l -printf 'chmod %#m \0%p\n' \) , \
-        \( -printf 'touch -hcmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
-        \( -printf 'touch -hcad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
+        \( -printf 'touch -cmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
+        \( -printf 'touch -cad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
     } | awk 'BEGIN {FS="\0"}; {print $1 "'\''" gensub(/'\''/, "'\''\\\\'\'''\''", "g", $2) "'\''" }' ;;
     --apply) sh -e $GIT_CACHE_META_FILE;;
     *) 1>&2 echo "Usage: $0 --store|--stdout|--apply"; exit 1;;

MsysGit doesn't support find -printf %Tz either. However, since it does support date +%z, my patch with this fallback works.
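The ": ${Tz:=...}" lines in the script form a fallback chain: ${var:=word} only assigns when the variable is unset or empty, so each line runs only if the earlier ones produced nothing. A small sketch of the same idea as a function (the name detect_tz and the lowercase tz variable are illustrative, not from the script):

```sh
#!/bin/sh
# Sketch of the ${var:=word} fallback chain. GNU find prints the UTC offset of
# the current directory's mtime; where %Tz (or -printf itself) is unsupported,
# the result is empty (errors are discarded here) and `date +%z` is used.
detect_tz() {
    tz=""   # start empty for the demo; the script instead honors a preset $Tz
    : "${tz:=$(find . -prune -printf '%Tz' 2>/dev/null)}"
    : "${tz:=$(date +%z)}"
    printf '%s\n' "$tz"
}

# Usage:
#   Tz=$(detect_tz)
```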

Yet another issue is that MsysGit doesn't support touch with a timestamp that has fractional seconds. If the repo is only used on MsysGit, this works fine, since MsysGit's %TS and %AS write no fractional seconds. However, if the .git_cache_meta file was created on a system that writes fractional seconds, an error will occur when it is applied on MsysGit.

Many platforms and programs just ignore fractional seconds. To make the script more platform-independent, we could add a replace command that strips the fractional seconds beforehand. For example:

@@ -34,7 +34,8 @@
         \( \! -type l -printf 'chmod %#m \0%p\n' \) , \
         \( -printf 'touch -hcmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
         \( -printf 'touch -hcad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
-    } | awk 'BEGIN {FS="\0"}; {print $1 "'\''" gensub(/'\''/, "'\''\\\\'\'''\''", "g", $2) "'\''" }' ;;
+    } | awk 'BEGIN {FS="\0"}; {print $1 "'\''" gensub(/'\''/, "'\''\\\\'\'''\''", "g", $2) "'\''" }' |
+        sed -r 's!^(touch -[a-z]* "[0-9 :+\-]+)(\.[0-9]+)? !\1 !';;
     --apply) sh -e $GIT_CACHE_META_FILE;;
     *) 1>&2 echo "Usage: $0 --store|--stdout|--apply"; exit 1;;
 esac
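For reference, here is the same filter as a stand-alone function (hypothetical name strip_fraction), written with sed -E, which GNU and BSD sed both accept, and with '-' placed last in the bracket expression, the portable way to make it literal. Lines that are not touch commands, or that carry no fraction, pass through unchanged.

```sh
#!/bin/sh
# Sketch: drop the ".NNN" fractional part from the timestamp of generated
# touch lines, e.g. "12:34:56.1234567890 +0800" -> "12:34:56 +0800".
strip_fraction() {
    sed -E 's!^(touch -[a-z]* "[0-9 :+-]+)(\.[0-9]+)? !\1 !'
}

# Usage: filter an existing cache before applying it on MsysGit.
#   strip_fraction < .git_cache_meta > .git_cache_meta.tmp
```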

@bizonix

bizonix commented May 7, 2015

For Mac OS X, first run: brew install findutils gawk coreutils

#!/bin/bash -e

#git-cache-meta -- simple file meta data caching and applying.
#Simpler than etckeeper, metastore, setgitperms, etc.
#from http://www.kerneltrap.org/mailarchive/git/2009/1/9/4654694
#modified by n1k
#modified by the-mars
#modified by bizonix
# - save all files metadata not only from other users
# - save numeric uid and gid
#2012-03-05 - added filetime, andris9
#2012-05-22 - added fix for non ASCII characters and list size, merge chgrp into chown command
#2014-03-18 - the-mars: store properties for dirs too
#2015-04-17 - time zone offset fallback; fix leading-dash-name error; avoid deeper find;
#              better quote file names; better directory listing; merge short opts; by Danny Lin
#2015-05-07 - for Mac OS X, `brew install findutils gawk coreutils`

: ${GIT_CACHE_META_FILE=.git_cache_meta}

if [[ "$OSTYPE" == "darwin"* ]]; then
    GNU='g'
fi
for bin in find touch awk ; do
    BIN=$( echo $bin | tr '[:lower:]' '[:upper:]')
    eval ': ${'$BIN':=$(which $GNU$bin)}'
    if [ "$GNU" == 'g' ] && ! [[ "${!BIN}" =~ /$GNU$bin ]]  ; then
        echo "gnu version of '$bin' file not found." >&2
        exit 1
    fi
done

: ${Tz:=$($FIND -prune -printf '%Tz')}
: ${Tz:=$(date +%z)}
if ! [ "$Tz" ]; then
    echo "%z not supported in 'strftime' in C library." >&2
    exit 1
fi

case $@ in
    --store|--stdout)
    case $1 in --store) exec > $GIT_CACHE_META_FILE; esac
    { git ls-tree --name-only -rdz $(git write-tree) | xargs -0 -I NAME $FIND ./NAME -maxdepth 0 \
        \( -printf 'chown -h %U:%G \0%p\n' \) , \
        \( \! -type l -printf 'chmod %#m \0%p\n' \) , \
        \( -printf $TOUCH' -hcmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
        \( -printf $TOUCH' -hcad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
      git ls-files -z | xargs -0 -I NAME $FIND ./NAME -maxdepth 0 \
        \( -printf 'chown -h %U:%G \0%p\n' \) , \
        \( \! -type l -printf 'chmod %#m \0%p\n' \) , \
        \( -printf $TOUCH' -hcmd "%TY-%Tm-%Td %TH:%TM:%TS '$Tz'" \0%p\n' \) , \
        \( -printf $TOUCH' -hcad "%AY-%Am-%Ad %AH:%AM:%AS '$Tz'" \0%p\n' \)
    } | $AWK 'BEGIN {FS="\0"}; {print $1 "'\''" gensub(/'\''/, "'\''\\\\'\'''\''", "g", $2) "'\''" }' ;;
    --apply) sh -e $GIT_CACHE_META_FILE;;
    *) 1>&2 echo "Usage: $0 --store|--stdout|--apply"; exit 1;;
esac
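The detection loop above relies on bash-only syntax ([[ ]], ${!BIN}, and == inside [ ]), which fails under a strict POSIX /bin/sh such as dash. A possible POSIX-sh alternative, sketched with a hypothetical gnu_tool helper that recognizes GNU builds by their --version banner:

```sh
#!/bin/sh
# Hypothetical POSIX-sh helper: print the path of a GNU build of tool $1,
# trying the Homebrew-style "g" prefix first (gfind, gtouch, gawk), then the
# plain name. Returns 1 if no GNU build is found.
gnu_tool() {
    for candidate in "g$1" "$1"; do
        path=$(command -v "$candidate" 2>/dev/null) || continue
        # GNU tools mention "GNU" in their --version banner; BSD tools
        # typically reject --version, so the check fails for them.
        if "$path" --version 2>/dev/null | grep -q GNU; then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

# Usage:
#   FIND=$(gnu_tool find) || { echo "GNU find not found" >&2; exit 1; }
```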

@arno01

arno01 commented Oct 19, 2015

Hi all,
Thanks to everyone for great additions above!
I decided to share my version of this script. The idea is to keep it as simple as possible.

#!/bin/sh -e

# git-cache-meta -- simple file meta data caching and applying.
# Simpler than etckeeper, metastore, setgitperms, etc.
# from http://www.kerneltrap.org/mailarchive/git/2009/1/9/4654694
# modified by n1k
#  - save all files metadata not only from other users
#  - save numeric uid and gid

# Changes by: Andrey Arapov <andrey.arapov@nixaid.com>
#   2015-10-16 - add '-h' flag to chown, chgrp and touch so that symlink is
#                NOT followed
#              - chmod cannot be applied to symlink
#              - add "--" to stop processing arguments (e.g when file name has
#                leading "-")
#   2015-10-14 - added quotes around path %p

# Initial release by andris9
#   2012-03-05 - added filetime, andris9

: ${GIT_CACHE_META_FILE=.git_cache_meta}
case $@ in
    --store|--stdout)
    case $1 in --store) exec > $GIT_CACHE_META_FILE; esac
    find $(git ls-files)\
        \( -printf 'chown -h %U -- "%p"\n' \) \
        \( -printf 'chgrp -h %G -- "%p"\n' \) \
        \( -printf 'touch -h -c -d "%AY-%Am-%Ad %AH:%AM:%AS" -- "%p"\n' \) \
        ! -type l \( -printf 'chmod %#m -- "%p"\n' \) ;;
    --apply) sh -e $GIT_CACHE_META_FILE;;
    *) 1>&2 echo "Usage: $0 --store|--stdout|--apply"; exit 1;;
esac

@heaversm

This is failing for me on a Mac, since the macOS (BSD) find does not support the -printf option.

@cmw

cmw commented Jan 5, 2016

@heaversm: arno01's version doesn't work for me either, but bizonix's does.

@danny0838

danny0838 commented Jun 8, 2016

I created another project, git-store-meta, which is written in Perl. It is a bit more complicated but has better performance, flexibility, security, and cross-platform compatibility, while still keeping dependencies very light.

@undergroundSimplex

@danny0838's work is great. The one thing I would change is chown -h %U:%G to chown -h %u:%g, in case the UID and GID differ for the same user/group on different machines.

@danny0838

@undergroundSimplex: Thank you for the feedback. I agree with you. It's actually the behavior that git-store-meta has implemented.

@xspython

../git-cache-meta.sh --store
find: 'usr/lib/firmware/brcm/brcmfmac43430a0-sdio.ONDA-V80': No such file or directory
find: 'PLUS.txt': No such file or directory
find: 'usr/lib/firmware/brcm/brcmfmac43455-sdio.MINIX-NEO': No such file or directory
find: 'Z83-4.txt': No such file or directory
find: 'usr/lib/python3.6/site-packages/setuptools/command/launcher': No such file or directory
find: 'manifest.xml': No such file or directory
find: 'usr/lib/python3.6/site-packages/setuptools/script': No such file or directory
find: '(dev).tmpl': No such file or directory

ls usr/lib/firmware/brcm/brcmfmac43430a0-sdio.ONDA-V80\ PLUS.txt
'usr/lib/firmware/brcm/brcmfmac43430a0-sdio.ONDA-V80 PLUS.txt'
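These errors look like the word splitting of the unquoted $(git ls-files) used in the original and in arno01's version: the shell splits its output on whitespace, so "usr/lib/firmware/brcm/brcmfmac43430a0-sdio.ONDA-V80 PLUS.txt" reaches find as two arguments. A small demo without git (the function names are hypothetical) that counts how many arguments the file names in a directory turn into, unquoted versus NUL-separated:

```sh
#!/bin/sh
# Unsafe: unquoted $(...) is split on spaces/tabs/newlines, so one file name
# containing a space becomes two arguments.
unsafe_count() { set -- $(find "$1" -type f); echo "$#"; }

# Safe: NUL-separated names survive intact; this is what
# `git ls-files -z | xargs -0 ...` in the later revisions relies on.
safe_count() { find "$1" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh; }

# Usage:
#   d=$(mktemp -d); touch "$d/ONDA-V80 PLUS.txt"
#   unsafe_count "$d"   # 2
#   safe_count "$d"     # 1
```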

@danimesq

danimesq commented Oct 16, 2021

It is wonderful the collaboration level the humans can naturally organize 🥰
@andris9 started this, and y'all kept appending to it, respecting/including the previous contributions.
All of that not in a git repository, but frugally, in a gist!

In case of any interest/need (personally, this script has helped me, as I have a compulsion for preserving dates), here is a unified repository with all of your contributions! 🎉🥳 (🎊 looks like a beach bikini)

https://github.com/Floflis/git-meta

@AntonioMeireles @brayrobert201 @stefanbj @Explorer09 @Cojad @the-mars @mkortleven-emg @danny0838 @bizonix

Sorry @arno01 for not including yours, as it seemed very incomplete and @cmw reported it didn't work.

@danimesq

Also mentioning other participants who may have interest: @kickiss @pvdputte @delormec @JPT77 @Barzi2001 @PeterMosmans @heaversm @undergroundSimplex @xspython

@andris9

andris9 commented Oct 16, 2021

Completely forgot that this thing even exists 😀

@danny0838

danny0838 commented Oct 16, 2021

Though it's nice to have the revisions integrated, I switched to git-store-meta many years ago, as it's much more performant and secure, and supports more features. 😅

@Explorer09

Hello @DaniellMesquita,
Thanks for integrating the code, but I didn't have time at the moment to test the new one out.
However at the first glance of the new code, it seems like it does not meet the portability that I expected yet. The first issue I found is that your version includes bashisms and the shebang line says /bin/sh and not /bin/bash. Either the script should be POSIX compatible, or it should use /bin/bash as the shebang.
If I see another issue, I might report it later.

@danimesq

@andris9

Completely forgot that this thing even exists 😀

Congrats on starting this project.
If in the future there is a layer2 for git, this will surely be as useful as it was for me when I needed it.

@danny0838

Though it's nice to have revisions integrated, I have been shifted to git-store-meta for many years, as it's much more performant, secure, and supports more features. 😅

Haven't heard of it. Could you share?

@Explorer09

Thanks for integrating the code, but I didn't have time at the moment to test the new one out.
However at the first glance of the new code, it seems like it does not meet the portability that I expected yet. The first issue I found is that your version includes bashisms and the shebang line says /bin/sh and not /bin/bash. Either the script should be POSIX compatible, or it should use /bin/bash as the shebang.

Despite looking very bashy, it works with a simple ./git-meta.sh --store

Should I still change it?

If I see another issue, I might report it later.

Thank you. Your reports are very welcome, especially since you're the first to report issues since this project was started.

@danny0838

@DaniellMesquita As above mentioned.

@danimesq

@danny0838

If you port it from Perl to Rust, then I'll switch to it.

@danny0838

danny0838 commented Oct 17, 2021

@DaniellMesquita I don't write Rust. Even if I do, I won't recommend that.

Perl was chosen because it is (normally) a component of Git core and must be supported by any platform that can run Git.

Using another famous language will inevitably introduce an additional dependency and make installation more difficult.

@Explorer09

@DaniellMesquita As I said, either make the script POSIX compatible or use /bin/bash as the shebang.
Not everyone uses bash as the default shell and you can run into compatibility problems.
Which choice to make is up to you. It's your project so you can make up your own policy.

@danimesq

@danny0838

The purpose of using Perl is because Perl is a component of Git core (normally) and must be supported by any platform that can run Git.

Using another famous language will inevitably introduce an additional dependency and make installation more difficult.

It makes sense.

Will git-store-meta support git hooks to automatically version changes in file metadata on every commit?

@danimesq

@Explorer09

As I said, either make the script POSIX compatible or use /bin/bash as the shebang.
Not everyone uses bash as the default shell and you can run into compatibility problems.
Which choice to make is up to you. It's your project so you can make up your own policy.

Democracy is way better than seeing this community effort as "my project".

Done: 01VCS/git-meta@810d5ff

And issues/PRs are welcome.

@danny0838

@DaniellMesquita

Will git-store-meta support git hooks to automatically version changes in file metadata on every commit?

Yes. Read the manual for details, bro.

@danimesq

danimesq commented Oct 18, 2021

@danny0838

Yes. Read the manual for details, bro.

Interesting. A native version, in the same language as git, makes more sense.

Although I'll personally stick with the sh/bash version for simplicity (and for diversification).

Now it can be set up in a repo to automatically perform metadata versioning on every commit: 01VCS/git-meta@cf30ef0

Next step, maybe, is having an individual git repository for the metadata, inside .git/meta (will make things more organized and magical)?

@Arcitec

Arcitec commented Oct 5, 2022

I was inspired by the script but saw some severe issues in it.

  1. %p is just the path without any special quoting of special characters in filenames, such as leading - which would be interpreted as a parameter, or spaces in the filename which would be interpreted as separate parameters. This breaks SEVERELY if the filenames are weird in any way whatsoever.
  2. The %A is the ACCESS TIME of the file. Why the F is it being tracked? I think you meant to use %T which is the last MODIFICATION TIME of the file. Most people these days don't even use access times anymore, and disable them completely or make them relative to some other time. It's definitely NOT what you intended to copy over.
  3. You're writing the time in human format while totally ignoring a little thing known as TIME ZONES. The dates it restores will be totally wrong.
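  4. Why on earth are you using chmod at all? GIT ALREADY PRESERVES THE EXECUTABLE BIT! At least if core.filemode in the Git config is true, which it is by default. For most repos that is the only mode bit that matters, so saving mode bits via your script is usually pointless. (Git does not record any other permission bits: it stores every regular file as either 644 or 755, so chmod only helps if you need exact permissions such as 600.)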
  5. You're grabbing %U and %G which are the NUMERIC USER/GROUP IDs. You should be using %u and %g which are the HUMAN-READABLE user/group, which is way more portable to other machines.
  6. Instead of outputting separate chown and chgrp commands, you should output ONE chown command, since it's able to take chown user:group -- FILE as parameters.
  7. Speaking of --... You should be using -- in every command, to tell them that there are no more flags, and that the rest of the command is the arguments. This is necessary to avoid the risk of filenames being interpreted as parameters.
  8. The "metadata restoration script" you generate has no error-checking whatsoever. So it gives a false sense of security, since it runs but might fail to do anything, but it will just happily continue executing all lines even if there are severe errors (such as not having any write-permissions to the directory it's running in).
  9. Pretty much all of the "variants" above suffer the exact same bugs.

Anyway that's just a few of the issues, there are probably more, but I was only really focused on the "file time" aspect which is what I am interested in saving/restoring in my repo...

So, I was investigating how to rapidly produce quoted (safe) filenames, in universal UNIX TIME format.

The following techniques are what I came up with:

  1. TERRIBLE: find . -type f -printf '%p %T@\n': This outputs %T@ (the Unix modification time with a fractional-seconds part, which shows up as trailing zeros such as .0000000000 when the file has no sub-second timestamp). The reason it's terrible is that %p is not quoted, and that the trailing fraction after every timestamp is just stupid and wasteful.
  2. TERRIBLE: find . -type f -printf '%P %T@\n': Almost the same as the previous one, but I wanted to mention that %P outputs the paths without the leading folder (the . argument in this case), which is very useful if you're trying to be portable. But we still have the HUGE issue that filenames are not quoted. And no, we can't simply slap "%P" around it, since quoting DOESN'T WORK THAT WAY.
  3. KINDA GOOD: stat --printf='touch -mcd "@%Y" -- %N\n' **/*: Alright, now we're getting somewhere. This uses stat, which supports %N, the properly quoted/escaped path to the file. And its %Y outputs the Unix timestamp in whole seconds, without ridiculous trailing fractions. That's a pretty nice evolution. But the globbing **/* is bad because it SKIPS HIDDEN (DOT) FILES unless Bash's dotglob option is set, only recurses into subfolders when globstar is enabled, and grabs every file AND FOLDER rather than just files.
  4. GREAT BUT SLOW: cd "somefolder" && find . -type f -exec stat --printf='touch -mcd "@%Y" -- %N\n' "{}" \; && cd ..: Alright this is getting close to perfection. It enters a folder, uses find to only look at files, executes stat on the file to get the Unix timestamp and quoted filename. So why is it bad? Well, it's super slow due to spawning stat once per file. Even small collections take a long time. But we can improve this...
  5. PERFECTION: cd "somefolder" && find . -type f -print0 | xargs -0 stat --printf='touch -mcd "@%Y" -- %N\n' && cd ..: With this we've finally achieved perfection. We're using find to discover all files rapidly, and since we're using find you can add other conditions (like "all files ending in .x" or "skip all files named foobar"). The -print0 action outputs the names with NUL separators, so that we support complex filenames, including spaces and even special characters such as newlines. Next, xargs -0 reads that NUL-separated list and passes ALL of the discovered files to as few executions of stat as possible (usually just ONE; xargs only splits the list if it exceeds the system's argument-length limit). This gives INSTANT RESULTS, which are all perfectly formatted and escaped.

TL;DR: Solution 5 is BY FAR the best way to back up modification times of files.

Oh and if you're wondering how we're setting the date: Type man date to read about supported DATE formats. Specifically, we're using Unix timestamps which are supported by prepending an @ before the numbers, as seen in this DATE manual example:

Convert seconds since the Epoch (1970-01-01 UTC) to a date

$ date --date='@2147483647'
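The same @SECONDS notation works for touch -d, so a timestamp read with stat -c %Y can be written back unchanged. A minimal sketch assuming GNU touch and stat (the helper names are hypothetical), which also shows that touch -c neither creates a missing file nor fails on it:

```sh
#!/bin/sh
# Set the modification time of file $2 to Unix timestamp $1.
# -m: only mtime; -c: never create the file; "--" ends option parsing.
set_mtime() { touch -m -c -d "@$1" -- "$2"; }

# Print the modification time of file $1 as whole Unix seconds.
get_mtime() { stat -c %Y -- "$1"; }

# Usage:
#   set_mtime 1000000000 notes.txt && get_mtime notes.txt   # 1000000000
```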

Here's the "core" of what we're going to do:

cd "Parent Folder" && find . -type f -not -name "metadata-cache" -print0 | sort -z | xargs -0 stat --printf='touch -mcd "@%Y" -- %N\n' > "./metadata-cache" && cd ..

This enters the parent folder to ensure that all paths become relative to that parent. In my own script this is the full path to my parent folder; I just changed it to "Parent Folder" for this demo.

Next, it lists all regular files except any named "metadata-cache", to avoid listing the cache itself.

Then it sorts the NUL-terminated filenames to ensure that they end up in a nice order (this just makes the metadata file easier to diff and compare).

Then it executes "stat" to safely print their UNIX timestamp commands and their quoted paths.

It pipes that output into a file named "metadata-cache" which ends up inside the parent folder.

The end result is a very clean file which can now be executed to apply all modification times, when necessary.


The reason for && between all commands is to make the sequence fail with an error if any part of the command chain fails. This means that you can check $? right after executing this one-liner to see whether any part of the chain failed: if [[ $? -ne 0 ]]; then echo "OH NO IT FAILED"; fi will report the error.

But you MUST do that check immediately after this one-liner, because if you run any other commands first, then the value of $? (last command status) will change. Keep that in mind! :)
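One caveat with the && reasoning: the exit status of a pipeline is the status of its last command only, so in cd ... && find ... | sort -z | xargs ..., a failing find or sort goes unnoticed by && and $?. Bash's set -o pipefail changes that. A tiny demonstration (the function names are illustrative; the second one needs bash):

```sh
#!/bin/sh
# Without pipefail, the failure of `false` is hidden by the final `true`.
plain_status()    { false | true; echo "$?"; }

# With bash's pipefail, the pipeline fails if any stage fails.
pipefail_status() { bash -c 'set -o pipefail; false | true; echo "$?"'; }

# Usage:
#   plain_status      # 0
#   pipefail_status   # 1
```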

Also keep in mind that if the chain fails before the cd .. then you will be stuck in the "Parent Folder" location that you cd-ed into. But personally I don't care since my script will exit if any part failed.

But... to make things even better, it's possible to save the result of $PWD (Bash's always-up-to-date pwd equivalent variable), before we cd at all, which will allow us to restore the current working dir at the end no matter where you came from originally. That's what we'll do in the final functions below.

Final, reliable functions, hereby placed in the Public Domain:

#!/usr/bin/env bash

function write_metadata() {
    # Writes a robust metadata file containing sorted, fully-escaped paths, with
    # the full UNIX modification timestamp of each file.
    CURRENT_PWD="${PWD}"
    cd "${WHATEVER_DIR}" && find . -type f -not -name "metadata-cache" -print0 | sort -z | xargs -0 stat --printf='touch -mcd "@%Y" -- %N || exit 1\n' > "./metadata-cache"
    if [[ $? -ne 0 ]]; then echo "Error while writing metadata cache. Aborting..."; exit 1; fi
    cd "${CURRENT_PWD}"
    if [[ $? -ne 0 ]]; then echo "Error while accessing previous working directory. Aborting..."; exit 1; fi
}

function read_metadata() {
    # Applying the metadata again is a simple matter of going into the target
    # folder if it exists, and then executing the metadata file as a script.
    if [[ ! -f "${WHATEVER_DIR}/metadata-cache" ]]; then return 0; fi
    CURRENT_PWD="${PWD}"
    cd "${WHATEVER_DIR}" && env bash -- "./metadata-cache"
    if [[ $? -ne 0 ]]; then echo "Error while reading metadata cache. Aborting..."; exit 1; fi
    cd "${CURRENT_PWD}"
    if [[ $? -ne 0 ]]; then echo "Error while accessing previous working directory. Aborting..."; exit 1; fi
}

The "${WHATEVER_DIR}" is just whatever folder you want to scan/restore. Replace that with whatever your own variable is called, where you store the path to the target directory.

If you want to make things harder for yourself, you may even decide to make the functions modular by taking $1 as a dynamic parameter for the directory to scan. But then you'll have to call the functions with parameters everywhere in your code, so the choice is yours. :) I personally don't think anyone is gonna need modularity enough to warrant all the risks/drawbacks of taking a random parameter instead, so I went with the hardcoded path variables.

One thing to be aware of is that we're using if [[ ... ]]; then ...; fi instead of the [[ ... ]] && { ... } shorthand that many people like. The shorthand is treated as a command rather than an if-statement, and a Bash function automatically returns the status of the last executed command. When everything succeeded, the error check is false, so the function returns a failure status even though nothing went wrong. To avoid this return-value bug, we must explicitly use if-statements for all checks in our functions.
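A quick demonstration of that pitfall, written with POSIX [ ] rather than [[ ]] so it also runs under plain sh (the function names are hypothetical). Both functions "check" a status of 0, i.e. no error:

```sh
#!/bin/sh
# Returns 1: its last command is the failed [ ] test, even though rc is fine.
check_with_and() { rc=0; [ "$rc" -ne 0 ] && echo "error"; }

# Returns 0: an if-statement whose branch was not taken exits with status 0.
check_with_if()  { rc=0; if [ "$rc" -ne 0 ]; then echo "error"; fi; }

# Usage:
#   check_with_and; echo $?   # 1
#   check_with_if;  echo $?   # 0
```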

There's another small but important thing to be aware of in read_metadata(): the metadata restoration script is executed in a separate bash process, and write_metadata() appends || exit 1 to each statement, so that the script exits with an error code if any of its commands fail. This means that we'll detect if anything went wrong while restoring metadata. If you prefer letting the metadata file ignore errors, you can remove that part of the lines created by write_metadata(). :) However, you most likely WANT to keep these error checks, because they let you discover when touch found a file but failed to modify its timestamp (such as when lacking permission to modify it). If a file doesn't exist, touch -c simply returns success without creating it, so you don't have to worry about missing files triggering those error handlers. They will only trigger on actual errors in the metadata restoration process!

Enjoy!

@danimesq

Hi @Bananaman! Which version is your implementation based on?

@spoelstraethan

@Bananaman if you used pushd "${WHATEVER_DIR}" >/dev/null and then popd >/dev/null if any error occurred, you would still end up in the "original" directory without the extra output messages from those commands polluting the output.

@Explorer09

Explorer09 commented Jan 10, 2023

@spoelstraethan

if you used pushd "${WHATEVER_DIR}" >/dev/null and then popd >/dev/null if any error occurred, you would still end up in the "original" directory without the extra output messages from those commands polluting the output.

Just to note that pushd and popd are bashisms and thus unportable. Another problem with pushd is that you need to handle the case where pushd itself fails (e.g. when the directory doesn't exist); if you are not careful, you would end up popping one more directory from the stack than needed (which could become a security vulnerability in certain applications).

If you need to return to the original directory, the safest as well as the simplest approach is to cd in a subshell, and exit the subshell when you are done or an error occurred in that directory. Like this:

func1 () {
    # Process current directory
    (
        set -e
        cd "$new_dir"
        # Process $new_dir
        # When an error occurs, this subshell exits because of the "set -e" command
    ) || return
    # Back to the directory that was current when entering the function
}
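Putting that subshell pattern together with the stat/touch approach from Arcitec's comment might look roughly like this. It's a sketch, not a drop-in replacement: it assumes GNU find, sort, xargs, stat and touch, uses bash to replay the cache (GNU stat can quote unusual names in $'...' form, which plain sh doesn't understand), and the function names save_mtimes/restore_mtimes are made up. Because each cd happens inside ( ... ), the caller's working directory never changes, even on failure.

```sh
#!/bin/sh
# $1 = directory to scan, $2 = cache file (excluded from the scan by name).
# Writes one "touch -mcd @SECONDS -- 'path' || exit 1" line per regular file.
save_mtimes() {
    (
        cd "$1" &&
        find . -type f ! -name "${2##*/}" -print0 | sort -z |
            xargs -0 -r stat --printf='touch -mcd "@%Y" -- %N || exit 1\n'
    ) > "$2"
}

# $1 = directory, $2 = cache file written by save_mtimes.
restore_mtimes() {
    [ -f "$2" ] || return 0          # nothing recorded yet
    case $2 in
        /*) cache=$2 ;;
        *)  cache=$PWD/$2 ;;         # make a relative path survive the cd
    esac
    ( cd "$1" && bash -- "$cache" )
}

# Usage:
#   save_mtimes "$HOME/project" "$HOME/project.mtimes"
#   restore_mtimes "$HOME/project" "$HOME/project.mtimes"
```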
