@rolfen rolfen/0000 Readme.md
Last active May 8, 2019


My Media Archive

Life without Lightroom

Backups

There is a bit of a dilemma here between the options:

  • Snapshot backups with automatic deduplication

    • Optimal use of transfer and storage capacity
    • Can make periodic snapshots
    • Can be encrypted (privacy)
  • rsync-type simple mirrors

    • No need for complex, time-consuming backup software
    • No need to install special client
    • May be easier to browse and retrieve single files (depends)
    • Can also double as media repository in the cloud

Anti-corruption

Content Checksum

Well, one easy way is to keep a checksum of every file:

sha1sum P0123.ORF

The challenge here is that metadata (Exif) may change (fixing the date, adding tags, etc.), which will invalidate the checksum.

So it would make more sense to checksum only the RAW image (or video) data which should never change.

How do we extract this data?

Apparently ImageMagick can help here: it can produce a signature (a checksum of the decoded pixel data, ignoring metadata), e.g. with identify -format "%#" FILE. However, it is very slow.

Here is a faster method which makes use of exiftool to extract the RAW blob and checksum it:

exiftool FILE -all= -o - | sha1sum

Source: http://u88.n24.queensu.ca/exiftool/forum/index.php?topic=2734.0
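
To keep these per-file checksums around, a plain manifest file works. A sketch (the content_sum function just wraps the exiftool pipe above; the *.ORF pattern is an assumption):

```shell
# Build a manifest with one "<sha1>  <path>" line per raw file.
# content_sum strips all metadata and checksums what remains.
content_sum() {
    exiftool "$1" -all= -o - | sha1sum | cut -d' ' -f1
}

find . -name '*.ORF' -print0 |
while IFS= read -r -d '' f; do
    printf '%s  %s\n' "$(content_sum "$f")" "$f"
done > content-checksums.txt
```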

Detecting corruption

To detect corruption, we need to keep records of the checksums somewhere, and then periodically re-calculate the checksum and compare it with the previously stored one.

We could have a database which keeps all the different values which make a file unique. So for each file we have (assuming applicable):

  • Timestamp: This should be unique in "99%" of cases; however, it is conceivable that two different photos would have the same timestamp, either because they were taken in the same second or because the date was wrongly set.
  • Filename: This can clash globally, but filenames must be unique within a particular day.
  • Content checksum: This should be unique; however, different content can have the same checksum in extremely rare cases.

Now let's say we have this database, and files on the disk. We scan the files (including recalculating checksums) and compare the results to the database records, to detect corruption or changes.

Possible situations:

  • For a file identified by a particular Timestamp and Filename combination, the content checksum has changed: corruption has been detected.
  • For a file identified by a particular content checksum, Timestamp and/or Filename have changed: metadata has been updated, and the database should be updated.
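
Assuming the records are kept in a manifest of "<sha1>  <path>" lines, the corruption check in the first bullet can be sketched like this (plain whole-file sha1sum here; swap in the exiftool pipe for metadata-independent sums):

```shell
# Re-check every manifest entry; report mismatches as suspected
# corruption. Manifest lines look like: <sha1>  <path>
while read -r recorded path; do
    current=$(sha1sum "$path" | cut -d' ' -f1)
    if [ "$current" != "$recorded" ]; then
        echo "MISMATCH: $path"
    fi
done < content-checksums.txt
```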

Organizing photos by date

This reads the EXIF date taken and recursively moves photos (from \MyMess) into an organized directory structure such as YYYY\MM\DD.

exiftool is required. It is available on Windows, Linux and macOS. The following was tested on Windows; it should be easy to adapt to other platforms.

Here is the procedure:

cd to your target directory.

Take a snapshot of the tree before moving files because it might contain useful information:

find .  > "Archive/treesnapshot.txt"

If find is not available (as on stock Windows), here is another option:

tree /f /a > "\Archive\treesnapshot.txt"

Organize all files according to creation date; when the date and time cannot be determined, drop the file into a subdirectory inside the "Unknown" directory. Put photos (with sidecar files) and videos in two separate target directories.

Here is how we do it:

Caution!

Please specify the source directory (here F:\MyMess) with a full path. No ../../../MyMess, please!


cd Photos
exiftool -DateTimeOriginal -CreateDate -r -m -d %Y/%m/%d "-Directory=Unknown/${directory;s/[a-z]://i}" "-Directory<CreateDate" "-Directory<DateTimeOriginal" -ext JPG -ext ORF -ext DNG -ext JPEG -ext XMP -ext PP3 "F:\MyMess"

# For videos, just CD to the Video archive and change the extensions:
cd ..
cd Videos
exiftool -DateTimeOriginal -CreateDate -r -m -d %Y/%m/%d "-Directory=Unknown/${directory;s/[a-z]://i}" "-Directory<CreateDate" "-Directory<DateTimeOriginal" -ext MP4 -ext AVI -ext MOV -ext MPEG "F:\MyMess"

Delete (now) empty directories

cd F:\MyMess
find . -empty -type d -print0 | xargs -0 -r rmdir

Keep doing this until nothing is found: the command above only deletes one level per pass, because a directory becomes empty only after its subdirectories have been removed.
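
Alternatively, the repetition can be automated with a loop that stops once a pass removes nothing (GNU find assumed, as above):

```shell
# With -depth, children are visited, tested and rmdir'd before their
# parents, so one pass removes whole empty subtrees; the loop repeats
# until a pass prints nothing, as a belt-and-braces measure.
while [ -n "$(find . -mindepth 1 -depth -type d -empty -print -exec rmdir {} \;)" ]; do
    :
done
```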

Leftovers are either unsupported by exiftool or contain no EXIF data or are already present at the destination.

Notes

  • Pay attention to double quotes and backslashes preceding double quotes for shell escaping and other weirdness.
  • -m is for "ignore minor errors and warnings" so that exiftool does not "fix" files on my behalf! (Really? Silently fix things?)
  • "Directory for each image is ultimately set by the rightmost copy argument that is valid for that image" (source: https://sno.phy.queensu.ca/~phil/exiftool/filename.html).
  • -CreateDate is a performance improvement: it means we only read the creation date information and ignore the other EXIF (etc.) information, which goes much faster.
  • For multiple input directories, replace "F:\MyMess" with -@ C:\directories.txt (for example), directories.txt being a text file containing a list of all the directories you want to import from, one entry per line.
  • Exiftool accepts \ or / as directory separators, depending on the environment.

Extracting JPEG Previews

JPEG files are much easier to browse than RAW images and faster to share.

Usually the camera produces JPEG images alongside the raw (ORF in this case) images. If the JPEGs are missing for some reason, however, they can quickly be extracted from the ORF files.

cd to a directory which contains ORF images. The following does the extraction recursively and saves the JPG file next to the ORF, but it will not overwrite any JPG file which is already there.

exiftool  -m -r . -b -previewimage -ext ORF -w .JPG 2>/tmp/exiftool.log

Delete /tmp/exiftool.log when done.

The procedure to insert photos into the Archive:

  1. Optional: copy them to the "import" directory
  2. Deduplicate them against the Archive, meaning that any photo which already exists in the archive (bit by bit) should be deleted from the "import" directory.
  3. Use exiftool or similar to move files into the right directory in the archive (see Organizing Photos by Date)
  4. Any remaining files should be manually checked

Steps 2 and 3 can be swapped in the sequence.
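
Step 2 can be sketched with whole-file checksums, since "bit by bit" identity means a plain sha1sum match. A sketch, with Archive/ and Import/ as assumed directory names:

```shell
# Collect checksums of everything already in the archive, then
# delete any import file whose checksum is already present.
find Archive -type f -exec sha1sum {} + | cut -d' ' -f1 | sort -u > archive.sums

find Import -type f -print0 |
while IFS= read -r -d '' f; do
    sum=$(sha1sum "$f" | cut -d' ' -f1)
    if grep -qx "$sum" archive.sums; then
        echo "duplicate, deleting: $f"
        rm "$f"
    fi
done
```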
