MADS is responsible for uploading born-digital objects acquired as part of an archival collection to the Isilon. These usually arrive as hard drives and similar media, and MADS wants a complete copy of each sent to the Isilon as part of the curation process.
The existing project proposal is in Confluence: https://tulibdev.atlassian.net/wiki/spaces/TPI/pages/864321537/Project+Proposal+Hard+Drive+Ingestion+Script
Currently, MADS (Stefanie) mounts the Isilon on her machine, mounts the HD on
her Mac via a write blocker, and then runs an rsync command to recursively copy the entire filesystem.
The rsync command was provided to her by Jim Bongiovani. Prior to that, an extensive bash script written by
Kate Lynch was in use, but Stefanie has never been able to get it working. I don't
believe those old scripts were even kept in version control anywhere.
The biggest problems with the current command are:
- It is brittle; e.g. rsync's command line flags are hard to remember and easy to get wrong
- It feels dangerous: rsync can have bad effects if small flag mistakes are made (for example, a trailing slash on the source path changes what gets copied)
- It doesn't work on Windows
- It doesn't even do the bare minimum for digital preservation: checksums on both sides
A project proposal has been submitted to build a replacement that meets the following requirements:
- Easier to use. A CLI is OK, but it should avoid arcane flags and ship with reasonable defaults
- Generates checksums for all the files to be transferred before the transfer, and then verifies those checksums at the destination.
- Has actual documentation about how it works, how it can be installed, and what effects it has
- Works on Mac and Windows
- Is supported by devs / LTS. I take this to mean we actually write tests for it so we can confidently add features in the future
It seems like we could use something like BagIt to avoid bikeshedding on how to do the checksumming. This also seems like a great opportunity to write it in a language that compiles to a binary, like Go, to make supporting macOS and Windows easier.