The following applies to the scripts located here.
Once DCMR has finished the QC/QA of all the files, they will inform MSD (or Robin, who will then inform MSD?) that their work is complete. At this point, it is time to process the files according to the [HathiTrust Cloud Validation and ingest requirements](https://docs.google.com/document/d/1OQ0SKAiOH8Xi0HVVxg4TryBrPUPtdv4qA70d8ghRltU/edit).
##### Step 1: Remove all unnecessary files
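The only files that should exist in each directory as this work begins are .tif and .txt files (the scanned images and the OCR text). To remove all of the .jpg files, navigate in the terminal to the directory that contains ALL of the subdirectories (that is, the directory that contains a folder for each barcode). The command below lists every .jpg file under that directory so you can review what will be removed; as written, it only prints the matches and does not delete anything: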
find . -name "*.jpg" -type f
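
The `find` command above only prints matching files. A minimal sketch of the review-then-delete sequence follows, assuming GNU or BSD `find` (both accept `-delete`, which is not part of POSIX). The function names `list_jpgs` and `delete_jpgs` are hypothetical helpers for illustration and are not part of the scripts in this repository:

```shell
# Hypothetical helpers (illustration only, not part of the original scripts).

# list_jpgs DIR: print every .jpg file under DIR (defaults to the current
# directory). Nothing is deleted.
list_jpgs() {
    find "${1:-.}" -type f -name "*.jpg"
}

# delete_jpgs DIR: delete every .jpg file under DIR (defaults to the current
# directory). Assumes a find that supports -delete (GNU and BSD find do).
delete_jpgs() {
    find "${1:-.}" -type f -name "*.jpg" -delete
}
```

Run `list_jpgs .` from the top-level directory to review the matches, then `delete_jpgs .` once the list looks right. The match is case-sensitive, so files ending in `.JPG` would need `-iname` in place of `-name`.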