@gboudreau
Last active February 29, 2016 15:24
Script to fix files copied from a dying hard drive that would sometimes read bytes as 0x10 less than what they are. Ref: https://www.pommepause.com/2016/02/the-case-of-the-dying-hard-drive-that-flipped-bits/
<?php
/**
* Use this script to check and fix a list of files that were copied, using rsync, from a bad (BAD!) hard drive to another hard drive (let's call that one the savior drive).
* Since the bad drive is so bad, the data that was copied off that drive might have been corrupted.
* So we'll find which of the listed files are wrong (different MD5 checksum on both drives), and fix them.
*
* Lucky for us, the dying drive is SO BAD that it never produces the same read error twice, and when it does misread a byte, the value it returns is always exactly 0x10 (decimal 16) less than it should be.
* This very particular way of dying lets us detect which of the two drives has the correct byte (the higher one), and thus write a correct file on the savior drive.
* Hooray!
*
* Usage: pass, as the first argument, the path to a file that lists all the files that should be verified and fixed.
* Each line of that file should be the path of one file, relative to the root of the drives listed below.
* The easiest way to create that file is to:
* $ sudo rsync -av /mnt/hdd5/* /mnt/hdd2/
* $ sudo rsync -acv --dry-run /mnt/hdd5/* /mnt/hdd2/ >> ${HOME}/files_to_check.txt
* $ sudo php ${HOME}/check_files.php ${HOME}/files_to_check.txt
*/
$bad_bad_drive = "/mnt/hdd5";
$savior_driver = "/mnt/hdd2";
if (isset($argv[1]) && file_exists($argv[1])) {
$input_file = $argv[1];
} else {
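$input_file = getenv('HOME') . "/files_to_check.txt"; // PHP has no $HOME variable; read it from the environment
if (!file_exists($input_file)) {
die("Usage: php {$argv[0]} [LIST_FILE]\n"); // "$0" is shell syntax; PHP keeps the script name in $argv[0]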
}
}
// Clear disk cache
exec("sync ; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'");
$files = explode("\n", @file_get_contents($input_file));
foreach ($files as $file) {
// Skip some lines from the input file:
if (trim($file) == '') continue; // empty line
if (trim($file) == 'sending incremental file list') continue; // rsync output
if (preg_match('/sent .*bytes/', $file)) continue; // rsync output
if (preg_match('/total size is .*speedup/', $file)) continue; // rsync output
if (preg_match('@/$@', $file)) continue; // end with a / character: folders
$original_file = "$bad_bad_drive/$file";
$copied_file = "$savior_driver/$file";
// Verify that the file exists on both drives
echo "$file: ";
if (!file_exists($original_file)) {
echo "File not found: $original_file\n";
continue;
}
if (!file_exists($copied_file)) {
echo "File not found: $copied_file\n";
continue;
}
// Calculate the MD5 checksum of both files, and compare them
$md5_file1 = md5_file($original_file);
$md5_file2 = md5_file($copied_file);
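if ($md5_file1 === $md5_file2) { // strict comparison: with ==, two different hashes that both look like "0e<digits>" compare equal as numbers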
echo "OK\n";
} else {
echo "\n WRONG checksum: $md5_file1 vs $md5_file2\n Will try to rebuild the data manually.\n";
$file_size = filesize($original_file);
echo " Original file size: " . number_format($file_size) . " bytes\n";
$fixed_file = "$copied_file.tmp";
$skip_file = FALSE;
$fin1 = fopen($original_file, 'r');
$fin2 = fopen($copied_file, 'r');
$fout = fopen($fixed_file, 'w+');
$i = 0;
$buffer_size = 64 * 1024;
while (!feof($fin1)) {
$bytes1 = fread($fin1, $buffer_size);
$bytes2 = fread($fin2, $buffer_size);
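if ($bytes1 === $bytes2) { // strict comparison, so numeric-looking chunks are never compared as numbers
fputs($fout, $bytes1);
$i += strlen($bytes1); // count the bytes actually read; the last chunk is usually shorter than $buffer_size
echo number_format($i) . " (".number_format($i/$file_size*100, 2)."%)\r";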
} else {
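// A length mismatch (e.g. a truncated copy) is not the 0x10 failure mode, so don't guess.
if (strlen($bytes1) !== strlen($bytes2)) {
echo "\n Warning: chunk lengths differ at pos " . number_format($i) . ". Skipping this file.\n";
$skip_file = TRUE;
break;
}
// One or more of the bytes in $bytes1 & $bytes2 are incorrect.
// Let's find which ones, and fix them.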
for ($j=0; $j<strlen($bytes1); $j++) {
$byte1 = $bytes1[$j];
$byte2 = $bytes2[$j];
if ($byte1 == $byte2) {
$correct_byte = $byte1;
} else {
echo " Wrong byte at pos " . number_format($i) . ": 0x" . dechex(ord($byte1)) . " vs 0x" . dechex(ord($byte2)) . ".";
if (ord($byte1) > ord($byte2)) {
$correct_byte = $byte1;
$incorrect_byte = $byte2;
} else {
$correct_byte = $byte2;
$incorrect_byte = $byte1;
}
if (ord($correct_byte) - 16 != ord($incorrect_byte)) {
echo "\n Warning: difference is not exactly 16 (0x10)! Skipping this file.\n";
$skip_file = TRUE;
break;
}
echo " Writing 0x" . dechex(ord($correct_byte)) . "\n";
}
fputs($fout, $correct_byte);
echo number_format($i++) . " (".number_format($i/$file_size*100, 2)."%)\r";
}
if ($skip_file) {
break;
}
}
}
fclose($fin1);
fclose($fin2);
fclose($fout);
if ($skip_file) {
unlink($fixed_file);
continue;
}
rename($fixed_file, $copied_file);
exec("chown --reference=" . escapeshellarg($original_file) . " " . escapeshellarg($copied_file));
exec("chmod --reference=" . escapeshellarg($original_file) . " " . escapeshellarg($copied_file));
//echo " Final file size: " . number_format(filesize($copied_file)) . " bytes\n";
//echo " Checking rebuilt file checksum...\n";
//$md5 = md5_file($copied_file);
//echo " MD5 = $md5\n";
}
}
echo "Done.\n";
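// ---------------------------------------------------------------------------
// Illustrative sketch, not part of the original script: the per-byte repair
// rule used in the loop above, isolated as a pure function so it can be tried
// on plain strings without touching any drive. The name repair_bytes() is
// hypothetical. It assumes, as the header comment does, that a bad read is
// always exactly 0x10 lower than the true byte, and that the two reads are
// never wrong at the same offset.
// Returns the repaired string, or NULL when the reads differ in length or a
// mismatch does not fit the 0x10 pattern.
function repair_bytes($read1, $read2) {
    if (strlen($read1) !== strlen($read2)) {
        return NULL; // different lengths: not the failure mode handled here
    }
    $fixed = '';
    for ($k = 0; $k < strlen($read1); $k++) {
        $a = ord($read1[$k]);
        $b = ord($read2[$k]);
        if ($a !== $b && abs($a - $b) !== 16) {
            return NULL; // bytes differ by something other than 0x10: give up
        }
        $fixed .= chr(max($a, $b)); // the higher value is the undamaged one
    }
    return $fixed;
}
// Usage: 'l' is 0x6c, and a bad read returns 0x5c instead. Each read below has
// one damaged byte, at a different offset:
//   repair_bytes("He\x5clo", "Hel\x5co") returns "Hello"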