@robhaswell
Created July 26, 2012 10:57
ZFS recovery challenge
This problem concerns the TPJ node "130" (178.33.229.130). On Friday it suffered a failure with out-of-date replication, which was then stashed. The concern at the moment is filesystem mail-1636: there are emails which you can see in the AP-SMTP log but which do not exist on the filesystem.
There is an hcl command which should recover these files:
hcl promote_stash -f mail-1636 -s 1342800864628
This attempts to mount a clone of the filesystem on /hcfs-tmp, and recv the stashed snapshot from this file into it:
/hcfs-stash/mail-1636/1342800864628-1342788663011_178.33.229.130_1342786125589_178.33.229.130_0-1342800590083_178.33.229.130_1342788663011_178.33.229.130_0/data
Afterwards it does some rsyncing and cleanup. Luke recently had to make some changes to hcl to make this work because the head had been trashed, or something; this part is a little hazy to me. Anyway, the command fails with:
Found it! We can proceed...
Copying filesystem up to 1342788663011_178.33.229.130_0 into /hcfs-tmp/hpool/hcfs/
Copy created, now catting data file into hcfs-tmp filesystem...
cannot receive incremental stream: destination hpool/hcfs-tmp/hpool/hcfs/mail-1636 has been modified since most recent snapshot
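For context, ZFS reports this error when the destination dataset has changed since its latest snapshot, so the incremental stream no longer has a clean base to apply to. One standard way around it is `zfs receive -F`, which rolls the destination back to its most recent snapshot before receiving. Below is a minimal sketch of that idea, not how hcl actually works: the function names are made up for illustration, and it assumes the temporary clone's most recent snapshot really is the one the stashed stream was taken against, and that anything written to the clone since then can be thrown away.

```python
import subprocess


def forced_recv_command(dataset):
    """Build a `zfs receive -F` command for the given destination dataset."""
    # -F makes zfs roll the destination back to its most recent snapshot
    # before applying the stream, discarding any later modifications.
    return ["zfs", "receive", "-F", dataset]


def recv_stash(dataset, data_file, run=subprocess.run):
    """Feed a stashed stream file into `zfs receive -F` on stdin.

    `run` is injectable so the call can be exercised without ZFS present;
    by default it is subprocess.run, and check=True raises on failure.
    """
    with open(data_file, "rb") as stream:
        return run(forced_recv_command(dataset), stdin=stream, check=True)
```

Calling `recv_stash("hpool/hcfs-tmp/hpool/hcfs/mail-1636", path_to_data_file)` would then be the forced equivalent of the step that failed. If the stream's base snapshot is not the clone's latest one, the forced receive will still fail, just with a different message.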
So your challenge is: somehow recv that data file into something so we can recover its contents. It would also be great if you could modify hcl to do this automatically in future, because we think there are a number of other filesystems with missing data in them.
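As a starting point for the "do it automatically" part, here is a rough sketch of the retry logic, kept separate from any real hcl code since its internals are not shown here. Everything in it is hypothetical: `receive` stands for whatever callable hcl would use to run the recv, taking a `force` flag and returning the exit code and stderr text. The only thing taken from the output above is the error string it matches on.

```python
# Substring of the zfs error shown in the failing promote_stash run.
MODIFIED_MARKER = "has been modified since most recent snapshot"


def is_modified_destination_error(stderr_text):
    """True if stderr looks like the 'destination modified' recv failure."""
    return MODIFIED_MARKER in stderr_text


def receive_with_fallback(receive):
    """Try a plain receive first, then retry once in forced mode.

    `receive(force)` is a hypothetical callable returning
    (returncode, stderr_text). It must reopen the stream file on each
    call, because a failed attempt may already have consumed stdin.
    """
    code, err = receive(False)
    if code != 0 and is_modified_destination_error(err):
        # Only escalate for this specific failure; other errors are
        # returned untouched so they are not masked by a forced retry.
        code, err = receive(True)
    return code, err
```

Retrying only on this one message keeps the forced path, which discards changes on the destination, from being used when something else is wrong with the stash.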
Many thanks :-)