
Last active May 5, 2024 01:22

So you want to understand what's going on with all these reports of "OpenZFS might be eating my data".

Here's a simple explanation of what is and isn't true.

A bunch of software uses "where are there holes in the file" as an optimization to avoid having to read large swathes of zeroes. Many filesystems implement a call like FIEMAP which tells you a list of "these regions of the file have data", in addition to or rather than SEEK_HOLE/DATA, which are extensions to lseek to tell you where the next hole/data is from a position in the file.
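For a feel of what that interface looks like from userland, here's a minimal sketch in Python that walks a file with SEEK_DATA/SEEK_HOLE and lists the ranges the filesystem reports as data. The function name `data_segments` is made up for illustration, and this assumes a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (Linux does). On a filesystem that doesn't track holes, you'd typically just get one segment covering the whole file.

```python
import errno
import os


def data_segments(path):
    """Return a list of (start, end) byte ranges the filesystem reports as data.

    Hypothetical helper for illustration; anything outside the returned
    ranges is what a hole-aware program would assume reads back as zeroes.
    """
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte at or after `offset` that is data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # nothing but a hole between here and EOF
                raise
            # Find where that run of data ends (EOF counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            segments.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return segments
```

Calling `data_segments("some_file")` gives you the map a program would use to decide which parts of the file it can skip reading.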

OpenZFS, back in 2013, implemented support for SEEK_HOLE/SEEK_DATA, and that went into release 0.6.2.

On ZFS, because it has things like compression which might decide that your large swathe of zeroes was actually a hole in the file and store it accordingly, it requires flushing out all the pending dirty stuff to disk for a file in order to know accurately if something has holes or not, and where.

ZFS implemented a check for "if this thing is dirty, force a flush" for using SEEK_HOLE/DATA on things.

Unfortunately, it turns out the "is this thing dirty" check was incomplete, so sometimes it could decide a thing wasn't dirty, when it was, and not force a flush, and give you old information about the file's contents if you hit a very narrow window between flushing one thing and not another.

If you actually tried reading it, that would be correct, but if you skipped reading parts of it at all because you thought they were empty, well, then your program has incorrect ideas about what the contents were, and might write those ideas, or modify the empty areas and write the result, out somewhere.

So if you, say, reproduced this with cp on a file, you might get regions of zeroes where none existed on the destination, no matter what filesystem, if the source was ZFS and you hit this bug.
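To make that concrete, here's a rough sketch of a hole-aware copy in Python. This is not how cp is actually implemented, and `sparse_copy` and `CHUNK` are names invented for the example; it's only meant to show the shape of the problem. The copy never reads the ranges the source filesystem calls holes, so if that answer is stale, the destination ends up with zeroes there regardless of what filesystem it lives on.

```python
import errno
import os

CHUNK = 1 << 20  # arbitrary read size for the sketch


def sparse_copy(src, dst):
    """Copy only the ranges reported as data; everything else stays a hole."""
    src_fd = os.open(src, os.O_RDONLY)
    dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src_fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(src_fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # the rest of the file is reported as a hole
                raise
            end = os.lseek(src_fd, start, os.SEEK_HOLE)
            pos = start
            while pos < end:
                buf = os.pread(src_fd, min(CHUNK, end - pos), pos)
                if not buf:
                    break  # source shrank underneath us
                # pwrite may write less than asked; advance by what it wrote.
                pos += os.pwrite(dst_fd, buf, pos)
            offset = end
        # Ranges we skipped were never written, so they read back as zeroes.
        os.ftruncate(dst_fd, size)
    finally:
        os.close(src_fd)
        os.close(dst_fd)
```

Note that nothing in here ever looks at the bytes inside a reported hole, which is the whole point of the optimization and also why a wrong hole report goes unnoticed.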

But if you were, say, compiling something, and the linker got zeroes for some of the objects, the output might not be just zeroes where it found something wrong, because it's not just copying the things it reads in, it's doing things to them and outputting a result.

So you can't just say "if I have regions of zeroes, this is mangled" because plenty of files legitimately have large regions of zeroes. You also can't just say "if I have no regions of zeroes, this isn't mangled", because again, programs often read files and do things with the contents, not just write them out.

This isn't just a problem with using cp, or any particular program.

The good news is, this is very hard to hit without contrived examples, which is why it was around for so long - we hit a number of cases where someone made the window where it was incorrect much wider, those got noticed very fast and undone, and we couldn't reproduce it with those contrived examples afterward.

It's also not very common because GNU coreutils only started using things like this by default with 9.0+, though that only covers the trivial case of using cp; things that read files outside of coreutils might be doing god knows what.

So your best bet, if you have a known good source copy or a hash from one, is to compare your stuff against that. Anything else is going to be a heuristic with false positives and negatives.

Yes, that sucks. But life is messy sometimes.
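If you do have a hash from a known good copy, the comparison could look something like this sketch. SHA-256 is just an assumption here (use whatever your known good hash was made with), and `sha256_of` and `matches_known_good` are names made up for the example. It does plain reads of the whole file and never asks where the holes are.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file by reading all of it in chunks, no hole skipping."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_good(path, expected_hex):
    """Compare a file against a hex digest taken from a known good copy."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A mismatch tells you the file differs from the known good copy, not why; a match is the only non-heuristic "this one is fine" you're going to get.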


send/recv doesn't care, it only operates on things where everything is flushed out.

I can't immediately think of a way you could make a zvol break with this unless you were like, trying to cp the raw contents of the device, but I'll reply if I come up with something else.


ollien commented Dec 2, 2023

@rincebrain thanks for putting this post together and for your response to the community. If you don't mind clarifying your above comment for a lay user, does this mean that a resilver would similarly be unaffected? The only way this could happen is with actions that would write individual blocks?


scrub/resilver don't care either.


"the conceptual error that led to the problem appears to have been introduced in 2006."

Admittedly, I clicked on the link for this, but I have no idea what I am looking at nor what I am looking for.

"That doesn't mean this version is affected, as the bug comes from a combination of events occurring at just the right moments, and those other things may not be present. So I wouldn't say it has "no data nor evidence", but rather, circumstantial evidence that means it can't be ruled out."
Agreed - but this was, again, part of my motivation to test ZFS version 32 in a Solaris 10 1/13 install: to hammer it in a way similar to how (Open)ZFS is being hammered on Illumos/*BSD/Linux distros, to test for the prevalence of the issue outside of the context of OpenZFS.

(i.e. if it was a part of the "original" ZFS code, then it had always been lurking around -- but that it never surfaced.)

But I ran into issues running in said Solaris 10 1/13 install because some of the commands are still rather Linux-specific, and Solaris doesn't recognise them.

"The transaction commit stage still happens as normal, and that's the place this bug lives. Again, it will change the timing, but it can't be predicted. Thus, it's not an effective workaround."
I appreciate your insight in regards to this. Thank you.

"Only direct testing or asking Oracle can answer that."
Yeah, I tried that - no cigar.
