@knghtbrd
Created July 12, 2017 13:10
Why Apple II disk images suck even though they really didn't have to...

Why Apple II disk images suck

The frustrating thing about Apple II disk images ... is the disk images themselves. In theory they work, and in practice we can kinda make them work, but overall ... no, it's just a mess.

The hardware format

Okay, let's start with the basics: Most disk images for the Apple II are 143,360 bytes on the nose (35 tracks of 16 sectors of 256 bytes each). And when we encounter a file of that size that we expect to be a disk image, we're looking at a track/sector dump of an Apple II 140k floppy disk. So much for the physical format, right?

Well no.

The file might be in DOS order or ProDOS order. DOS order files are the raw interleaved track/sector order from DOS 3.3. ProDOS files are a different interleave of track and sector, but the upshot is that they are effectively sequential blocks. Block 0 is followed by block 1, etc.
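To make the two orderings concrete, here's a minimal sketch that flips a 140k image from one order to the other. The `PO_TO_DO` table is the commonly documented mapping between the two logical sector orders; it is an assumption brought in for illustration, not something stated above, and the function and constant names are made up.

```python
TRACKS = 35
SECTORS = 16
SECTOR_SIZE = 256
IMAGE_SIZE = TRACKS * SECTORS * SECTOR_SIZE  # 143,360 bytes

# Assumed mapping: 256-byte slot i of a track in a ProDOS-order file holds
# the sector that a DOS-order file stores in slot PO_TO_DO[i].
PO_TO_DO = [0, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 15]


def swap_order(image: bytes) -> bytes:
    """Convert a 140k image between DOS order and ProDOS order.

    The table is its own inverse, so the same function converts in
    either direction.
    """
    if len(image) != IMAGE_SIZE:
        raise ValueError("expected a 143,360 byte image")
    out = bytearray(IMAGE_SIZE)
    for track in range(TRACKS):
        base = track * SECTORS * SECTOR_SIZE
        for sector in range(SECTORS):
            src = base + PO_TO_DO[sector] * SECTOR_SIZE
            dst = base + sector * SECTOR_SIZE
            out[dst:dst + SECTOR_SIZE] = image[src:src + SECTOR_SIZE]
    return bytes(out)
```

Because the table maps 0 to 0 and 15 to 15, those two sectors of every track sit at the same file offset in both orderings. Only the other fourteen move.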

The problem is that while we can examine the data and take a wild guess, we can't know for sure which one we're looking at. And it might look like valid data in either DOS or ProDOS ordering, especially when the filesystem on the disk isn't ProDOS. And what if it isn't standard ProDOS or DOS 3.3? There are alternatives.
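Here's roughly what "examine the data and take a wild guess" can look like for a disk that carries a ProDOS filesystem: read block 2 under both orderings and see which one looks like a volume directory. The interleave table and the header offsets checked here are the commonly documented ones and are assumptions of this sketch, as are all the names. It returns `None` whenever it can't decide, which is the honest answer a lot of the time.

```python
from typing import Optional

BLOCK_SIZE = 512
SECTOR_SIZE = 256
SECTORS_PER_TRACK = 16

# Assumed ProDOS-order slot -> DOS-order slot mapping within a track.
PO_TO_DO = [0, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 15]


def read_block(image: bytes, block: int, dos_order: bool) -> bytes:
    """Return one 512-byte block, interpreting the file in the given order."""
    if not dos_order:
        start = block * BLOCK_SIZE
        return image[start:start + BLOCK_SIZE]
    track, index = divmod(block, 8)  # 8 blocks per 16-sector track
    base = track * SECTORS_PER_TRACK * SECTOR_SIZE
    halves = []
    for po_sector in (index * 2, index * 2 + 1):
        start = base + PO_TO_DO[po_sector] * SECTOR_SIZE
        halves.append(image[start:start + SECTOR_SIZE])
    return b"".join(halves)


def looks_like_volume_directory(block: bytes) -> bool:
    """Check a few fields a ProDOS volume directory key block should have."""
    return (
        len(block) == BLOCK_SIZE
        and block[0:2] == b"\x00\x00"   # key block has no previous block
        and block[4] >> 4 == 0xF        # storage type: volume header
        and block[4] & 0x0F != 0        # volume name length 1..15
        and block[0x23] == 0x27         # directory entry length
        and block[0x24] == 0x0D         # entries per block
    )


def guess_order(image: bytes) -> Optional[str]:
    """Return "po", "do", or None when the evidence is missing or ambiguous."""
    as_po = looks_like_volume_directory(read_block(image, 2, dos_order=False))
    as_do = looks_like_volume_directory(read_block(image, 2, dos_order=True))
    if as_po and not as_do:
        return "po"
    if as_do and not as_po:
        return "do"
    return None
```

Note that this only answers the ordering question for ProDOS volumes. It says nothing useful about a DOS 3.3 disk, a copy protected disk, or anything else.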

Even "better", copy protected disks may be misdetected.

So just use .do for things you know are in DOS order and .po for things in ProDOS order, right? Yeah, I've seen .po files in DOS order. And .do files in ProDOS order. I've seen .dsk files that are both or neither. Yes, neither DOS nor ProDOS order disks of the described format, but which are still Apple II disk images named with .dsk.

I've also seen short disk images. These might be 13 sector disks (which will always be DOS order, probably), but they MIGHT be truncated disks! The truncated disks might even have every file complete and intact. Who the hell thought that was a good idea?
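The size arithmetic is about the only easy part, so here's a small sketch of it. It assumes 35 tracks and 256-byte sectors for both the 16 sector and 13 sector cases; the labels and the function name are invented for the example, and "truncated?" is a guess, not a diagnosis.

```python
TRACKS = 35
SECTOR_SIZE = 256
SIZE_16_SECTOR = TRACKS * 16 * SECTOR_SIZE  # 143,360 bytes
SIZE_13_SECTOR = TRACKS * 13 * SECTOR_SIZE  # 116,480 bytes


def classify_size(size: int) -> str:
    """Make a first guess about a track/sector dump from its length alone."""
    if size == SIZE_16_SECTOR:
        return "16-sector"
    if size == SIZE_13_SECTOR:
        return "13-sector"
    if 0 < size < SIZE_16_SECTOR and size % SECTOR_SIZE == 0:
        return "truncated?"
    return "unknown"
```

A truncated 16 sector image that happens to stop at exactly 116,480 bytes would be misread as a 13 sector disk here, which is the kind of ambiguity being complained about.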

Okay, so then what about 2img? Isn't that what it's supposed to do? Well yes, except the header sometimes lies, even about the sector ordering. I've even seen 2img files with the proper extension and no header. Whyyyyy?!
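For reference, here's a hedged sketch of reading the fixed part of a 2img header and sanity-checking it rather than trusting it. The field layout (a `2IMG` magic, then little-endian creator, header size, version, image format, flags, block count, data offset and data length) follows the commonly circulated description of the format and should be treated as an assumption; the names are made up. A file with the right extension and no header simply comes back as `None`.

```python
import struct
from typing import List, NamedTuple, Optional

# Assumed layout of the first 32 bytes of a 2img header (little-endian).
HEADER_FORMAT = "<4s4sHHIIIII"
FORMAT_NAMES = {0: "DOS order", 1: "ProDOS order", 2: "nibble"}


class TwoImgHeader(NamedTuple):
    creator: bytes
    header_size: int
    version: int
    image_format: int
    flags: int
    prodos_blocks: int
    data_offset: int
    data_length: int


def parse_2img(data: bytes) -> Optional[TwoImgHeader]:
    """Return the parsed header, or None if the file has no 2IMG magic."""
    if len(data) < struct.calcsize(HEADER_FORMAT):
        return None
    magic, *fields = struct.unpack_from(HEADER_FORMAT, data)
    if magic != b"2IMG":
        return None
    return TwoImgHeader(*fields)


def header_problems(data: bytes, header: TwoImgHeader) -> List[str]:
    """List ways the header disagrees with the file it is attached to."""
    problems = []
    if header.image_format not in FORMAT_NAMES:
        problems.append("unknown image format")
    if header.data_offset + header.data_length > len(data):
        problems.append("data runs past end of file")
    if header.image_format == 1 and header.prodos_blocks * 512 != header.data_length:
        problems.append("block count does not match data length")
    return problems
```

None of this catches a header that claims the wrong sector ordering. That still needs a look at the data itself.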

And there's a handful of other extensions that get used like hdv which are often just .po images, but which source code suggests may be something else in some contexts. Great.

We haven't even talked about Nibble images, DOS 3.3 on 3.5" disks (they're .po, as all > 140k formats pretty universally are, but contain two DOS 3.3 volumes), Mac hard drives containing ProDOS partitions, raw hard drives containing multiple partitions with or without a partition table of any sort, etc. At least things larger than 140k aren't accessed by track/sector. Well, except things like UniDOS actually are, but their physical format is ProDOS order!

Some of these things have headers. Often the headers are useful. Unless they're not. And we definitely haven't talked about filesystems.

The operating system

Apple II disks generally have one of only a few filesystems on them. This is good because we have to guess the format, as you saw from the last section. ProDOS isn't too hard if the format is properly ProDOS; there are signatures we can look for. But DOS 3.3? Good luck. And anything that isn't DOS 3.3, but is kind of based on it with some hacks? Now you're asking for it.
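About the best you can do for DOS 3.3 is a plausibility check on the VTOC at track 17, sector 0. The sketch below assumes the usual documented VTOC offsets (catalog track and sector at 0x01 and 0x02, track/sector pairs per list sector at 0x27, tracks per disk at 0x34, sectors per track at 0x35, bytes per sector at 0x36); the function name and the exact thresholds are made up. A pass means "plausibly DOS 3.3", nothing stronger, and a hacked DOS can easily fail it.

```python
SECTOR_SIZE = 256
SECTORS_PER_TRACK = 16
VTOC_OFFSET = 17 * SECTORS_PER_TRACK * SECTOR_SIZE  # track 17, sector 0


def looks_like_dos33(image: bytes) -> bool:
    """Check whether track 17, sector 0 holds a plausible DOS 3.3 VTOC."""
    if len(image) < VTOC_OFFSET + SECTOR_SIZE:
        return False
    vtoc = image[VTOC_OFFSET:VTOC_OFFSET + SECTOR_SIZE]
    tracks = vtoc[0x34]
    sectors = vtoc[0x35]
    bytes_per_sector = int.from_bytes(vtoc[0x36:0x38], "little")
    return (
        vtoc[0x27] == 122            # track/sector pairs per T/S list sector
        and sectors == 16
        and bytes_per_sector == 256
        and 0 < tracks <= 50         # 35 is standard; allow some patched disks
        and vtoc[0x01] < tracks      # first catalog track is on the disk
        and vtoc[0x02] < sectors     # first catalog sector is on the track
    )
```

If the commonly documented interleave is right, sector 0 of a track lands at the same file offset in both DOS and ProDOS order, so this check tells you nothing about which ordering the file is in. That is a big part of why DOS 3.3 is the hard case.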

And that leaves out Pascal, CPM, and a host of other things that might be on that disk. For the Apple IIgs, HFS is even a possibility.

And then you come to the file names. While ProDOS and Pascal will have filenames that don't match Windows' requirements, they're at least sane and we can work with them. Again, so long as the disk wasn't created with any shenanigans in mind. But in DOS 3.x? Ever seen a file name made entirely of control characters? Yup, that's a thing.
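One way to cope with those names on the host side is to escape whatever the host can't hold. This sketch assumes DOS 3.x catalog names are stored with the high bit set and padded with spaces; the `%XX` escaping scheme and the function name are inventions for the example, not anything an existing tool necessarily does.

```python
# Characters Windows refuses in file names, plus "%" so escapes stay unambiguous.
FORBIDDEN = set('<>:"/\\|?*%')


def host_safe_name(raw: bytes) -> str:
    """Turn a raw DOS 3.x catalog name into something a host filesystem accepts."""
    trimmed = raw.rstrip(b"\xa0")  # drop high-bit space padding
    chars = []
    for byte in trimmed:
        code = byte & 0x7F  # strip the high bit
        char = chr(code)
        if code < 0x20 or code == 0x7F or char in FORBIDDEN:
            chars.append(f"%{code:02X}")  # escape control and forbidden characters
        else:
            chars.append(char)
    return "".join(chars) or "_"
```

For example, a name made of two backspaces followed by `A` comes out as `%08%08A`, which is ugly but at least round-trippable.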

So is filesystem-as-file. Yeah, not just on the modern machine either. ProDOS can and does store whole disk images on larger drives.

Nibble formats

There's more than one. A lot of emulators and the like treat nibble images as read-only because the format is complex and computationally expensive special case stuff. It makes some copy-protected disks copyable, and it allows you to image disks that have little hacks added to them. For example, it's possible to take a 13 sector disk and modify it so that it contains something extra. Illegal on 13 sector hardware and ignored, but it allows 16 sector hardware to boot the disk without chainloading. No WAY you can represent that kind of hack in a track/sector dump, because you have illegal sectors that just happen to work. And good luck figuring that one out!

Compression

Add to all of this that people often want files to be compressed. Sometimes they use NuFX format (ShrinkIt) which is alien to the world outside the Apple II, although we have tools and libraries for that which need only some tweaking to build properly on Linux (libnufx builds as a static library and we need to enable -fPIC so we can build it shared).

But people also want to compress the disk with other tools like gzip. Or zip. Zip only adds frustration to the mix because it's not actually a file compression algorithm like gzip, it's a packaging format with compression as an option. You tell me to open a disk image inside a zip file. Which one?
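The "which one?" problem is easy to show with Python's standard `zipfile` module. This sketch just lists the members of an archive that might be disk images, going by extension or by the 143,360 byte size. The extension set and the function name are assumptions for illustration, and as covered above, neither the extension nor the size actually proves anything.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import List

IMAGE_EXTENSIONS = {".dsk", ".do", ".po", ".2mg", ".2img", ".nib", ".hdv"}
FLOPPY_SIZE = 143360


def candidate_images(zip_bytes: bytes) -> List[str]:
    """Return the names of archive members that could be disk images."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = []
        for info in archive.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix in IMAGE_EXTENSIONS or info.file_size == FLOPPY_SIZE:
                names.append(info.filename)
        return names
```

If this returns more than one name, the caller still has to ask the user or pick a policy. A gzip file never raises the question because it holds exactly one stream.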

AppleCommander possibly wisely chose not to support anything but NuFX. CiderPress ... decided to try and build the house to put the kitchen sink into.

It's going to get worse not better

Already we have at best half-assed support for 2img in many things, and now we have EDD which is currently supported by nothing, and AppleSauce on the horizon that everyone's already looking to have support for, and that's going to mean another disk image format and an open ended ... thing ... that's supposed to describe ... who knows what.

Conclusion

Sadly, every time someone seems to suggest the one new universal format to solve all the woes we have with all the existing ones, it doesn't wind up seeing much widespread support. I think possibly the only way to truly resolve the issue would be to go in and clean up sites like asimov--determine the actual format of disk images, remove duplicates, and name and describe them according to prescribed formats determined in advance.

That's not likely to happen. And it's probably not even likely to be seen as desirable if only from a bandwidth point of view as everyone redownloads their mirrors. I don't have access to do it certainly, and I don't want the hate and rage likely to come from trying to do such a thing. In short, this is a hard problem without an easy solution.

But even though the problem is hard, it is not insurmountable. Just ... please understand, any particular person reading this with a notion of organizing an effort to change it, that there is a reason we have ended up with this mess, and it won't be easily changed. If you'd do me a favor, strive at least to not allow whatever it is you're doing to make it worse. :)

@retrocomputist

We actually nibblise all disks loaded into the Octalyzer and work with them as NIBs. When you write information to the disk, it's saved as a NIB. Identifying a rogue NIB can be tricky but there's certain track 0 information that's common to particular formats, as well as catalogue information and other bits that can be looked at to figure out what the format is. But I agree it is a trial-and-error process and not always perfect.

@retrocomputist

retrocomputist commented Jul 12, 2017

We are, as a matter of course, currently standardising the Octalyzer to use .ZIPped .NIB files as our common format. Also, in the modern era data is cheap so fingerprinting a common library of disks and using that to identify nibblized files is not out of the question.
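A minimal sketch of the fingerprinting idea, assuming a whole-file SHA-256 and a plain dictionary as the "library" (both are illustrative choices, not a description of what the Octalyzer does). A whole-file hash only matches byte-identical images, so a real library would likely need per-track or per-sector fingerprints as well.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(image: bytes) -> str:
    """Return a hex digest identifying this exact image."""
    return hashlib.sha256(image).hexdigest()


def identify(image: bytes, library: Dict[str, str]) -> Optional[str]:
    """Look an image up in a fingerprint -> description library."""
    return library.get(fingerprint(image))
```

Usage would be something like building `library` once from a set of images whose format has been verified by hand, then calling `identify` on each unknown file before falling back to guessing.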

@knghtbrd
Author

knghtbrd commented Jul 13, 2017

There are few enough formats in common usage that fingerprinting each and determining how best to store it is not a bad implementation. You've got DOS 3.x (3.2 is only a slight variant of 3.3), ProDOS, Pascal, and CPM. Those are the standard ones. You then have a couple variants of a 400k format that get used for a couple 3.5 or ProDOS based DOS 3.x mods, and variants of DOS 3.3 and 3.2 such as RDOS and ProntoDOS.

Even the hacks (DOS and RDOS 3.2 disks patched at the disk level to boot on 16 sector hardware) are predictable if you know they exist.

None of this is even hard really, or wouldn't be if we actually had used some form of header or metadata format on our files, were consistent about names and extensions, the headers were actually used on 2mg files properly everywhere they claim to be supported, etc.

This isn't a problem because the code sucks or the disk formats are impossible to decipher. The problem is that we have been lazy and inconsistent, which has turned disk processing into a sort of DWIM problem. I'm half tempted to, if you hand me a .do file containing DOS 3.3 in ProDOS order, abort with an error and say the disk appears invalid. Not because I can't read it. I had to read it to be able to figure out exactly how your disk was broken, actually. No, I ought to spit it back out to you as broken because it is and I shouldn't be encouraging that kind of behavior.

Of course what I'll actually do is write a comment in my code about how the extension lied about what's in the file, swap the sectors, and carry on because everybody else does. But I'm not going to be happy about it! ;)
