@dbrant
Created April 20, 2022 13:50
Tape backup format for Mac Retrospect v??
(Dmitry Brant, Apr 2022)
Recently I came across a backup tape (an AIT-3 100GB tape) that was written in a format
I didn't recognize. The only thing I knew was that it came from a Mac workstation, so
it was most likely written with Retrospect, a popular backup tool at the time.
This is the result of my reverse-engineering effort to get the contents out of this archive.
------------------
This backup format is composed of a sequence of blocks that use FourCC-style naming: each
block starts with a four-character name. (All integers are big-endian. Dates are stored as
4-byte counts of seconds since Jan 1 1904.)
Each block has the following format:
offset | length |
----------------------------------
0 | 4 | Block name
4 | 4 | Block length (including the name and length fields)
8 | n | Block data
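As a rough illustration of the layout above, here is a minimal Python sketch that walks the blocks in a buffer. The function name iter_blocks is hypothetical, and treating the length as an unsigned 32-bit value is an assumption; the notes only say it is 4 bytes and big-endian.

```python
import struct
from typing import Iterator, Tuple


def iter_blocks(buf: bytes, start: int = 0) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (name, offset, block_bytes) for each block found in buf."""
    pos = start
    while pos + 8 <= len(buf):
        # 4-byte block name followed by a big-endian 32-bit length
        # (assumed unsigned).
        name, length = struct.unpack_from(">4sI", buf, pos)
        # The length includes the 8 header bytes, so a smaller value cannot
        # be valid; stop rather than loop forever on bad data.
        if length < 8 or pos + length > len(buf):
            break
        yield name.decode("latin-1"), pos, buf[pos:pos + length]
        pos += length
```

Each yielded block still includes its 8-byte header, so the offsets in the tables below can be applied to it directly.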
If the goal is simply to recover the file contents from the backup, we only need to care
about these block types: "Diry", "File", "Fork", and "Cont".
A "Diry" block indicates the directory in which any subsequent files are stored.
A "File" block indicates the start of a new file, and contains metadata about the file (name,
dates, size). After this block, there may be any number of "Fork" and "Cont" blocks that contain
the actual contents of the current file. In other words, the file contents consist of one or more
"Fork" blocks, and each "Fork" block can be followed by any number of continuation "Cont" blocks. And so, the
overall structure looks like this:
"Diry"                <-- current directory
    "File"            <-- first file
        "Fork"
            "Cont"
            "Cont"
            ...
        "Fork"
            "Cont"
            "Cont"
            ...
        ...
    "File"            <-- second file
        "Fork"
            "Cont"
            ...
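The nesting above can be recovered from the flat sequence of block names with a small state machine. The following Python sketch is only an illustration: group_blocks and the dictionary keys are hypothetical names, and parking a "File" that appears before any "Diry" under an unnamed directory is my own assumption.

```python
from typing import Dict, Iterable, List


def group_blocks(names: Iterable[str]) -> List[Dict]:
    """Group a flat sequence of block names into directories, files, forks."""
    dirs: List[Dict] = []
    cur_dir = cur_file = cur_fork = None
    for name in names:
        if name == "Diry":
            # A new directory applies to all files that follow it.
            cur_dir = {"files": []}
            dirs.append(cur_dir)
            cur_file = cur_fork = None
        elif name == "File":
            if cur_dir is None:
                # Assumption: a file seen before any directory goes into
                # an unnamed placeholder directory.
                cur_dir = {"files": []}
                dirs.append(cur_dir)
            cur_file = {"forks": []}
            cur_dir["files"].append(cur_file)
            cur_fork = None
        elif name == "Fork" and cur_file is not None:
            # Each fork counts its own block plus any continuations.
            cur_fork = {"blocks": 1}
            cur_file["forks"].append(cur_fork)
        elif name == "Cont" and cur_fork is not None:
            cur_fork["blocks"] += 1
        # Any other block type ("Priv", "NodX", "Sgmt", ...) is ignored.
    return dirs
```

For example, the names "Diry", "File", "Fork", "Cont", "Cont", "Fork", "Cont" produce one directory holding one file with two forks, made of three blocks and two blocks respectively.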
Here is the structure of a "Diry" block:
offset | length |
----------------------------------
0 | 4 | "Diry"
4 | 4 | Block length
8 | 4 | Access date
0x16 | 4 | Create date
0x1A | 4 | Modify date
0x50 | n | Directory name, until end of block
(Offsets are in hexadecimal.)
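To make the "Diry" layout concrete, here is a hedged Python sketch that pulls the dates and the name out of one block. It reads the offsets in the table as hexadecimal. The names parse_diry and mac_date are hypothetical, the MacRoman text encoding and the stripping of trailing NUL bytes are assumptions, and the dates are returned without a time zone because the notes do not say which one applies.

```python
import struct
from datetime import datetime, timedelta

# Dates count seconds from Jan 1 1904; no time zone is attached here.
MAC_EPOCH = datetime(1904, 1, 1)


def mac_date(seconds: int) -> datetime:
    """Convert a seconds-since-1904 value to a naive datetime."""
    return MAC_EPOCH + timedelta(seconds=seconds)


def parse_diry(block: bytes) -> dict:
    """Extract dates and directory name from a whole "Diry" block."""
    if block[:4] != b"Diry":
        raise ValueError("not a Diry block")
    (access,) = struct.unpack_from(">I", block, 0x08)
    (create,) = struct.unpack_from(">I", block, 0x16)
    (modify,) = struct.unpack_from(">I", block, 0x1A)
    # Assumption: the name is MacRoman text, possibly NUL-padded.
    name = block[0x50:].rstrip(b"\x00").decode("mac_roman")
    return {
        "name": name,
        "accessed": mac_date(access),
        "created": mac_date(create),
        "modified": mac_date(modify),
    }
```

The block passed in is expected to include its 8-byte name and length header, so the offsets match the table.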
Here is the structure of a "File" block:
offset | length |
----------------------------------
0 | 4 | "File"
4 | 4 | Block length
8 | 4 | Access date
0x16 | 4 | Create date
0x1A | 4 | Modify date
0x1E | 8 | File size
0x46 | n | File name, until end of block
(Offsets are in hexadecimal.)
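A "File" block can be read the same way. The Python sketch below is a minimal illustration that reads the offsets as hexadecimal; parse_file is a hypothetical name, the MacRoman encoding of the name is an assumption, and the dates are left as raw second counts. Whether the 8-byte size covers one fork or all forks is not something these notes establish.

```python
import struct


def parse_file(block: bytes) -> dict:
    """Extract size, raw dates, and file name from a whole "File" block."""
    if block[:4] != b"File":
        raise ValueError("not a File block")
    # Three 4-byte dates (seconds since Jan 1 1904), kept as integers here.
    (access,) = struct.unpack_from(">I", block, 0x08)
    (create,) = struct.unpack_from(">I", block, 0x16)
    (modify,) = struct.unpack_from(">I", block, 0x1A)
    # 8-byte big-endian file size (assumed unsigned).
    (size,) = struct.unpack_from(">Q", block, 0x1E)
    # Assumption: the name is MacRoman text, possibly NUL-padded.
    name = block[0x46:].rstrip(b"\x00").decode("mac_roman")
    return {
        "name": name,
        "size": size,
        "accessed": access,
        "created": create,
        "modified": modify,
    }
```

The size can serve as a sanity check against the number of bytes collected from the "Fork" and "Cont" blocks that follow.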
Here is the structure of a "Fork" block:
offset | length |
----------------------------------
0 | 4 | "Fork"
4 | 4 | Block length
8 | 0x16 | Fork-specific header of unknown purpose; it appears safe to ignore.
0x1E | n | File data, until end of block.
(Offsets and the header length are in hexadecimal: 8 + 0x16 = 0x1E.)
Here is the structure of a "Cont" block:
offset | length |
----------------------------------
0 | 4 | "Cont"
4 | 4 | Block length
8 | n | File data, until end of block.
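Putting the "Fork" and "Cont" layouts together, the data of one fork is the payload of its "Fork" block followed by the payloads of its "Cont" blocks. Here is a small Python sketch under that reading, with the "Fork" data offset taken as hexadecimal 0x1E; fork_payload is a hypothetical name, and each block is assumed to be passed in whole, header included.

```python
from typing import Sequence

FORK_DATA_OFFSET = 0x1E  # data follows the name, length, and fork header
CONT_DATA_OFFSET = 0x08  # data follows the name and length only


def fork_payload(blocks: Sequence[bytes]) -> bytes:
    """Join the data of one "Fork" block and its "Cont" blocks."""
    if not blocks or blocks[0][:4] != b"Fork":
        raise ValueError("expected a Fork block first")
    parts = [blocks[0][FORK_DATA_OFFSET:]]
    for blk in blocks[1:]:
        if blk[:4] != b"Cont":
            raise ValueError("expected only Cont blocks after the Fork")
        parts.append(blk[CONT_DATA_OFFSET:])
    return b"".join(parts)
```

For large files it would be kinder to memory to write each payload to the output file as it is read instead of joining everything first.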
NOTE: Once in a while a block will be realigned onto a 0x200-byte boundary. If you're expecting
to read a new block and instead you get a null block name, try aligning to the next 0x200 byte
boundary.
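The realignment rule can be sketched in Python as follows. Both function names are hypothetical, and a "null block name" is taken to mean four zero bytes. If the position is already on a boundary, this sketch leaves it unchanged; the notes do not say what to do in that case.

```python
ALIGNMENT = 0x200


def align_up(pos: int, boundary: int = ALIGNMENT) -> int:
    """Round pos up to a multiple of boundary (a power of two)."""
    return (pos + boundary - 1) & ~(boundary - 1)


def next_block_offset(buf: bytes, pos: int) -> int:
    """Return where the next block header should be read from.

    If the four bytes at pos are all zero, skip ahead to the next
    0x200-byte boundary; otherwise keep pos as it is.
    """
    if buf[pos:pos + 4] == b"\x00\x00\x00\x00":
        return align_up(pos)
    return pos
```

Note that the offset being aligned should be measured the same way the writer measured it; whether that is from the start of the segment or from the start of the block data is an open question here.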
NOTE 2: The tape backup is split into segments of 512MiB each, which are written as separate
file records on the tape. The very first segment starts with a 0x2000-byte header which does NOT
conform to the "block" format. The block structure starts immediately after this header.
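Assuming each tape file record has been dumped to its own file, positioning a reader at the first block of a segment might look like the Python sketch below. The name seek_to_first_block is hypothetical, and starting later segments at offset 0 is an assumption; the notes only describe the header on the very first segment.

```python
from typing import BinaryIO

VOLUME_HEADER_SIZE = 0x2000  # present only at the start of the first segment


def seek_to_first_block(stream: BinaryIO, segment_index: int) -> int:
    """Position stream at the first block of a segment.

    segment_index is 0 for the very first segment on the tape.
    Returns the offset that was seeked to.
    """
    # Assumption: segments after the first begin directly with block data.
    offset = VOLUME_HEADER_SIZE if segment_index == 0 else 0
    stream.seek(offset)
    return offset
```

A caller would open each segment file in binary mode, call this once, and then start reading 8-byte block headers from the current position.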
There are several other types of blocks such as "Priv", "NodX", and "Sgmt" (which contains metadata
about the backup volume itself), but these seem to be safely ignorable.