A highly compressible, appendable, indexed, fast-parsing, flexible, extensible, human-debuggable, machine-verifiable, tamper-resistant archive format.
JSONar takes some of the best parts of tar, without also being a forensic history of computing.
There are two types of records in JSONar: entries and indexes.
Subsequent entries with the same path value as previous entries override those previous entries.
The structure of an entry is:
- intro - The ASCII string ">JSONar\n"
- pathLen - 4 bytes - size of path name as an unsigned 32-bit big-endian int
- headerLen - 4 bytes - size of header portion as UInt32BE
- bodyLen - 4 bytes - size of body as UInt32BE
- \n (1 byte, value 0x0A)
- path - path name as UTF-8 encoded string of byte length defined in the first 4 byte uint
- \n (1 byte, value 0x0A)
- header - header as JSON string of length defined in second 4 byte uint
- \n (1 byte, value 0x0A)
- body - body bytes of length defined in third 4 byte uint
- \n (1 byte, value 0x0A)
- shasum - a 64-byte sha512sum of the previous sections of the entry
- \n (1 byte, value 0x0A)
All 13 parts are always present, but body and header can be 0 bytes.
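A minimal Python sketch of writing one entry in this layout. It assumes the checksum covers every byte from the intro through the '\n' that follows the body, which is what the secure-read procedure feeds into its checksum stream. The name encode_entry is hypothetical and not part of any existing JSONar implementation.

```python
import hashlib
import json
import struct

MAGIC = b">JSONar\n"  # the 8-byte intro


def encode_entry(path: str, header: dict, body: bytes = b"") -> bytes:
    """Serialize one entry (all 13 parts) to bytes."""
    path_bytes = path.encode("utf-8")
    header_bytes = json.dumps(header).encode("utf-8")
    # intro + three big-endian UInt32 lengths = 20 bytes
    prefix = MAGIC + struct.pack(
        ">III", len(path_bytes), len(header_bytes), len(body)
    )
    # Assumption: everything up to and including the "\n" after the body
    # is hashed; the digest and the final "\n" are not.
    hashed = b"\n".join([prefix, path_bytes, header_bytes, body]) + b"\n"
    digest = hashlib.sha512(hashed).digest()  # 64 raw bytes
    return hashed + digest + b"\n"
```

Under these assumptions an entry is always 89 + pathLen + headerLen + bodyLen bytes long: 20 for the intro and lengths, 5 for the delimiters, 64 for the digest.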
The JSON header information should contain the following values, but any arbitrary data is allowed.
- type - One of the following strings, indicating the type of file that the entry represents. The default is file.
  - file
  - directory - Entry body MAY contain directory listing
  - fifo
  - symboliclink - Entry body contains link target
  - link - Entry body contains link target
  - characterdevice
  - socket
  - tombstone - Explicitly removed from archive.
  - index - A JSONar index (see below)
- dev - The device id of the file system entry
- ino - The inode value of the file system entry
- mode - The numeric mode (including suid and sticky bit)
- nlink
- uid, gid - User and group IDs of file owner
- rdev (optional for non-device files)
- atime - access time (optional)
- ctime - change time
- mtime - modification time
- birthtime - file creation time (optional)
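One way to fill in these fields in Python is from os.lstat, sketched below. The representation of the time fields is not specified here, so this sketch assumes seconds since the Unix epoch as floats. The name header_for is hypothetical, and anything the type table does not recognize falls back to the default type.

```python
import os
import stat

# Map stat-mode predicates to the type strings listed above.
_TYPE_CHECKS = [
    (stat.S_ISREG, "file"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISFIFO, "fifo"),
    (stat.S_ISLNK, "symboliclink"),
    (stat.S_ISCHR, "characterdevice"),
    (stat.S_ISSOCK, "socket"),
]


def header_for(path: str) -> dict:
    """Build a header dict for a file system entry."""
    st = os.lstat(path)  # lstat describes a symlink instead of following it
    entry_type = next(
        (name for check, name in _TYPE_CHECKS if check(st.st_mode)),
        "file",  # assumption: unrecognized types use the documented default
    )
    header = {
        "type": entry_type,
        "dev": st.st_dev,
        "ino": st.st_ino,
        "mode": stat.S_IMODE(st.st_mode),  # permission, suid/sgid, sticky bits
        "nlink": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "atime": st.st_atime,  # assumption: float seconds since the epoch
        "ctime": st.st_ctime,
        "mtime": st.st_mtime,
    }
    if entry_type == "characterdevice":
        header["rdev"] = st.st_rdev
    # st_birthtime only exists on some platforms, so it stays optional.
    birthtime = getattr(st, "st_birthtime", None)
    if birthtime is not None:
        header["birthtime"] = birthtime
    return header
```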
To skip over a record:
- Read the first 20 bytes. Assert that the first 8 bytes are the ASCII string ">JSONar\n". Interpret the next 12 bytes as 3 unsigned big-endian int32 values.
- Add those three numbers, plus 5 for the \n delimiters, plus 64 for the sha512sum.
- Skip ahead that many bytes.
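The skip procedure could look roughly like this in Python. skip_record is a hypothetical helper name, and the sketch assumes a seekable binary stream positioned at the start of a record.

```python
import io
import struct

MAGIC = b">JSONar\n"


def skip_record(stream) -> int:
    """Skip one record; return how many bytes followed the 20-byte prefix."""
    prefix = stream.read(20)
    if len(prefix) != 20 or prefix[:8] != MAGIC:
        raise ValueError("not at the start of a JSONar record")
    # Three unsigned big-endian 32-bit lengths: path, header, body.
    path_len, header_len, body_len = struct.unpack(">III", prefix[8:])
    # 5 newline delimiters plus the 64-byte sha512 digest.
    remaining = path_len + header_len + body_len + 5 + 64
    stream.seek(remaining, io.SEEK_CUR)
    return remaining
```

Nothing is verified here, so this is only appropriate when scanning for a record you will later read and check properly.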
To read a record securely from start to finish:
- Start a SHA-2 512 checksum stream.
- Read the first 8 bytes. Assert that they are the string ">JSONar\n".
- Read the next 12 bytes. Interpret this as 3 unsigned big-endian int32 values. Assign these to pathLen, headerLen, and bodyLen, respectively.
- Write the 20 consumed bytes to the checksum stream.
- Read next byte. Assert it is '\n'. Write '\n' to checksum stream.
- Read pathLen bytes. Interpret as UTF-8 string. This is the entry path. Write bytes to checksum stream.
- If the path is @JSONar Index, then skip to the next entry. (Indexes are only relevant in random access mode.)
- Read next byte. Assert it is '\n'. Write '\n' to checksum stream.
- Read headerLen bytes. Interpret as UTF-8 string. This is the header JSON. Write bytes to checksum stream.
- Decode header JSON. This is the metadata. (If it does not parse as valid JSON, then skip to end of record, or abort entirely.)
- Consume next byte. Assert it is '\n'. Write '\n' to checksum stream.
- Consume bodyLen bytes. This is the entry body. Write each byte to checksum stream.
- Consume next byte. Assert it is '\n'. Write '\n' to checksum stream.
- Read next 64 bytes. This is the expected checksum digest. End checksum stream. Verify actual digest matches expected digest.
- Consume next byte. Assert it is '\n'.
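A Python sketch of the verified read, under a few stated assumptions: read_entry and its helpers are hypothetical names; index records are consumed and checked rather than seeked past (so it also works on non-seekable streams) and are reported as None; and the header is parsed only after the checksum passes, a small reordering of the steps above that takes the "abort entirely" option on bad JSON.

```python
import hashlib
import hmac
import json
import struct

MAGIC = b">JSONar\n"
INDEX_PATH = "@JSONar Index"


def _read_exact(stream, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated record")
    return data


def _expect_newline(stream, hasher=None) -> None:
    if _read_exact(stream, 1) != b"\n":
        raise ValueError("missing newline delimiter")
    if hasher is not None:
        hasher.update(b"\n")


def read_entry(stream):
    """Read and verify one record.

    Returns (path, metadata, body), or None for an index record.
    """
    hasher = hashlib.sha512()
    prefix = _read_exact(stream, 20)
    if prefix[:8] != MAGIC:
        raise ValueError("bad intro")
    path_len, header_len, body_len = struct.unpack(">III", prefix[8:])
    hasher.update(prefix)
    _expect_newline(stream, hasher)
    path_bytes = _read_exact(stream, path_len)
    hasher.update(path_bytes)
    _expect_newline(stream, hasher)
    header_bytes = _read_exact(stream, header_len)
    hasher.update(header_bytes)
    _expect_newline(stream, hasher)
    body = _read_exact(stream, body_len)
    hasher.update(body)
    _expect_newline(stream, hasher)
    expected = _read_exact(stream, 64)
    _expect_newline(stream)  # the final delimiter is not hashed
    # Constant-time comparison of actual and expected digests.
    if not hmac.compare_digest(hasher.digest(), expected):
        raise ValueError("checksum mismatch")
    path = path_bytes.decode("utf-8")
    if path == INDEX_PATH:
        return None  # indexes only matter in random access mode
    # json.loads raises a ValueError subclass on invalid JSON.
    return path, json.loads(header_bytes.decode("utf-8")), body
```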
An index is a map from path names to positions within the file where the entry can be found.
An index is a special kind of entry where:
- The path field is @JSONar Index
- The bodyLen value is always 8.
- The header.type field is "index"
- The header.entries is an object which maps filenames to file offsets where the most recent entry for that pathname can be found.
- The body is 8 bytes indicating the file offset of the index as an unsigned 64-bit big-endian integer.
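As a sketch, an index record could be produced in Python like this. encode_index is a hypothetical name, and the checksum is assumed to cover everything from the intro through the '\n' after the body, as for any other entry.

```python
import hashlib
import json
import struct

MAGIC = b">JSONar\n"
INDEX_PATH = b"@JSONar Index"


def encode_index(entries: dict, index_offset: int) -> bytes:
    """Serialize an index record.

    entries maps path names to the file offset of their most recent entry;
    index_offset is the file offset at which this record will be written.
    """
    header = json.dumps({"type": "index", "entries": entries}).encode("utf-8")
    body = struct.pack(">Q", index_offset)  # 8 bytes, so bodyLen is 8
    prefix = MAGIC + struct.pack(
        ">III", len(INDEX_PATH), len(header), len(body)
    )
    hashed = b"\n".join([prefix, INDEX_PATH, header, body]) + b"\n"
    return hashed + hashlib.sha512(hashed).digest() + b"\n"
```

When appending an index to the end of an archive, index_offset would simply be the archive's length before the index is written.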
When writing a JSONar, an index should be written after entries are added.
When reading a JSONar file from disk, it is possible to seek throughout the file to access items randomly using the index.
To access files randomly in a JSONar file,
- Read the last 74 bytes of the file. The first 8 bytes are the file offset of the index (the 8-byte index body). The 9th byte is '\n', then a 64-byte sha512 checksum, then '\n'. If the delimiters aren't in the right places, give up.
- Seek back to the position indicated in the index body, and read to end of the record.
- Check the index checksum, verify that the path name is @JSONar Index, and pull out the header.entries object.
- At this point, file metadata and contents can be accessed by seeking to the appropriate point in the file and reading the entry.
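The index lookup might be sketched in Python as follows. It assumes the 8-byte index body described in the index section, so the tail of the file is 8 + 1 + 64 + 1 = 74 bytes, and it assumes the index is the last record in the file. load_index is a hypothetical name.

```python
import hashlib
import io
import json
import struct

MAGIC = b">JSONar\n"
INDEX_PATH = b"@JSONar Index"
TAIL = 8 + 1 + 64 + 1  # index body, "\n", sha512 digest, "\n"


def load_index(f) -> dict:
    """Return header.entries from the index at the end of a seekable file."""
    f.seek(-TAIL, io.SEEK_END)
    tail = f.read(TAIL)
    if len(tail) != TAIL or tail[8:9] != b"\n" or tail[-1:] != b"\n":
        raise ValueError("no index at end of file")
    (index_offset,) = struct.unpack(">Q", tail[:8])
    f.seek(index_offset)
    record = f.read()  # assumption: the index is the final record
    if len(record) < 20 or record[:8] != MAGIC:
        raise ValueError("index offset does not point at a record")
    path_len, header_len, body_len = struct.unpack(">III", record[8:20])
    if len(record) != 20 + path_len + header_len + body_len + 5 + 64:
        raise ValueError("index record has unexpected length")
    path_end = 21 + path_len  # prefix (20) + "\n" + path
    header_end = path_end + 1 + header_len
    if record[21:path_end] != INDEX_PATH or body_len != 8:
        raise ValueError("last record is not an index")
    # The digest covers everything before it, including the "\n" after the body.
    if hashlib.sha512(record[:-65]).digest() != record[-65:-1]:
        raise ValueError("index checksum mismatch")
    header = json.loads(record[path_end + 1:header_end].decode("utf-8"))
    return header["entries"]
```

Each offset in the returned object can then be passed to f.seek before reading and verifying that entry in the usual way.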
This is a bad idea and you should probably not implement this, except for fun.
As @tef points out in the comments below, negative file offsets relative to the start of the index (and an index body holding a negative offset relative to the end of the index) are a better idea, because they mean that archives can be concatenated, or garbage prepended to the start (thus supporting self-extraction).
To the extent that this is "software", you may use it under the following license:
The ISC License
Copyright (c) Isaac Z. Schlueter and Contributors
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR
IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
I was just thinking "make the hash replaceable" is a smart thing.