@rikkimax
Last active April 4, 2024 08:01

Source Archive Format

The layout of a Source Archive Format file is very similar to that of object file libraries and various other schemes. It does not adopt any of those schemes because they vary from platform to platform, they require code to support features that Source Archive Format files do not need, and D has needs of its own. The format is meant to be friendly to memory-mapped file access and does not have alignment issues.

The file extension is sar.

Structure

The file is broken up into sequential blocks. The start of each block is padded to a 16-byte boundary to enable aligned SIMD access to that block's contents.

All integers are in little endian format and are unsigned.

Except for the header, every block starts with a four-byte block ID followed by an eight-byte length field.
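
The rules above (16-byte block alignment, little-endian unsigned integers, and a four-byte ID followed by an eight-byte length) can be sketched in Python roughly as follows. The names `align_up` and `read_block_prefix` are hypothetical helpers chosen for illustration and are not part of the format.

```python
import struct

ALIGNMENT = 16  # every block starts on a 16-byte boundary


def align_up(offset, alignment=ALIGNMENT):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def read_block_prefix(data, offset):
    """Return (block_id, length) for a non-header block at offset.

    "<" selects little-endian with no implicit padding,
    "4s" is the four-byte block ID, "Q" the unsigned eight-byte length.
    """
    block_id, length = struct.unpack_from("<4sQ", data, offset)
    return block_id, length
```

A reader walking the file would call `align_up` on the end of one block to find where the next block is expected to start.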

Header

A required header that denotes the version of the file and verifies that it is a Source Archive Format file.

The block structure is as follows:

Offset  Length  Name              Value
0       4       Block ID          In hex: 4D 73 61 72, in ASCII: Msar
4       4       Zero terminator   00 00 00 00
8       8       Length of header  16

Future versions may add fields, which will be indicated by a larger length of header value.
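
As a minimal sketch of the header layout, the following Python packs and validates the 16-byte header. The function names are hypothetical, and accepting any header length of 16 or more is an assumption made here so that a reader could skip fields added by future versions.

```python
import struct

HEADER_FORMAT = "<4s4sQ"  # block ID, zero terminator, length of header
HEADER_MAGIC = b"Msar"    # 4D 73 61 72
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def pack_header():
    """Build the header described in the table."""
    return struct.pack(HEADER_FORMAT, HEADER_MAGIC, b"\x00" * 4, HEADER_LENGTH)


def parse_header(data):
    """Validate the header and return its length in bytes."""
    if len(data) < HEADER_LENGTH:
        raise ValueError("file too short for a header")
    magic, zeros, length = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != HEADER_MAGIC or zeros != b"\x00" * 4:
        raise ValueError("not a Source Archive Format file")
    if length < HEADER_LENGTH:
        raise ValueError("header length smaller than the known fields")
    # Assumption: a larger value means extra fields that can be skipped.
    return length
```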

Source Entries

A source entries block works as a table of contents for all the embedded source files. It also supports a CLI argument string that is applied if any of these files are used.

There may be multiple of these blocks in a Source Archive Format file, which enables blind appending by tooling (although checking for existing entries and zeroing out their file name and auxiliary file name would be a good idea).

Offset  Length                       Name                         Value
0       4                            Block ID                     In hex: 45 4E 54 53, in ASCII: ENTS
4       8                            Size of block
12      4                            Number of entries
16      4                            Length of CLI arguments      Applies to the CLI arguments for all source files
20      Length of CLI arguments      CLI argument string          Values may be separated by wrapping them in double quotes
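
The fixed part of a source entries block could be read along these lines. This is a sketch under assumptions: the block ID is compared against the ASCII form `ENTS`, the argument string is decoded as UTF-8, and a trailing zero byte is stripped if present. The format text specifies neither the encoding nor the terminator for this string, and `parse_entries_prefix` is a hypothetical name.

```python
import struct

ENTRIES_ID = b"ENTS"
PREFIX_FORMAT = "<4sQII"  # block ID, size of block, entry count, CLI length
PREFIX_SIZE = struct.calcsize(PREFIX_FORMAT)  # 20 bytes


def parse_entries_prefix(data, offset):
    """Read the block prefix and the CLI string shared by all entries."""
    block_id, block_size, entry_count, cli_length = struct.unpack_from(
        PREFIX_FORMAT, data, offset
    )
    if block_id != ENTRIES_ID:
        raise ValueError("not a source entries block")
    start = offset + PREFIX_SIZE
    raw = data[start:start + cli_length]
    if len(raw) != cli_length:
        raise ValueError("truncated CLI argument string")
    return {
        "block_size": block_size,
        "entry_count": entry_count,
        # Assumption: UTF-8 text, optional trailing zero byte.
        "cli": raw.rstrip(b"\x00").decode("utf-8"),
        # The variable-length entry array begins right after the string.
        "entries_offset": start + cli_length,
    }
```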

Following this is a variable-length array composed of these values:

Offset                                         Length                      Name                                      Value
0                                              4                           File name length
4                                              4                           Auxiliary name length                     The D module name
8                                              4                           CLI argument string length                Must be zero terminated
12                                             8                           File contents length                      Does not include zero termination in length
20                                             8                           File contents offset                      After this block
28                                             File name length            File name                                 Must be zero terminated
28 + file name length                          Auxiliary name length       Auxiliary name                            Must be zero terminated
28 + file name length + auxiliary name length  CLI argument string length  CLI argument string if this file is used  Must be zero terminated
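
A single entry record could be decoded as sketched below. The fixed fields are packed back to back, so they occupy 4 + 4 + 4 + 8 + 8 = 28 bytes before the first string. Two assumptions are made here that the text does not settle: the stored string lengths include the terminating zero byte (the string offsets are computed by adding the lengths directly), and the strings are UTF-8. `Entry` and `parse_entry` are hypothetical names.

```python
import struct
from dataclasses import dataclass

ENTRY_FIXED_FORMAT = "<IIIQQ"  # three 4-byte lengths, contents length, offset
ENTRY_FIXED_SIZE = struct.calcsize(ENTRY_FIXED_FORMAT)  # 28 bytes


@dataclass
class Entry:
    file_name: str
    aux_name: str          # the D module name
    cli: str               # arguments applied if this file is used
    contents_length: int
    contents_offset: int


def _text(raw):
    """Decode up to the first zero byte (assumes UTF-8)."""
    return raw.split(b"\x00", 1)[0].decode("utf-8")


def parse_entry(data, offset):
    """Return (Entry, offset of the next entry record)."""
    name_len, aux_len, cli_len, c_len, c_off = struct.unpack_from(
        ENTRY_FIXED_FORMAT, data, offset
    )
    pos = offset + ENTRY_FIXED_SIZE
    if pos + name_len + aux_len + cli_len > len(data):
        raise ValueError("truncated entry record")
    name = data[pos:pos + name_len]
    pos += name_len
    aux = data[pos:pos + aux_len]
    pos += aux_len
    cli = data[pos:pos + cli_len]
    pos += cli_len
    return Entry(_text(name), _text(aux), _text(cli), c_len, c_off), pos
```

Calling `parse_entry` repeatedly, feeding each returned offset back in, would walk the whole array for the number of entries given in the block prefix.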

Following this are the file contents, each at the offset and of the length specified by its entry.

All file contents must be aligned to 16 bytes and end with 16 zero bytes to enable faster lexing. Padding added to align the next entry may contribute to the zeros at the end of the file contents.
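
One way a writer might satisfy both rules at once is to append at least 16 zero bytes and then keep adding zeros until the total is a multiple of 16. The sketch below assumes the contents themselves start at a 16-byte-aligned offset; `padded_length` and `pad_contents` are hypothetical helper names.

```python
ALIGNMENT = 16       # file contents start on 16-byte boundaries
TRAILING_ZEROS = 16  # minimum number of zero bytes after the contents


def padded_length(contents_length):
    """Bytes occupied by the contents plus their zero fill."""
    minimum = contents_length + TRAILING_ZEROS
    return (minimum + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


def pad_contents(contents):
    """Append zeros so the next file's contents stay 16-byte aligned."""
    fill = padded_length(len(contents)) - len(contents)
    return contents + b"\x00" * fill
```

Because the alignment padding counts toward the 16 trailing zeros, a 3-byte file occupies 32 bytes rather than 48.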
