@soh-cah-toa
Created August 30, 2011 03:19
Introduction to PODDS
A debug data format provides a way to store high-level source information about a program. This
allows analysis software such as debuggers and profilers to form a relationship between the
executable code and the original source code that generated it. This is the purpose of PODDS:
Parrot Opcode Debug Data Serialization format.
The full PODDS specification is quite lengthy. Therefore, this document is meant to serve as a
quick introduction to PODDS. If you'd like to see the full specification, visit
https://gist.github.com/1133182.
The most basic entity in PODDS is called a "Data Description Entity" or DDE. A DDE consists of a
"class" that indicates what it describes and a list of "properties" that further describe the
specific characteristics of the entity. Excluding the topmost DDE, a DDE will always be owned by
a parent DDE and may or may not have child or sibling DDEs.
Examples of class names:
CLASS_array_type
CLASS_class_type
CLASS_global_var
CLASS_lex_block
CLASS_local_var
CLASS_param
CLASS_src_file
CLASS_sub
Properties always form a name/value pair. A value will always have one of the following forms:
* address - points to some location in the program's address space
* reference - refers to another DDE in the debug segment
* constant - uninterpreted numerical data
* block - uninterpreted data
* string - a null-terminated series of zero or more bytes
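As a rough illustration of the five value forms, the sketch below names them in an enumeration and shows how a string value could be read from raw bytes. The names `ValueForm` and `read_string` are hypothetical and not part of PODDS; only the null-terminated layout of strings is taken from the list above.

```python
from enum import Enum
from typing import Tuple


class ValueForm(Enum):
    """Hypothetical labels for the five forms a property value may take."""
    ADDRESS = "address"      # location in the program's address space
    REFERENCE = "reference"  # another DDE in the debug segment
    CONSTANT = "constant"    # uninterpreted numerical data
    BLOCK = "block"          # uninterpreted data
    STRING = "string"        # null-terminated series of zero or more bytes


def read_string(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Read a null-terminated string value that starts at `pos`.

    Returns the bytes before the terminator and the position just past it.
    Raises ValueError if no terminator is found.
    """
    end = buf.index(b"\x00", pos)
    return buf[pos:end], end + 1
```

Because a string may hold zero bytes, an empty string is encoded as a lone terminator.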
Examples of property names:
PT_end_pc
PT_lang
PT_location
PT_program
PT_sibling
PT_start_pc
PT_start_scope
There is no restriction on the order in which properties appear. To eliminate ambiguity, each
property is unique and no more than one property of a given name may appear in a DDE.
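One way to picture a DDE in memory is as a class name plus a mapping of property names to values. The sketch below is only a model under that assumption: the `DDE` type and its `add_property` method are hypothetical names, and the on-disk encoding is not addressed here. A mapping fits because order carries no meaning and each property name may appear at most once.

```python
class DDE:
    """Hypothetical in-memory model of a Data Description Entity."""

    def __init__(self, dde_class):
        self.dde_class = dde_class  # e.g. "CLASS_sub"
        self.properties = {}        # property name -> value

    def add_property(self, name, value):
        # No more than one property of a given name may appear in a DDE.
        if name in self.properties:
            raise ValueError(f"duplicate property {name} in {self.dde_class}")
        self.properties[name] = value


sub = DDE("CLASS_sub")
sub.add_property("PT_start_pc", 0x10)
sub.add_property("PT_end_pc", 0x40)  # insertion order carries no meaning
```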
The ownership of DDEs is represented by their physical ordering and by the `PT_sibling`
property. The value of this property is a reference to another DDE. If the DDE referred to is
null, it marks the end of the sibling chain. Except for `CLASS_padding`, all DDEs are required
to have the `PT_sibling` property. A DDE is owned by its physical predecessor (called the
"parent") unless that predecessor refers to it with its `PT_sibling` property, in which case the
two are siblings. A DDE owned by its predecessor can be thought of as the predecessor's first
child. The children of a DDE form a chain of siblings.
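The ownership rule can be sketched as a walk over DDEs in physical order. The representation below is an assumption made for illustration: each DDE is reduced to the index its `PT_sibling` refers to, a null DDE is modeled as `None`, and sibling references are assumed to point forward. The function name `owners` is hypothetical.

```python
def owners(entries):
    """Map each non-null DDE index to the index of its parent.

    entries[i] is the index that DDE i's PT_sibling refers to, or None
    when entry i is itself a null DDE that ends a sibling chain.
    """
    owner = {}

    def walk_chain(i, parent):
        # Follow one sibling chain; every member shares the same parent.
        while entries[i] is not None:
            owner[i] = parent
            if entries[i] != i + 1:
                # The physical successor is not this DDE's sibling,
                # so it is this DDE's first child.
                walk_chain(i + 1, i)
            i = entries[i]

    walk_chain(0, None)
    return owner


# 0: src_file, 1: sub, 2: param, 3: local_var, 4: null,
# 5: global_var, 6: null, 7: null
layout = [7, 5, 3, 4, None, 6, None, None]
tree = owners(layout)
```

In this layout the `sub` at index 1 owns the `param` and `local_var` entries, while `sub` and `global_var` are both owned by the topmost DDE.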
A symbolic debugger has to access PODDS data very frequently. Therefore, it is very important to
consider how to decrease the amount of time needed to read and interpret debug data. This
becomes quite difficult when a program object is defined outside the compilation unit where the
debuggee is stopped. To find the DDE associated with a program object, a debugger would have to
search exhaustively through every DDE at the highest scope in each compilation unit.
This can severely cripple the performance of the debugger.
To combat this problem, a compiler has the option of providing two separate types of tables that
provide information about the DDEs owned by a particular compilation unit: the public name table
and the public address table.
The "public name table" is a subsection of the debug segment consisting of records that contain
variable-length entries. Each record lists the names of program objects described by the
DDEs that are owned by a single compilation unit. Each record starts with a header that contains
three important values: 1) the (non-inclusive) length of the entries for that record, 2) the
offset of the compilation unit's DDE from the start of the debug segment, and 3) the size in
bytes of the DDE describing that particular compilation unit. Following the header is a variable
number of offset/name pairs. Each pair contains the offset of the program object's DDE, measured
from the start of the compilation unit entry that corresponds to the current record, followed
by a string representing the object's name as found in its `PT_name` property. Each record is
terminated by a null pair. In this way, a debugger can rapidly determine which compilation unit
to search in order to find the DDE for a program object with a given name.
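A lookup through the public name table might look like the sketch below. It works on already-decoded records, because field widths and byte order are not given in this introduction. `NameRecord` and `find_by_name` are hypothetical names, and the header values in the example are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NameRecord:
    """Hypothetical decoded form of one public name table record."""
    entries_length: int  # length of the entries, excluding the header
    cu_offset: int       # offset of the unit's DDE in the debug segment
    cu_size: int         # size in bytes of the unit's DDE
    pairs: List[Tuple[int, str]] = field(default_factory=list)  # (offset, name)


def find_by_name(table: List[NameRecord], name: str) -> Optional[Tuple[int, int]]:
    """Return (unit offset, DDE offset in the debug segment) for `name`."""
    for record in table:
        for offset, obj_name in record.pairs:
            if obj_name == name:
                # Pair offsets are relative to the compilation unit entry.
                return record.cu_offset, record.cu_offset + offset
    return None


name_table = [
    NameRecord(24, 0, 120, [(16, "main"), (64, "counter")]),
    NameRecord(12, 120, 80, [(20, "helper")]),
]
```

Only the small table is scanned, so the debugger never has to visit the DDEs of compilation units that do not define the name.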
The "public address table" is a subsection of the debug segment consisting of records that
contain variable-length entries. Each record describes the section of the program's address
space that contains the compilation unit. Each record starts with a header that contains two
important values: the (non-inclusive) length of the entries for that record and the offset of
the compilation unit's DDE from the start of the debug segment. Following the header is a
variable number of "address range descriptors." Each one is a pair containing the starting
address of the range followed by its length. Each record is terminated by a null pair. In this way, a
debugger can rapidly determine which compilation unit to search in order to find the DDE for a
program object with a given address.
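The matching lookup by address can be sketched in the same way, again on decoded records rather than raw bytes. `AddressRecord` and `find_unit_by_address` are hypothetical names, and treating each range as half-open (start included, start plus length excluded) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AddressRecord:
    """Hypothetical decoded form of one public address table record."""
    entries_length: int  # length of the entries, excluding the header
    cu_offset: int       # offset of the unit's DDE in the debug segment
    ranges: List[Tuple[int, int]] = field(default_factory=list)  # (start, length)


def find_unit_by_address(table: List[AddressRecord], address: int) -> Optional[int]:
    """Return the offset of the compilation unit DDE that covers `address`."""
    for record in table:
        for start, length in record.ranges:
            if start <= address < start + length:  # assumed half-open range
                return record.cu_offset
    return None


address_table = [
    AddressRecord(16, 0, [(0x100, 0x40)]),
    AddressRecord(32, 200, [(0x200, 0x10), (0x300, 0x20)]),
]
```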
Associating source-level line numbers with their respective generated opcodes makes it possible
for a debugger user to specify addresses in relation to source statements. This makes
single-stepping much easier.
Each compilation unit DDE in the debug segment references a corresponding record in the line
number table that describes its respective source statement. The first record in the table
includes the length of the table in bytes and is followed by the address of the first opcode
generated for the compilation unit. The rest of the table consists of a list of source statement
records. A source statement record consists of three parts: 1) a line number, 2) a position
within the source line, and 3) an opcode address. The line numbers are ordered starting with 1
from the beginning of the compilation unit.
The compiler has two ways to represent the position within the source line. It can either use the
number of characters from the beginning of the line to the beginning of the source statement or
use the special value `SRC_NO_POS` to indicate that the record refers to the entire line. This
feature is necessary for HLLs that allow multiple statements on a single line.
The address in each record describes the address of the first opcode generated for that source
statement minus the address of the first opcode generated for the compilation unit. That is, it
represents the offset into the compilation unit.
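A source statement record and its address arithmetic could be modeled as below. This is a sketch only: `StatementRecord`, `absolute_address`, and `covers_whole_line` are hypothetical names, and the numeric value chosen for `SRC_NO_POS` is a placeholder because the real value is not stated here.

```python
from dataclasses import dataclass

SRC_NO_POS = -1  # placeholder sentinel; the actual value is an assumption


@dataclass
class StatementRecord:
    """Hypothetical decoded form of one source statement record."""
    line: int      # line number, counted from 1 within the unit
    position: int  # characters from start of line, or SRC_NO_POS
    address: int   # offset from the unit's first opcode


def absolute_address(record: StatementRecord, unit_base: int) -> int:
    """Recover the opcode address from the stored offset."""
    return unit_base + record.address


def covers_whole_line(record: StatementRecord) -> bool:
    """True when the record refers to the entire source line."""
    return record.position == SRC_NO_POS


# Two statements on line 7, e.g. `a = 1; b = 2`, and one on line 8.
first = StatementRecord(line=7, position=0, address=24)
second = StatementRecord(line=7, position=7, address=30)
whole = StatementRecord(line=8, position=SRC_NO_POS, address=36)
```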
Some HLLs allow statements to extend over multiple lines. The record in such a case will refer
to the line containing the start of that particular statement.
There is no limitation on the order in which the records appear. They do not necessarily represent
the exact order in which the statements appear in the original source file. Additionally, it is
not required to have a record in the line number table for every single source statement in the
original source file.
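Since records may appear in any order and need not cover every statement, a debugger would likely sort them by address before mapping an opcode back to a line. The sketch below shows one such approach using a binary search. The function name `line_for_offset` and the tuple layout are assumptions, not part of PODDS.

```python
import bisect


def line_for_offset(records, offset):
    """Find the line whose statement covers the opcode at `offset`.

    `records` is a list of (line, position, address_offset) tuples in
    any order. Returns the line of the record with the greatest address
    offset not exceeding `offset`, or None if there is no such record.
    """
    ordered = sorted(records, key=lambda r: r[2])
    addresses = [r[2] for r in ordered]
    i = bisect.bisect_right(addresses, offset) - 1
    if i < 0:
        return None
    return ordered[i][0]


# Records deliberately out of source order.
unordered = [(3, 0, 12), (1, 0, 0), (2, 0, 4)]
```

A real debugger would sort once and reuse the result rather than sorting on every query.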
To terminate the line number table, PODDS uses a record whose line number is 0 and whose address
describes the first opcode of the next compilation unit. This allows the debugger to understand
which opcodes are associated with the last statement in a compilation unit, a useful feature for
stepping out of functions.
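The role of the terminating record can be illustrated by computing the opcode range of each statement. The sketch assumes that the terminator's address is stored as an offset like the other records and that it is the highest address in the table. `statement_ranges` is a hypothetical helper name.

```python
def statement_ranges(records):
    """Return (line, start, end) for each statement, ordered by address.

    `records` is a list of (line, address_offset) tuples that includes
    the terminating record with line number 0. Each range is half-open:
    it ends where the next record's opcodes begin.
    """
    ordered = sorted(records, key=lambda r: r[1])
    ranges = []
    for (line, start), (_, end) in zip(ordered, ordered[1:]):
        if line != 0:  # the terminator itself describes no statement
            ranges.append((line, start, end))
    return ranges


table = [(1, 0), (2, 4), (3, 12), (0, 20)]  # last entry is the terminator
result = statement_ranges(table)
```

Without the terminator, the end of the range for line 3 would be unknown.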