Created
August 30, 2011 03:19
Introduction to PODDS
A debug data format provides a way to store high-level source information about a program. This
allows analysis software such as debuggers and profilers to relate the executable code to the
original source code that generated it. This is the purpose of PODDS, the Parrot Opcode Debug
Data Serialization format.
The full PODDS specification is quite lengthy, so this document is meant to serve as a quick
introduction. If you'd like to see the full specification, visit
https://gist.github.com/1133182.
The most basic entity in PODDS is called a "Data Description Entity" or DDE. A DDE consists of a
"class" that indicates what it describes and a list of "properties" that further describe the
specific characteristics of the entity. Excluding the topmost DDE, a DDE is always owned by a
parent DDE and may or may not have child or sibling DDEs.
Examples of class names:
CLASS_array_type
CLASS_class_type
CLASS_global_var
CLASS_lex_block
CLASS_local_var
CLASS_param
CLASS_src_file
CLASS_sub
Each property is a name/value pair. A value always has one of the following forms:
* address - points to some location in the program's address space
* reference - refers to another DDE in the debug segment
* constant - uninterpreted numerical data
* block - uninterpreted data
* string - a null-terminated series of zero or more bytes
Examples of property names:
PT_end_pc
PT_lang
PT_location
PT_program
PT_sibling
PT_start_pc
PT_start_scope
There is no restriction on the order in which properties appear. To eliminate ambiguity, no more
than one property of a given name may appear in a DDE.
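As a rough illustration, a DDE can be modeled in memory as a class name plus a mapping from
property names to values. The Python sketch below is not part of PODDS: the `Form`, `Value`, and
`DDE` types, the `add_property` helper, and the sample addresses are hypothetical. Only the class
names, property names, and value forms come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Form(Enum):
    """The five value forms a property may take."""
    ADDRESS = "address"
    REFERENCE = "reference"
    CONSTANT = "constant"
    BLOCK = "block"
    STRING = "string"


@dataclass
class Value:
    form: Form
    data: object  # hypothetical in-memory payload; the real encoding is in the full spec


@dataclass
class DDE:
    cls: str                                        # e.g. "CLASS_sub"
    properties: dict = field(default_factory=dict)  # property name -> Value

    def add_property(self, name, value):
        # A DDE may hold at most one property of a given name.
        if name in self.properties:
            raise ValueError(f"duplicate property {name} in {self.cls}")
        self.properties[name] = value


# Sample addresses are made up for the example.
sub = DDE("CLASS_sub")
sub.add_property("PT_start_pc", Value(Form.ADDRESS, 0x10))
sub.add_property("PT_end_pc", Value(Form.ADDRESS, 0x40))
```

Storing properties in a mapping makes their order irrelevant, and the duplicate check enforces
the one-property-per-name rule.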
The ownership of DDEs is represented by their physical ordering and use of the `PT_sibling`
property. The value of this property is a reference to another DDE. If the DDE referred to is
null, it marks the end of the sibling chain. Except for `CLASS_padding`, all DDEs are required to
have the `PT_sibling` property. A DDE is owned by its physical predecessor (called the "parent")
unless it is referenced by that physical predecessor with the `PT_sibling` property. You can
think of an owned DDE as the first child of its predecessor. The children of a DDE form a chain
of siblings.
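The ownership rule can be turned into a small tree-building routine. The sketch below is one
possible reading, under stated assumptions: DDEs sit in a flat Python list in physical order, a
`PT_sibling` reference is modeled as a list index, and a null DDE is modeled as `None`. The
`Entry` type and the `read_chain` function are hypothetical names, not part of PODDS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    cls: str
    sibling: int                      # PT_sibling, modeled as an index into the flat list
    children: List["Entry"] = field(default_factory=list)


def read_chain(entries: List[Optional[Entry]], i: int) -> List[Entry]:
    """Collect the sibling chain that starts at index i, attaching children."""
    chain = []
    while entries[i] is not None:     # None models a null DDE: end of the chain
        entry = entries[i]
        if entry.sibling != i + 1:
            # The next physical DDE is not this entry's sibling, so it is its first child.
            entry.children = read_chain(entries, i + 1)
        chain.append(entry)
        i = entry.sibling
    return chain


flat = [
    Entry("CLASS_src_file", sibling=6),    # 0: topmost DDE
    Entry("CLASS_sub", sibling=4),         # 1: first child of 0
    Entry("CLASS_param", sibling=3),       # 2: first child of 1
    None,                                  # 3: null DDE, ends the children of 1
    Entry("CLASS_global_var", sibling=5),  # 4: sibling of 1
    None,                                  # 5: null DDE, ends the children of 0
    None,                                  # 6: null DDE, ends the top-level chain
]
tree = read_chain(flat, 0)
```

Here `tree[0]` is the `CLASS_src_file` entry, whose children are the `CLASS_sub` and
`CLASS_global_var` entries. The `CLASS_param` entry is the only child of `CLASS_sub`.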
A symbolic debugger has to access PODDS data very frequently, so it is important to reduce the
time needed to read and interpret debug data. This becomes difficult when a program object is
defined outside the compilation unit where the debuggee is stopped. To find the DDE associated
with such a program object, a debugger would have to search exhaustively through every DDE at the
highest scope in each compilation unit. This can severely cripple the performance of the debugger.
To combat this problem, a compiler has the option of providing two types of tables that provide
information about the DDEs owned by a particular compilation unit: the public name table and the
public address table.
The "public name table" is a subsection of the debug segment consisting of records that contain
variable-length entries. Each record lists the names of the program objects described by the
DDEs that are owned by a single compilation unit. Each record starts with a header that contains
three important values: 1) the (non-inclusive) length of the entries for that record, 2) the
offset of the compilation unit's DDE from the start of the debug segment, and 3) the size in
bytes of the DDE describing that particular compilation unit. Following the header is a variable
number of offset/name pairs. Each pair contains the offset of the program object's DDE, measured
from the start of the compilation unit entry that corresponds to the current record, followed by
a string holding the object's name as found in its `PT_name` property. Each record is terminated
by a null pair. In this way, a debugger can rapidly determine which compilation unit to search in
order to find the DDE for a program object with a given name.
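A lookup against the public name table might look like the following sketch. It works on an
already-decoded, in-memory form of the table, since the field widths and byte order belong to the
full specification. The `NameRecord` type, the `find_by_name` function, and the sample offsets and
names are hypothetical. The terminating null pair is assumed to have been dropped during
decoding.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NameRecord:
    cu_offset: int                 # offset of the compilation unit's DDE in the debug segment
    cu_size: int                   # size in bytes of that DDE
    pairs: List[Tuple[int, str]]   # (offset from the compilation unit entry, object name)


def find_by_name(table: List[NameRecord], name: str) -> Optional[Tuple[int, int]]:
    """Return (cu_offset, offset of the named DDE in the debug segment), or None."""
    for record in table:
        for offset, pair_name in record.pairs:
            if pair_name == name:
                # Pair offsets are relative to the compilation unit entry.
                return record.cu_offset, record.cu_offset + offset
    return None


# Sample data is made up for the example.
name_table = [
    NameRecord(cu_offset=0, cu_size=120, pairs=[(24, "main"), (72, "counter")]),
    NameRecord(cu_offset=120, cu_size=80, pairs=[(16, "helper")]),
]
hit = find_by_name(name_table, "helper")
```

With this sample data, `hit` identifies the second compilation unit, so the debugger only needs
to read DDEs from that unit.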
The "public address table" is a subsection of the debug segment consisting of records that
contain variable-length entries. Each record describes the section of the program's address
space that contains the compilation unit. Each record starts with a header that contains two
important values: the (non-inclusive) length of the entries for that record and the offset of
the compilation unit's DDE from the start of the debug segment. Following the header is a
variable number of "address range descriptors." Each descriptor is a pair containing the starting
address of the range followed by its length. Each record is terminated by a null pair. In this
way, a debugger can rapidly determine which compilation unit to search in order to find the DDE
for a program object with a given address.
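The address lookup can be sketched in the same way, again on a decoded, in-memory form of the
table. The `AddressRecord` type, the `find_by_address` function, and the sample ranges are
hypothetical. Treating each range as half-open (start included, start + length excluded) is an
assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AddressRecord:
    cu_offset: int                 # offset of the compilation unit's DDE in the debug segment
    ranges: List[Tuple[int, int]]  # address range descriptors: (start address, length)


def find_by_address(table: List[AddressRecord], address: int) -> Optional[int]:
    """Return the offset of the compilation unit DDE covering address, or None."""
    for record in table:
        for start, length in record.ranges:
            if start <= address < start + length:   # assumed half-open range
                return record.cu_offset
    return None


# Sample data is made up for the example.
address_table = [
    AddressRecord(cu_offset=0, ranges=[(0x1000, 0x100), (0x3000, 0x40)]),
    AddressRecord(cu_offset=120, ranges=[(0x1100, 0x200)]),
]
owner = find_by_address(address_table, 0x1180)
```

A linear scan is enough for a sketch. A real debugger would more likely sort the ranges once and
search them with a binary search.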
Associating source-level line numbers with their respective generated opcodes makes it possible
for a debugger user to specify addresses in relation to source statements. This makes
single-stepping much easier.
Each compilation unit DDE in the debug segment references a corresponding record in the line
number table that describes its source statements. The first record in the table includes the
length of the table in bytes and is followed by the address of the first opcode generated for
the compilation unit. The rest of the table consists of a list of source statement records. A
source statement record consists of three parts: 1) a line number, 2) a position within the
source line, and 3) an opcode address. Line numbers are counted from 1, starting at the beginning
of the compilation unit.
The compiler has two ways to represent the position within the source line. It can either use the
number of characters from the beginning of the line to the beginning of the source statement or
use the special value `SRC_NO_POS` to indicate that the record refers to the entire line. This
feature is necessary for HLLs that allow multiple statements on a single line.
The address in each record is the address of the first opcode generated for that source statement
minus the address of the first opcode generated for the compilation unit. That is, it is an
offset into the compilation unit.
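The offset arithmetic can be shown with a short sketch. The `StatementRecord` type, the two
helper functions, and the sample addresses are hypothetical. The numeric value given to
`SRC_NO_POS` here is only a placeholder, since this introduction does not state the real value.

```python
from typing import NamedTuple

SRC_NO_POS = -1  # placeholder only; the actual value is defined by the full specification


class StatementRecord(NamedTuple):
    line: int    # source line number, counted from 1
    pos: int     # character position within the line, or SRC_NO_POS for the whole line
    offset: int  # opcode address relative to the first opcode of the compilation unit


def make_record(line: int, pos: int, opcode_address: int, cu_base: int) -> StatementRecord:
    # Stored address = first opcode of the statement minus first opcode of the unit.
    return StatementRecord(line, pos, opcode_address - cu_base)


def absolute_address(record: StatementRecord, cu_base: int) -> int:
    # A debugger reverses the subtraction to recover the real opcode address.
    return cu_base + record.offset


# Sample addresses are made up for the example.
cu_base = 0x2000
record = make_record(line=3, pos=SRC_NO_POS, opcode_address=0x2014, cu_base=cu_base)
```

In this example the stored offset is `0x14`, and `absolute_address(record, cu_base)` gives back
`0x2014`.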
Some HLLs allow statements to extend over multiple lines. The record in such a case refers to the
line containing the start of that particular statement.
There is no restriction on the order in which the records appear. They do not necessarily follow
the order in which the statements appear in the original source file. Additionally, the line
number table is not required to contain a record for every source statement in the original
source file.
To terminate the line number table, PODDS uses a record whose line number is 0 and whose address
describes the first opcode of the next compilation unit. This allows the debugger to understand
which opcodes are associated with the last statement in a compilation unit, which is useful for
stepping out of functions.
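Putting these rules together, a debugger could map an opcode offset back to a source line roughly
as follows. This is a sketch under stated assumptions: records are plain `(line, pos, offset)`
tuples, the terminator's address is stored as an offset like every other record, and the
`line_for_offset` function and sample data are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Record = Tuple[int, int, int]  # (line number, position in line, offset into the unit)


def line_for_offset(records: List[Record], offset: int) -> Optional[int]:
    """Return the source line whose opcodes contain offset, or None if there is none."""
    # Records may appear in any order, so sort them by offset first.
    ordered = sorted(records, key=lambda r: r[2])
    offsets = [r[2] for r in ordered]
    i = bisect_right(offsets, offset) - 1   # last record starting at or before offset
    if i < 0:
        return None                         # before the first recorded statement
    line = ordered[i][0]
    # Line 0 is the terminator: the offset belongs to the next compilation unit.
    return line if line != 0 else None


# Sample data is made up; the position fields are arbitrary.
records = [
    (2, 0, 8),
    (1, 0, 0),
    (0, 0, 20),   # terminator: first opcode of the next compilation unit
    (4, 0, 12),
]
current = line_for_offset(records, 19)
```

With this sample data, offsets 12 through 19 map to line 4 because the terminator at offset 20
bounds the last statement. Any offset of 20 or more yields `None`.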