soh-cah-toa/gist:1180094

## gistfile1.txt
A debug data format provides a way for storing high-level source information about a program. This
allows analysis software such as debuggers and profilers to form a relationship between the
executable code and the original source code that generated it. This is the purpose of PODDS:
Parrot Opcode Debug Data Serialization format.

The full PODDS specification is quite lengthy. Therefore, this document is meant to serve as a
quick introduction to PODDS. If you'd like to see the full specification, visit
https://gist.github.com/1133182.

The most basic entity in PODDS is called a "Data Description Entity" or DDE. A DDE consists of a
"class" that indicates what it describes and a list of "properties" that further describe the
specific characteristics of the entity. Excluding the topmost DDE, a DDE will always be owned by
a parent DDE and may or may not have any child or sibling DDE's.

Examples of class names:

CLASS_array_type
CLASS_class_type
CLASS_global_var
CLASS_lex_block
CLASS_local_var
CLASS_param
CLASS_src_file
CLASS_sub

Properties always form a name/value pair. A value will always have one of the following forms:

* address   - points to some location in the program's address space
* reference - refers to another DDE in the debug segment
* constant  - uninterpreted numerical data
* block     - uninterpreted data
* string    - a null-terminated series of zero or more bytes

Examples of property names:

PT_end_pc
PT_lang
PT_location
PT_program
PT_sibling
PT_start_pc
PT_start_scope

There is no restriction on the order in which properties appear. To eliminate ambiguity, each
property is unique and no more than one property of a given name may appear in a DDE.
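
As a rough illustration of the rules above, a DDE can be modeled in memory as a class name plus a
mapping of property names to values. This is only a sketch: the `DDE` type, its `add_property`
method, and the example values are hypothetical, and the actual encoding is defined in the full
specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Hypothetical in-memory stand-in for the five value forms:
# addresses, references and constants as int, blocks as bytes, strings as str.
PropertyValue = Union[int, bytes, str]


@dataclass
class DDE:
    cls: str  # e.g. "CLASS_sub"
    properties: Dict[str, PropertyValue] = field(default_factory=dict)

    def add_property(self, name: str, value: PropertyValue) -> None:
        # No more than one property of a given name may appear in a DDE.
        if name in self.properties:
            raise ValueError(f"duplicate property {name!r} in {self.cls}")
        self.properties[name] = value


# Properties may be added in any order.
sub = DDE("CLASS_sub")
sub.add_property("PT_end_pc", 0x40)
sub.add_property("PT_start_pc", 0x10)
```

Using a mapping keyed by property name makes the uniqueness rule cheap to enforce and keeps lookups
independent of the order in which properties were written.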

The ownership of DDE entries is represented by their physical ordering and use of the
`PT_sibling` property. The value of this property is a reference to another DDE. If the DDE
referred to is null, it represents the end of the sibling chain. Except for `CLASS_padding`, all
DDE's are required to have the `PT_sibling` property. A DDE is owned by its physical predecessor
(called the "parent") unless it is referenced by that physical predecessor with the `PT_sibling`
property. You can think of this DDE as the first child of the predecessor. Children derived from
a DDE form a chain of siblings.
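
The ownership rule can be sketched as a small tree walk. The example below is one reading of the
rule, not the specification's algorithm: it assumes DDE's are held in a list in physical order,
that `PT_sibling` is stored as a list index, and that the end of a sibling chain is marked by an
explicit null entry (modeled as `None`). The `Entry` type and `children` function are
hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    cls: str
    sibling: int  # list index of the DDE referenced by PT_sibling


def children(ddes: List[Optional[Entry]], parent: int) -> List[int]:
    """Return the indices of the DDE's owned by ddes[parent]."""
    entry = ddes[parent]
    first = parent + 1
    # The physical successor is a sibling, not a child, if PT_sibling points at it.
    if entry is None or first >= len(ddes) or entry.sibling == first:
        return []
    result = []
    idx = first
    # Follow the sibling chain until it reaches a null entry.
    while idx < len(ddes) and ddes[idx] is not None:
        result.append(idx)
        idx = ddes[idx].sibling
    return result


ddes = [
    Entry("CLASS_src_file", sibling=7),   # 0
    Entry("CLASS_sub", sibling=5),        # 1: first child of 0
    Entry("CLASS_local_var", sibling=3),  # 2: first child of 1
    Entry("CLASS_local_var", sibling=4),  # 3: sibling of 2
    None,                                 # 4: null entry, ends the chain 2 -> 3
    Entry("CLASS_sub", sibling=6),        # 5: sibling of 1
    None,                                 # 6: null entry, ends the chain 1 -> 5
    None,                                 # 7: null entry, ends the top-level chain
]
```

Here `children(ddes, 0)` yields the two subs and `children(ddes, 1)` yields the two local
variables, while the leaf entries own nothing.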

A symbolic debugger has to access PODDS data very frequently, so it is important to minimize the
time needed to read and interpret debug data. This becomes difficult when a program object is
defined outside the compilation unit where the debuggee is stopped. To find the DDE associated
with such a program object, a debugger would have to search through every DDE at the highest scope
in each compilation unit. This can severely cripple the performance of the debugger.

To combat this problem, a compiler has the option of providing two separate types of tables that
provide information about the DDE's owned by a particular compilation unit: the public name table
and the public address table.

The "public name table" is a subsection of the debug segment consisting of records that contain
variable-length entries. Each record describes the names of program objects described by the
DDE's that are owned by a single compilation unit. Each record starts with a header that contains
three important values: 1) the (non-inclusive) length of the entries for that record, 2) the
offset of the compilation unit's DDE from the start of the debug segment, and 3) the size in
bytes of the DDE describing that particular compilation unit. Following the header is a variable
number of offset/name pairs. Each pair contains the offset of the DDE for the given program
object, measured from the start of the compilation unit entry that corresponds with the current
record, followed by a string representing the object's name as found in its `PT_name` property.
Each record is terminated by a null pair. In this way, a debugger can rapidly determine which
compilation unit to search in order to find the DDE for a program object with a given name.
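
A name lookup over this table might look like the sketch below. It works on an already-decoded,
in-memory form rather than raw bytes, omits the length field, and assumes the null pair is
represented as `(0, "")`. `NameRecord` and `find_by_name` are hypothetical names chosen for
illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NameRecord:
    cu_offset: int  # offset of the compilation unit's DDE from the start of the debug segment
    cu_size: int    # size in bytes of that compilation unit's DDE
    pairs: List[Tuple[int, str]]  # (offset within the unit, name), ending with a null pair


def find_by_name(table: List[NameRecord], name: str) -> Optional[int]:
    """Return the debug-segment offset of the DDE named `name`, or None."""
    for record in table:
        for offset, obj_name in record.pairs:
            if offset == 0 and obj_name == "":
                break  # null pair: end of this record
            if obj_name == name:
                # Pair offsets are relative to the compilation unit entry.
                return record.cu_offset + offset
    return None


name_table = [
    NameRecord(cu_offset=0, cu_size=200, pairs=[(24, "main"), (96, "helper"), (0, "")]),
    NameRecord(cu_offset=200, cu_size=120, pairs=[(16, "counter"), (0, "")]),
]
```

With this data, looking up `"counter"` resolves to offset 216 in the debug segment without
touching any DDE in the first compilation unit.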

The "public address table" is a subsection of the debug segment consisting of records that
contain variable-length entries. Each record describes the section of the program's address
space that contains the compilation unit. Each record starts with a header that contains two
important values: the (non-inclusive) length of the entries for that record and the offset of
the compilation unit's DDE from the start of the debug segment. Following the header is a
variable number of "address range descriptors." Each descriptor is a pair containing the starting
address of the range followed by its length. Each record is terminated by a null pair. In this
way, a debugger can rapidly determine which compilation unit to search in order to find the DDE
for a program object with a given address.
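
The address lookup follows the same pattern. The sketch below again uses a decoded, in-memory form
without the length field and assumes the null pair is `(0, 0)`. `AddressRecord` and
`find_cu_by_address` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AddressRecord:
    cu_offset: int  # offset of the compilation unit's DDE from the start of the debug segment
    ranges: List[Tuple[int, int]]  # (starting address, length), ending with a null pair


def find_cu_by_address(table: List[AddressRecord], addr: int) -> Optional[int]:
    """Return the offset of the compilation unit DDE whose ranges contain `addr`."""
    for record in table:
        for start, length in record.ranges:
            if start == 0 and length == 0:
                break  # null pair: end of this record
            if start <= addr < start + length:
                return record.cu_offset
    return None


address_table = [
    AddressRecord(cu_offset=0, ranges=[(0x100, 0x40), (0x200, 0x10), (0, 0)]),
    AddressRecord(cu_offset=200, ranges=[(0x140, 0x60), (0, 0)]),
]
```

A compilation unit may contribute several ranges, which is why each record holds a list of
descriptors rather than a single start and length.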

Associating source-level line numbers with their respective generated opcodes makes it possible
for a debugger user to specify addresses in relation to source statements. This makes single
stepping much easier.

Each compilation unit DDE in the debug segment references a corresponding record in the line
number table that describes its respective source statement. The first record in the table
includes the length of the table in bytes and is followed by the address of the first opcode
generated for the compilation unit. The rest of the table consists of a list of source statement
records. A source statement record consists of three parts: 1) a line number, 2) a position
within the source line, and 3) an opcode address. The line numbers are ordered starting with 1
from the beginning of the compilation unit.

The compiler has two ways to represent the position within the source line. It can either use the
number of characters from the beginning of the line to the beginning of the source statement or
use the special value `SRC_NO_POS` to indicate that the record refers to the entire line. This
feature is necessary for HLL's that allow multiple statements in a single line.

The address in each record describes the address of the first opcode generated for that source
statement minus the address of the first opcode generated for the compilation unit. That is, it
represents the offset into the compilation unit.

Some HLL's allow statements to extend over multiple lines. The record in such a case will refer
to the line containing the start of that particular statement.

There is no limitation on the order in which the records appear. They do not necessarily represent
the exact order in which the statements appear in the original source file. Additionally, it is
not required to have a record in the line number table for every single source statement in the
original source file.

To terminate the line number table, PODDS uses a record whose line number is 0 and whose address
describes the first opcode of the next compilation unit. This allows the debugger to understand
which opcodes are associated with the last statement in a compilation unit, a useful feature for
stepping out of functions.
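
To make the line number table concrete, here is a minimal sketch of mapping an opcode address back
to a source statement. It makes several assumptions that this document does not settle: records
are modeled as `(line, position, offset)` tuples, `SRC_NO_POS` is given the placeholder value
`-1`, and the terminating record's address is treated as an offset from the start of the
compilation unit like every other record. `statement_at` is a hypothetical helper.

```python
import bisect
from typing import List, Optional, Tuple

SRC_NO_POS = -1  # placeholder; the real value is defined by the full specification

# (line number, position within the line, opcode offset into the compilation unit)
Record = Tuple[int, int, int]


def statement_at(base: int, records: List[Record], pc: int) -> Optional[Tuple[int, int]]:
    """Return (line, position) of the statement containing the absolute address `pc`."""
    offset = pc - base  # record addresses are relative to the unit's first opcode
    # The terminating record has line number 0 and marks the end of the unit.
    end = next((addr for line, _, addr in records if line == 0), None)
    # Records may appear in any order, so sort the real statements by offset.
    stmts = sorted((r for r in records if r[0] != 0), key=lambda r: r[2])
    if end is None or not stmts or offset < stmts[0][2] or offset >= end:
        return None
    # Pick the last statement that starts at or before `offset`.
    i = bisect.bisect_right([r[2] for r in stmts], offset) - 1
    line, pos, _ = stmts[i]
    return line, pos


records = [
    (1, SRC_NO_POS, 0),
    (3, 0, 12),           # first of two statements on line 3
    (2, SRC_NO_POS, 4),   # out of order on purpose
    (3, 10, 20),          # second statement on line 3, starting at character 10
    (0, SRC_NO_POS, 28),  # terminator: first opcode past this compilation unit
]
```

Because the terminator supplies an upper bound, the last statement on line 3 covers offsets 20
through 27, and any address at or beyond offset 28 is reported as outside the unit.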