yurydelendik/dwarf-wasm.md

## dwarf-wasm.md

      
Overview

This document describes how the existing DWARF format can be extended and used with WebAssembly binaries.
“[DWARF-SPEC]” references http://www.dwarfstd.org/doc/DWARF4.pdf
“[SOURCE-MAPS]” references WebAssembly/design#1051
“[WASM-LINKING]” references https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md
“[R-x.x]” references a requirement (see the Requirements section below).
Embedding DWARF data

The DWARF sections are embedded in the binary WASM files as custom sections [R-2.1.4]. The name of each custom section is equal to the DWARF section name as defined in the specification, e.g. .debug_info or .debug_line. This lets a debugger or other tool quickly scan for and locate the section's data.
The DWARF format is already compact [R-2.1.6]. Furthermore, the DWARF sections can be removed from the production binary WASM file and placed, unmodified, into a separate container [R-2.1.5].
See the full list of the sections and their relationship at [DWARF-SPEC] Appendix B.
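The embedding described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`read_uleb128`, `iter_sections`, `split_debug_sections`) are hypothetical, and the "separate container" is assumed here to be another WASM-shaped file that holds only the `.debug_*` custom sections, which this document does not specify.

```python
WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number followed by version 1


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_sections(wasm):
    """Yield (section_id, custom_name_or_None, start, end) for every section.

    start..end covers the whole section, including its id and size field.
    """
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a version 1 WebAssembly binary")
    pos = 8
    while pos < len(wasm):
        start = pos
        section_id = wasm[pos]
        size, payload = read_uleb128(wasm, pos + 1)
        end = payload + size
        name = None
        if section_id == 0:  # custom section: payload begins with its name
            name_len, name_pos = read_uleb128(wasm, payload)
            name = wasm[name_pos:name_pos + name_len].decode("utf-8")
        yield section_id, name, start, end
        pos = end


def split_debug_sections(wasm):
    """Return (stripped_binary, debug_container); section bytes are copied unmodified."""
    stripped = bytearray(WASM_HEADER)
    debug = bytearray(WASM_HEADER)
    for _section_id, name, start, end in iter_sections(wasm):
        is_dwarf = name is not None and name.startswith(".debug_")
        (debug if is_dwarf else stripped).extend(wasm[start:end])
    return bytes(stripped), bytes(debug)
```

Because each custom section carries its own name, a tool can locate `.debug_line` or `.debug_info` with a single linear pass over the section headers, without decoding any other section's contents.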
Connection with WebAssembly code

The DWARF information can refer to an instruction pointer or location in memory. In addition to that, special WebAssembly primitives such as locals or globals need to be referenced by other methods.
Instruction Pointer

The instruction pointer (or PC) is currently defined as a bytecode offset, counted from the start of the code section body. Mapping the instruction pointer to a bytecode offset simplifies resolution between a call-stack PC and the WebAssembly code and its debug information [R-2.2.2]. Many sections, such as .debug_info and .debug_line, can refer to a particular instruction by its bytecode offset, which is unique.
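As a rough sketch of this definition, the following Python converts between a file offset and a PC. It assumes that "code section body" means the section payload, i.e. the bytes that follow the section id and the LEB128 size field; the function names are hypothetical.

```python
CODE_SECTION_ID = 10  # id of the code section in the WebAssembly binary format


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def code_section_body_offset(wasm):
    """Return the file offset of the first byte of the code section body."""
    pos = 8  # skip the 4-byte magic number and the 4-byte version
    while pos < len(wasm):
        section_id = wasm[pos]
        size, body = read_uleb128(wasm, pos + 1)
        if section_id == CODE_SECTION_ID:
            return body
        pos = body + size
    raise ValueError("binary has no code section")


def file_offset_to_pc(wasm, file_offset):
    """Convert an absolute file offset into a PC as used by the DWARF data."""
    return file_offset - code_section_body_offset(wasm)


def pc_to_file_offset(wasm, pc):
    """Convert a DWARF PC back into an absolute file offset."""
    return pc + code_section_body_offset(wasm)
```

A PC defined this way does not change when sections that precede the code section grow or shrink, since only the position inside the code section body matters.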
The .debug_line section allows mapping between a bytecode offset and the original source. The relationship between these location types is many-to-one: multiple bytecode offsets may correspond to a single original source location [R-2.2.3].
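The many-to-one relationship can be illustrated with an already decoded line table. The sketch below assumes the rows were extracted from .debug_line by some other tool and are sorted by address; `LineRow` and both lookup functions are hypothetical names, and details of real line tables such as end-of-sequence rows are ignored.

```python
import bisect
from typing import NamedTuple


class LineRow(NamedTuple):
    address: int  # bytecode offset (PC) at which this row starts
    file: str
    line: int


def find_source_location(rows, pc):
    """Return the row covering pc, or None if pc precedes the first row.

    rows must be sorted by address; a row covers every offset up to the
    address of the next row.
    """
    addresses = [row.address for row in rows]
    index = bisect.bisect_right(addresses, pc) - 1
    return rows[index] if index >= 0 else None


def offsets_for_line(rows, file, line):
    """Return every row address that maps to the given source location."""
    return [row.address for row in rows if row.file == file and row.line == line]


rows = [
    LineRow(0, "a.c", 1),
    LineRow(4, "a.c", 2),
    LineRow(9, "a.c", 1),  # line 1 again, e.g. a loop condition
]
```

With these rows, `find_source_location(rows, 5)` resolves to line 2, while `offsets_for_line(rows, "a.c", 1)` returns two addresses, which is the many-to-one case a debugger has to handle when setting a breakpoint on a source line.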
The .debug_info section contains information about a compile unit, which is composed of subprogram/function entries. Each function entry contains information about its parameters and variables. This information includes their types [R-2.6.3], the bindings in scopes at a certain location [R-2.6.2], and how to calculate a variable's value using a DWARF expression [R-2.6.4].
Inlined functions are expressed inside the .debug_info function entries [R-2.4] and also have a proper mapping in the .debug_line section.
Memory Address

The location in memory, e.g. expressions that point to static variables defined on a heap, has the same definition as in traditional architectures.
DWARF Expressions

WebAssembly does not have registers, which DWARF expressions normally use to calculate the values of original source variables. WebAssembly compilers instead use the operand stack, locals, or globals to store these variables' values. To refer to locals and globals in DWARF expressions, the built-in extension mechanism of the DWARF expression language will be used. A special operator code will be chosen in the range between DW_OP_lo_user and DW_OP_hi_user (see [DWARF-SPEC] 7.1 Vendor Extensibility).
This extension will allow reading the values of locals or globals [R-2.6.4]. Compilers need to learn to track all mappings of the original source variables to the operand stack, locals, or globals (instead of native platform registers), calculate their lifetimes, and serialize that into the .debug_info section.
Proposed DWARF expression extension for locals

The DW_OP_WASM_location operation, with opcode 0xFC, can be added to the DWARF expression language. It takes two arguments. The first is the type of the location: 1 for locals, 2 for globals, 3 for the operand stack. The second is an index.
Pilot work can be found at https://gist.github.com/yurydelendik/3242da58878ceb96ba778cf3f26d7c9a
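To make the proposed operation concrete, here is a small Python sketch that encodes and decodes it. The opcode and the location type values come from the text above; the operand encoding is not stated there, so this sketch assumes both arguments are unsigned LEB128 values, and all function and class names are hypothetical.

```python
from enum import IntEnum

DW_OP_LO_USER = 0xE0  # start of the vendor extension range in DWARF 4
DW_OP_HI_USER = 0xFF  # end of the vendor extension range in DWARF 4
DW_OP_WASM_LOCATION = 0xFC  # opcode proposed in this document


class WasmLocationKind(IntEnum):
    LOCAL = 1
    GLOBAL = 2
    OPERAND_STACK = 3


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_wasm_location(kind, index):
    """Build the expression bytes: opcode, location type, index.

    Assumption: both operands are ULEB128-encoded.
    """
    kind = WasmLocationKind(kind)
    return bytes([DW_OP_WASM_LOCATION]) + encode_uleb128(kind) + encode_uleb128(index)


def decode_wasm_location(expr, pos=0):
    """Return (kind, index, next_pos) for the operation starting at pos."""
    if expr[pos] != DW_OP_WASM_LOCATION:
        raise ValueError("not a DW_OP_WASM_location operation")
    kind, pos = decode_uleb128(expr, pos + 1)
    index, pos = decode_uleb128(expr, pos)
    return WasmLocationKind(kind), index, pos
```

Under these assumptions, a variable held in local 3 would be described by the three bytes `0xFC 0x01 0x03`, and a debugger that sees the opcode would read the value from the frame's local with that index instead of from a machine register.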
Types information

The [DWARF-SPEC] has the ability to store type information, e.g. in the .debug_pubtypes or .debug_info sections, that can be used by a debugger to display or pretty-print more complex values [R-2.5]. The type information can also be used by a debugger (or its language server) to properly calculate and execute an expression provided by a user.
Usage by tools

Tools other than a debugger can consume and mutate the DWARF section information. As a general rule, a tool has to know about the debug information associated with a WASM binary file. The relocation sections (see [WASM-LINKING]) may assist with changing the structure of the code or data section, e.g. if some functions were removed [R-2.1.8]. However, if a function body itself was mutated, the DWARF information needs to be updated or removed.
Conversion to source maps

The wasm source maps [SOURCE-MAPS] can be generated from the .debug_line section information [R-2.2]. The .debug_line section, when decoded, yields the mappings from wasm file bytecode offsets to the original source locations.
Source file names can be specified in relative or absolute form. Web platform solutions prefer that the original sources be provided as text embedded in the source map or published relative to the source map file.
The DWARF sections can be discarded [R-2.1.5].
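The conversion described in this section could look roughly like the Python sketch below. It assumes the .debug_line rows have already been decoded into `(offset, source, line, column)` tuples with wasm file offsets, that the whole binary is treated as a single generated line whose column is the byte offset, and that DWARF line numbers are 1-based while source map lines are 0-based. `encode_vlq` and `build_source_map` are hypothetical names.

```python
import json

_BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_vlq(value):
    """Encode one signed integer as a base64 VLQ (source map v3 encoding)."""
    # The sign is stored in the lowest bit, the magnitude in the bits above it.
    vlq = ((-value) << 1) | 1 if value < 0 else value << 1
    out = ""
    while True:
        digit = vlq & 0x1F
        vlq >>= 5
        if vlq:
            digit |= 0x20  # continuation bit: more digits follow
        out += _BASE64[digit]
        if not vlq:
            return out


def build_source_map(rows):
    """Build a source map dict from (offset, source, line, column) rows."""
    sources = []
    segments = []
    prev_offset = prev_source = prev_line = prev_column = 0
    for offset, source, line, column in sorted(rows):
        if source not in sources:
            sources.append(source)
        source_index = sources.index(source)
        # Every field is stored as a delta from the previous segment.
        fields = (
            offset - prev_offset,
            source_index - prev_source,
            (line - 1) - prev_line,
            column - prev_column,
        )
        segments.append("".join(encode_vlq(field) for field in fields))
        prev_offset, prev_source = offset, source_index
        prev_line, prev_column = line - 1, column
    return {"version": 3, "sources": sources, "names": [], "mappings": ",".join(segments)}


if __name__ == "__main__":
    demo_rows = [(0, "a.c", 1, 0), (4, "a.c", 2, 2)]
    print(json.dumps(build_source_map(demo_rows)))
```

Embedding the original text would additionally require a `sourcesContent` array, which this sketch leaves out.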
Requirements

The requirements list is based on the proposal at https://fitzgen.github.io/wasm-debugging-capabilities/#requirements (21 May 2018).


| # | Description |
| --- | --- |
| 2.1 | General |
| 2.1.1 | Must be future extensible |
| 2.1.2 | Must be embedder agnostic |
| 2.1.3 | Must support querying static properties without running the debuggee program |
| 2.1.4 | Must be embeddable within the WebAssembly |
| 2.1.5 | Must be separable from the WebAssembly |
| 2.1.6 | Must be compact on disk and over the network |
| 2.1.7 | Should be compact in memory |
| 2.1.8 | Should be fast to consume, generate, and manipulate |
| 2.1.9 | Should support interpreted debuggees, where the interpreter is written in Wasm |
| 2.2 | Locations |
| 2.2.1 | Must support querying the original source location for a given generated code location |
| 2.2.2 | Must support querying the generated code location(s) for a given original source location |
| 2.2.3 | Must support enumerating all bidirectional mappings between original source and generated code locations |
| 2.3 | Source Text |
| 2.3.1 | Must support embedding the original source text |
| 2.3.2 | Must support referencing external source text |
| 2.4 | Inlined Functions |
| 2.4.1 | Should support querying the inline function frames that are logically on the stack at some generated code location |
| 2.4.2 | Should support enumerating the logical inlined function invocations within a physical function |
| 2.5 | Types |
| 2.5.1 | Should support describing scalar types |
| 2.5.2 | Should support describing compound types |
| 2.5.3 | Should support type-based pretty printing |
| 2.6 | Scopes and Bindings |
| 2.6.1 | Should support querying for the scope chain at a given generated code location |
| 2.6.2 | Should support enumerating all bindings within a scope |
| 2.6.3 | Should support querying a binding’s type |
| 2.6.4 | Should support describing a method to find the location of or reconstruct a binding’s value |
| 2.7 | Generic Functions and Monomorphizations |
| 2.7.1 | Should support querying whether a function is a monomorphization of a generic function |