
@mlugg
Last active March 12, 2024 14:20
Zig Wars: Episode III -- Revenge of the Decl

The Problem

Decl has two main responsibilities today.

The first is to act as the "subject" of semantic analysis for anything analyzed in a comptime context. For instance, when analyzing the value of a container-level const, that declaration's Decl is the "owner" of that Sema; errors are marked on it, source locations resolved relative to it, etc. This is also where type owner decls come from - we need some context in which to perform type resolution, so we need a Decl associated with the type itself.

The second is to represent a globally named and/or addressable value. For instance, container-level consts are named and addressable. More interestingly, so are generic instantiations - you can't take their address in Zig today, but they sure as hell have one, and the same goes for their name. This is where function instance owner decls come from.

Interestingly, above, we found that owner decls for types and owner decls for function instances are fulfilling two totally separate purposes! To me, that's a pretty clear indication that those purposes should be separated.
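To make the two roles concrete, here is a small piece of ordinary Zig annotated with which role each construct exercises. This is only an illustration of the description above, not compiler code; the names `answer`, `Point` and `add` are made up for the example.

```zig
const std = @import("std");

// Container-level const: its initializer is analyzed at comptime (first role),
// and the result is a named, addressable global (second role).
const answer: u32 = 40 + 2;

// Namespace type: resolving its fields needs some analysis context (first role).
const Point = struct {
    x: i32,
    y: i32,
};

// Generic function: each instantiation, such as `add(u8, ...)`, is its own
// function with its own name and address (second role).
fn add(comptime T: type, a: T, b: T) T {
    return a + b;
}

test "both roles show up in ordinary code" {
    const p: *const u32 = &answer; // the const is addressable
    try std.testing.expect(p.* == 42);
    try std.testing.expect(@sizeOf(Point) == 8);
    try std.testing.expect(add(u8, 1, 2) == 3);
    try std.testing.expect(add(i64, -1, 1) == 0);
}
```

The type and the function instances are the interesting cases: the type needs an analysis context but no address, while each instance needs a name and address but no comptime analysis of its own.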

Proposed Solution

Split Decl into two types: Csu and Decl.

Csu stands for "Comptime Sema Unit" (thanks InK for pointing me towards this nomenclature!). It is the subject of all comptime semantic analysis (i.e. all analysis which is not of a runtime function body). It appears as follows.

pub const Csu = struct {
    /// What is being analyzed?
    source: Source,
    analysis: enum {
        not_analyzed,
        in_progress,
        file_failure,
        dependency_failure,
        sema_failure,
        complete,
        // Note the lack of `codegen_failure` - `Csu` doesn't care about your codegen!
    },
    /// The file that error messages from this `Csu` should appear within.
    src_file: *File,
    /// The source location that error messages from this `Csu` are relative to.
    src_node: Ast.Node.Index,
    
    pub const Source = union(enum) {
        /// This `Csu` corresponds to a `usingnamespace` declaration.
        @"usingnamespace": TrackedInst.Index,
        /// This `Csu` corresponds to a `comptime` declaration.
        @"comptime": TrackedInst.Index,
        /// This `Csu` corresponds to a `test` declaration.
        @"test": TrackedInst.Index,
        /// This `Csu` corresponds to a non-test named declaration.
        decl: TrackedInst.Index,
        /// This `Csu` is the context for resolution of this type.
        type: InternPool.Index,
    };
    
    // [...declarations...]
};

The state here is, as you can see, pretty minimal. It's just "what are we analyzing" (source), "what's the state of that analysis" (analysis), and "where should compile errors point" (src_file+src_node).

The new Decl, then, is a subset of the old one. It represents a globally named addressable value (if anyone can think of a better name than Decl, I'm all ears!). It looks like this:

pub const Decl = struct {
    name: InternPool.NullTerminatedString,
    /// Is `name` already fully qualified?
    name_fully_qualified: bool = false,
    /// The namespace which this `Decl` was created within.
    /// If this is the root `Decl` of a file, this is the namespace of the file's root struct type itself.
    src_namespace: Namespace.Index,
    /// Absolute line number corresponding to the source location of this declaration.
    /// Used by backends to generate debug info.
    src_line: u32,
    /// Set if this `Decl` corresponds to a source declaration marked `export`.
    /// If so, it will be exported from the `Zcu` under `name` once it `has_tv`.
    is_exported: bool,
    /// Whether `ty` through `addrspace` are populated.
    has_tv: bool,
    // Once we fix up comptime-mutable memory these two will just be `val: InternPool.Index`.
    ty: Type,
    val: Value,
    @"linksection": InternPool.OptionalNullTerminatedString,
    alignment: Alignment,
    @"addrspace": std.builtin.AddressSpace,
    /// If this `Decl` undergoes analysis, it's in this `Csu`.
    /// If this is `.none`, then `has_tv` must always be set, e.g. because this is a generic function instance.
    csu: Csu.OptionalIndex,
};

Note that is_pub is gone. This state is moved into Namespace, which will be as follows:

pub const Namespace = struct {
    parent: OptionalIndex,
    file_scope: *File,
    /// Will be a struct, enum, union or opaque.
    ty: InternPool.Index,
    /// All names in this namespace which are marked `pub`.
    pub_decls: std.ArrayHashMapUnmanaged(Decl.Index, void, DeclContext, true) = .{},
    /// All names in this namespace which are *not* marked `pub`.
    priv_decls: std.ArrayHashMapUnmanaged(Decl.Index, void, DeclContext, true) = .{},
    /// All `usingnamespace` declarations in this namespace which are marked `pub`.
    pub_usingnamespaces: std.ArrayListUnmanaged(Csu.Index) = .{},
    /// All `usingnamespace` declarations in this namespace which are *not* marked `pub`.
    priv_usingnamespaces: std.ArrayListUnmanaged(Csu.Index) = .{},
    /// Other things within this namespace - `comptime` and `test` declarations.
    /// These are basically just here for incremental compilation reasons.
    other_members: std.ArrayListUnmanaged(Csu.Index) = .{},
    
    // [...declarations...]
};
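As a rough sketch of what moving `is_pub` into `Namespace` could look like at lookup time, the toy below keeps two name lists and consults the private one only for lookups from the declaring file. Everything here is hypothetical (`ToyNamespace`, `isVisible`, the `same_file` flag, plain string slices instead of hash maps of `Decl.Index`); it assumes only Zig's rule that non-`pub` declarations are visible within their own file.

```zig
const std = @import("std");

/// Hypothetical, simplified stand-in for `Namespace`: names only, no `Decl.Index`.
const ToyNamespace = struct {
    pub_names: []const []const u8,
    priv_names: []const []const u8,

    fn contains(names: []const []const u8, name: []const u8) bool {
        for (names) |n| {
            if (std.mem.eql(u8, n, name)) return true;
        }
        return false;
    }

    /// `same_file` models whether the lookup originates in the file that
    /// declares this namespace. Private names are only visible in that case.
    fn isVisible(ns: ToyNamespace, name: []const u8, same_file: bool) bool {
        if (contains(ns.pub_names, name)) return true;
        return same_file and contains(ns.priv_names, name);
    }
};

test "visibility is decided by which list a name lives in" {
    const ns: ToyNamespace = .{
        .pub_names = &.{ "init", "deinit" },
        .priv_names = &.{"helper"},
    };
    try std.testing.expect(ns.isVisible("init", false));
    try std.testing.expect(ns.isVisible("helper", true));
    try std.testing.expect(!ns.isVisible("helper", false));
}
```

With this split, no per-declaration visibility flag is needed: a lookup from another file never has to look at the private list at all.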

FAQs

Q: How do namespace types work?
A: They have a Csu, but not a Decl. The type's name is stored straight in the InternPool, since this is a property of the type itself.

Q: How do generic function instances work?
A: They have a Decl, but not a Csu. The instance is named and has an address, but no comptime semantic analysis is necessary within the function instance itself.

Q: How does reference_table work?
A: References map between any Sema units (currently named Depender, to be renamed to SemaUnit; either a runtime function or a Csu). Note that for reasons related to incremental compilation I'm planning to rework how the reference table works a little anyway.

Q: How does tracking of Decl and Csu across incremental updates work?
A: In Zcu.scanNamespace, we try to match source declarations up with the old namespace members - that's why other_members exists! Every member is matched by its TrackedInst.Index, which is stored in the Csu of each of the namespace's declarations.
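The matching step can be pictured roughly as follows. This is a toy model, not the real `Zcu.scanNamespace`: `Member`, `findExisting` and `resolve` are hypothetical names, plain `u32`s stand in for `TrackedInst.Index` and `Csu.Index`, and a linear search replaces whatever lookup structure the compiler would actually use.

```zig
const std = @import("std");

/// Hypothetical stand-in for a namespace member that survives across updates.
const Member = struct {
    /// Stand-in for `TrackedInst.Index`: the stable identity of the source declaration.
    tracked_inst: u32,
    /// Stand-in for a `Csu.Index`.
    csu: u32,
};

/// Returns the unit already associated with `tracked_inst`, if any.
fn findExisting(old: []const Member, tracked_inst: u32) ?u32 {
    for (old) |m| {
        if (m.tracked_inst == tracked_inst) return m.csu;
    }
    return null;
}

/// Reuses the old unit when the declaration was seen before, else allocates a fresh index.
fn resolve(old: []const Member, next_csu: *u32, tracked_inst: u32) u32 {
    if (findExisting(old, tracked_inst)) |existing| return existing;
    const fresh = next_csu.*;
    next_csu.* += 1;
    return fresh;
}

test "surviving declarations keep their unit, new ones get a fresh one" {
    const old = [_]Member{
        .{ .tracked_inst = 10, .csu = 0 },
        .{ .tracked_inst = 20, .csu = 1 },
    };
    var next_csu: u32 = old.len;
    try std.testing.expect(resolve(&old, &next_csu, 20) == 1); // matched
    try std.testing.expect(resolve(&old, &next_csu, 30) == 2); // newly created
}
```

The point of keeping `comptime` and `test` declarations in the namespace is visible here: without an old member list to search, there would be nothing to match them against.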

mlugg commented Mar 12, 2024

Okay, I've settled down the names and structures a bit more, so will paste them below. Note that I've given efficient DOD representations of the types as e.g. Cau.Repr - these are trivially serialized and memory-efficient. There should be no padding, so we can store them in plain ArrayLists in the InternPool, and have accessor methods (akin to InternPool.loadStructType etc) to load the more elegant representation. You might notice that src_namespace is in both Cau and Nav - this is necessary because it's relevant to both (see doc comments), but for memory efficiency, Nav.Repr doesn't store it unnecessarily.

/// Every semantic analysis performed by an instance of Sema is analyzing the
/// body of one `AnalUnit` (analysis unit). The unit is one of the following:
/// * A runtime function body, represented by the function's `InternPool.Index`
/// * A comptime body, represented by a `Cau` (comptime analysis unit)
pub const AnalUnit = packed struct(u32) {
    kind: enum { cau, func },
    index: u31,

    pub fn wrap(unit: Unwrapped) AnalUnit {
        return switch (unit) {
            .cau => |i| .{ .kind = .cau, .index = @intCast(@intFromEnum(i)) },
            .func => |i| .{ .kind = .func, .index = @intCast(@intFromEnum(i)) },
        };
    }

    pub fn unwrap(unit: AnalUnit) Unwrapped {
        return switch (unit.kind) {
            .cau => .{ .cau = @enumFromInt(unit.index) },
            .func => .{ .func = @enumFromInt(unit.index) },
        };
    }

    pub const Unwrapped = union(enum) {
        cau: Cau.Index,
        func: InternPool.Index,
    };
};

/// Comptime Analysis Unit. The owner of a single unit of comptime semantic analysis.
/// A `Cau` exists for each of the following:
/// * source declarations requiring comptime analysis (`const`, `var`, `fn`, `comptime`, `usingnamespace`, `test`)
///   (TODO if we change ZIR to special-case test functions a bit more, they won't need this!)
/// * namespace types (used for analysis)
pub const Cau = struct {
    /// The ZIR instruction being analyzed. This is either a `declaration` instruction
    /// or a type declaration (`struct_decl` etc) instruction.
    zir_index: TrackedInst.Index,
    /// The namespace which this `Cau` should be analyzed within.
    src_namespace: Namespace.Index,
    /// The absolute AST node in the file of `src_namespace` that error messages from this `Cau` are relative to.
    src_node: Ast.Node.Index,
    /// The status of semantic analysis of this `Cau`.
    status: enum {
        /// Semantic analysis has not yet been attempted.
        not_analyzed,
        /// Semantic analysis is currently in progress.
        in_progress,
        /// Semantic analysis of this `Cau` failed, either directly or due to a transitive failure.
        sema_failure,
        /// Semantic analysis has succeeded.
        success,
    },

    /// In-memory representation of a `Cau`. Trivially serializable.
    /// 12 bytes.
    const Repr = packed struct {
        zir_index: TrackedInst.Index,
        src_namespace: Namespace.Index,
        src_node: u30,
        status: enum { not_analyzed, in_progress, sema_failure, success },
    };
};

/// Named Addressable Value. A global value with a name and a fixed address (runtime or comptime).
/// A `Nav` exists for each of the following:
/// * value-having source declarations (`const`, `var`, `fn`, `test`)
/// * generic function instances
/// * `@export`s and `@extern`s
/// * `comptime var`s (TODO DON'T DO THIS)
pub const Nav = struct {
    name: InternPool.NullTerminatedString,
    /// The namespace this value was created by. If this is the root `Nav` of a file,
    /// this is instead the namespace of the file itself.
    /// Used to compute FQNs and to get the source file of this `Nav`.
    src_namespace: Namespace.Index,
    /// Absolute line number corresponding to the source location of this declaration.
    /// Used by backends to generate debug info.
    src_line: u32,
    /// Set if this `Nav` corresponds to a source declaration marked `export`.
    /// If so, it will be exported from the `Zcu` under `name` once its `status` is no longer `unresolved`.
    is_exported: bool,
    status: enum {
        /// This `Nav` is pending successful semantic analysis through `csu`.
        /// `ty` through `addrspace` are not populated.
        unresolved,
        /// This `Nav` has a value, but code generation failed.
        /// `ty` through `addrspace` are populated.
        codegen_failed,
        /// This `Nav` has a value and code generation has succeeded.
        /// `ty` through `addrspace` are populated.
        complete,
    },
    // Once we fix up comptime-mutable memory these two will just be `val: InternPool.Index`.
    ty: Type,
    val: Value,
    @"linksection": InternPool.OptionalNullTerminatedString,
    alignment: Alignment,
    @"addrspace": std.builtin.AddressSpace,
    /// If this `Nav` undergoes semantic analysis, it's in this `Cau`.
    csu: Cau.Index.Optional,

    /// In-memory representation of a `Nav`. Trivially serializable.
    /// 28 bytes.
    /// Note that this repr is only applicable once we fix up comptime-mutable memory.
    const Repr = extern struct {
        name: InternPool.NullTerminatedString,
        src_namespace: Namespace.Index,
        src_line: u32,
        val: InternPool.Index,
        @"linksection": InternPool.OptionalNullTerminatedString,
        flags: packed struct(u32) {
            status: enum { unresolved, codegen_failed, complete },
            is_exported: bool,
            @"addrspace": std.builtin.AddressSpace,
            alignment: Alignment,
            has_csu: bool,
            _: u17,
        },
        /// If `flags.has_csu`, this is a `Cau.Index`, and `src_namespace` is fetched from that `Cau`.
        /// Otherwise, this is a `Namespace.Index`.
        csu_or_src_namespace: enum(u32) {_},
    };
};

pub const Namespace = struct {
    parent: Index.Optional,
    /// Will be a struct, enum, union or opaque.
    ty: InternPool.Index,
    /// All names in this namespace which are marked `pub`.
    pub_decls: std.ArrayHashMapUnmanaged(Nav.Index, void, DeclContext, true) = .{},
    /// All names in this namespace which are *not* marked `pub`.
    priv_decls: std.ArrayHashMapUnmanaged(Nav.Index, void, DeclContext, true) = .{},
    /// All `usingnamespace` declarations in this namespace which are marked `pub`.
    pub_usingnamespaces: std.ArrayListUnmanaged(Cau.Index) = .{},
    /// All `usingnamespace` declarations in this namespace which are *not* marked `pub`.
    priv_usingnamespaces: std.ArrayListUnmanaged(Cau.Index) = .{},
    /// These are required by incremental compilation to detect which functions are referenced.
    test_decls: std.ArrayListUnmanaged(Nav.Index) = .{},
    /// These are required by incremental compilation to detect which `Cau`s are referenced.
    comptime_decls: std.ArrayListUnmanaged(Cau.Index) = .{},
};
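The `Repr` idea (a compact, padding-free stored form plus accessors that load the friendlier struct) can be tried out in isolation. The sketch below is a toy, not the proposed layout: it uses plain `u32`s instead of `TrackedInst.Index` and `Namespace.Index`, an `extern struct` with a packed 32-bit tail rather than one large `packed struct` so the size check is unambiguous, and hypothetical `pack`/`unpack` helpers and a made-up `node_and_status` field name.

```zig
const std = @import("std");

const Status = enum(u2) { not_analyzed, in_progress, sema_failure, success };

/// Toy stand-in for `Cau`: the "elegant" form that code would normally work with.
const Cau = struct {
    zir_index: u32,
    src_namespace: u32,
    src_node: u30,
    status: Status,

    /// Compact stored form: exactly three 32-bit words, no padding.
    const Repr = extern struct {
        zir_index: u32,
        src_namespace: u32,
        node_and_status: packed struct(u32) { src_node: u30, status: Status },
    };

    /// Hypothetical helper: friendly form to stored form.
    fn pack(cau: Cau) Repr {
        return .{
            .zir_index = cau.zir_index,
            .src_namespace = cau.src_namespace,
            .node_and_status = .{ .src_node = cau.src_node, .status = cau.status },
        };
    }

    /// Hypothetical accessor: stored form back to friendly form.
    fn unpack(repr: Repr) Cau {
        return .{
            .zir_index = repr.zir_index,
            .src_namespace = repr.src_namespace,
            .src_node = repr.node_and_status.src_node,
            .status = repr.node_and_status.status,
        };
    }
};

test "Repr round-trips and is 12 bytes" {
    const cau: Cau = .{ .zir_index = 7, .src_namespace = 3, .src_node = 1234, .status = .in_progress };
    const back = Cau.unpack(cau.pack());
    try std.testing.expect(@sizeOf(Cau.Repr) == 12);
    try std.testing.expect(back.zir_index == 7 and back.src_namespace == 3);
    try std.testing.expect(back.src_node == 1234 and back.status == .in_progress);
}
```

Because the stored form is just three words, a list of them can be written out or read back byte-for-byte, which is what "trivially serialized" asks for.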

mlugg commented Mar 12, 2024

Oh, and I think it's definitely good to move away from the name Decl altogether. The problem with the name is that neither a Cau, nor a Nav, nor a status-quo Decl has a perfect correspondence with source declarations. This can make the terminology a bit ambiguous when making statements about "decls"; you have to be careful to say "Decl" or "declaration".
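The lack of a one-to-one correspondence can be spelled out from the doc comments on `Cau` and `Nav`. The sketch below encodes just what those comments list; `Entity`, `hasCau` and `hasNav` are hypothetical names for illustration, and `@export`/`@extern` and `comptime var` are left out because the comments only state their `Nav` side.

```zig
const std = @import("std");

/// Hypothetical enumeration of the things discussed above; not a compiler type.
const Entity = enum {
    const_decl,
    var_decl,
    fn_decl,
    test_decl,
    comptime_decl,
    usingnamespace_decl,
    namespace_type,
    generic_instance,
};

/// Does this entity need a unit of comptime semantic analysis?
fn hasCau(e: Entity) bool {
    return switch (e) {
        .generic_instance => false,
        else => true,
    };
}

/// Is this entity a named, addressable value?
fn hasNav(e: Entity) bool {
    return switch (e) {
        .const_decl, .var_decl, .fn_decl, .test_decl, .generic_instance => true,
        .comptime_decl, .usingnamespace_decl, .namespace_type => false,
    };
}

test "neither Cau nor Nav lines up exactly with source declarations" {
    // A source declaration with a Cau but no Nav.
    try std.testing.expect(hasCau(.comptime_decl) and !hasNav(.comptime_decl));
    // A Nav that is not a source declaration and has no Cau.
    try std.testing.expect(!hasCau(.generic_instance) and hasNav(.generic_instance));
    // A Cau that is not a source declaration.
    try std.testing.expect(hasCau(.namespace_type) and !hasNav(.namespace_type));
    // The ordinary case has both.
    try std.testing.expect(hasCau(.const_decl) and hasNav(.const_decl));
}
```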
