@KaneRoot
Forked from lobre/zig_type_system.md
Created December 27, 2022 18:02
Zig type system illustrated using ascii diagrams

Zig Type System

Zig aims to be a simple language. "Simple" is hard to define precisely, but Zig is also a low-level programming language that aims for C compatibility. To reach that goal, its type system needs precise semantics, so that developers have a complete toolbox to manipulate data.

Types in Zig are therefore composable, but the combinations can quickly become overwhelming. Look at these examples. Can you understand them at a glance?

*const ?u8
?*const u8
*const [2]u8
[]?u8
?[]u8
[*]u8
[*:0]u8
*[]const u8
*[]*const ?u8

They seemed complex to me, even though Loris Cro's explanations in the following video helped a lot.

https://www.youtube.com/watch?v=VgjRyaRTH6E

When I don’t understand something, I like to draw representations to illustrate the concepts. That is what I do, for example, to understand how commits relate to each other in git.

So here is my take on the Zig type system. Feel free to comment if anything is wrong or unclear.

Types

It all starts with boxes.

┌───┐
│   │
└───┘

u8 and u16 are examples of box sizes and are called types.

┌───┐     ┌─────┐      ┌─┐
│   │ u8  │     │ u16  │ │ u2
└───┘     └─────┘      └─┘

There are many other types in Zig, but this document uses u8 to illustrate the concepts. In the end, types are just representations of different things of different sizes.

Variables

A variable is a named box having a type and a value.

 ┌───┐
a┤ 1 │   var a: u8 = 1;
 └───┘

Here, the box named a of type u8 holds the value 1.

Constants

A constant is like a variable, except that its value cannot change after initialization. Picture the variable box as a jail: once initialized, it cannot be opened or changed anymore.

 ┌┰─┰┐
a┤┃1┃│   const a: u8 = 1;
 └┸─┸┘
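
To make the difference concrete, here is a minimal sketch, assuming a reasonably recent Zig compiler. The helper name `bump` is mine and only there for illustration.

```zig
const std = @import("std");

// Hypothetical helper: returns the value after one in-place update.
fn bump(start: u8) u8 {
    var a: u8 = start; // `var`: the box can be reopened
    a += 1;
    const b: u8 = a; // `const`: fixed after initialization
    // b += 1; // would not compile: a constant cannot be assigned to
    return b;
}

test "var can change, const cannot" {
    try std.testing.expectEqual(@as(u8, 2), bump(1));
}
```

Saved to a file, this can be run with `zig test`.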

Optionals

An optional variable holds either a value or null. The box accommodates both and is represented here with a vertical split. The top part represents the variable when it is filled, and the bottom part represents it when it is null.

 ┌───┐                        ┌───┐ 
 │ 1 │                        │   │
a┼─?─┤   var a: ?u8 = 1;     a┼─?─┤   var a: ?u8 = null;
 │   │                        │ ∅ │
 └───┘                        └───┘
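
Before the value can be used, the optional box has to be opened. Here is a small sketch of the two usual ways to do it, `orelse` and payload capture with `if`. The helper names `valueOr` and `doubled` are assumptions of mine.

```zig
const std = @import("std");

// Hypothetical helper: take the value, or fall back to a default when null.
fn valueOr(a: ?u8, default: u8) u8 {
    return a orelse default;
}

// Hypothetical helper: only touch the value when the box is filled.
fn doubled(a: ?u8) ?u8 {
    if (a) |value| {
        return value * 2; // `value` is a plain u8 inside this branch
    }
    return null;
}

test "unwrapping optionals" {
    try std.testing.expectEqual(@as(u8, 1), valueOr(1, 9));
    try std.testing.expectEqual(@as(u8, 9), valueOr(null, 9));
    try std.testing.expect(doubled(2).? == 4);
    try std.testing.expect(doubled(null) == null);
}
```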

Arrays

An array holds multiple values of the same type. The boxes are visually represented attached to each other.

 ┌───┐┌───┐┌───┐
a┤ 1 ├┤ 2 ├┤ 3 │   var a: [3]u8 = [_]u8{ 1, 2, 3 };
 └───┘└───┘└───┘

Values in the array can be accessed using indexes starting at 0. For example, a[0] has the value 1 and can be changed with a[0] = 2.

Here is an array of optional u8 values.

 ┌───┐┌───┐┌───┐
 │ 1 ││   ││ 3 │
a┼─?─┼┼─?─┼┼─?─┤   var a: [3]?u8 = [_]?u8{ 1, null, 3 };
 │   ││ ∅ ││   │
 └───┘└───┘└───┘

An array can also be constant. Elements cannot be changed.

 ┌┰─┰┐┌┰─┰┐┌┰─┰┐
a┤┃1┃├┤┃2┃├┤┃3┃│   const a: [3]u8 = [_]u8{ 1, 2, 3 };
 └┸─┸┘└┸─┸┘└┸─┸┘

An array can be zero-terminated. This means there is an additional 0 value stored at the index equal to its length, and the compiler lets you access that element instead of reporting an out-of-bounds error.

 ┌───┐┌───┐┌───┐┌───┐
a┤ 1 ├┤ 2 ├┤ 3 ├┤ 0 │   var a: [3:0]u8 = [_:0]u8{ 1, 2, 3 };
 └───┘└───┘└───┘└───┘
                        std.debug.print("{}\n", .{a[3]}); // correct and prints 0

This zero value can be changed to any other sentinel value ([3:2]u8 or even [3:'f']u8 for example).
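
As a quick check of the sentinel behavior, here is a sketch assuming a reasonably recent Zig compiler. Note that the sentinel is not counted in `len`.

```zig
const std = @import("std");

test "sentinel-terminated arrays" {
    const a: [3:0]u8 = [_:0]u8{ 1, 2, 3 };
    try std.testing.expectEqual(@as(usize, 3), a.len); // the sentinel is not counted
    try std.testing.expectEqual(@as(u8, 0), a[3]); // index a.len holds the sentinel

    // Any other sentinel value works the same way.
    const b = [_:'f']u8{ 'a', 'b' };
    try std.testing.expectEqual(@as(usize, 2), b.len);
    try std.testing.expectEqual(@as(u8, 'f'), b[2]);
}
```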

Pointers

A pointer holds the address of another variable. In the diagrams here, & stands in for that address for simplicity, but in a real program it is an actual memory address. The arrow means that the address stored in the pointer is the address of the pointed variable.

       ┌───┐
      a┤ 1 │   var a: u8 = 1;
       └─▲─┘
 ┌───┐   │
p┤ & ├───┘     var p: *u8 = &a; // &a is the address of a
 └───┘
               p.* = 2; // the value of a can be changed through p using the .* dereference syntax

If a pointer is constant, the pointer itself cannot change, but the variable it points to can.

       ┌───┐
      a┤ 1 │   var a: u8 = 1;
       └─▲─┘
 ┌┰─┰┐   │
p┤┃&┃├───┘     const p: *u8 = &a;
 └┸─┸┘
               p.* = 2; // changing a through p is correct
               p = &c; // changing p directly is incorrect

A pointer can also point to a constant instead of a variable.

       ┌┰─┰┐
      a┤┃1┃│   const a: u8 = 1;
       └┸▲┸┘
 ┌───┐   │
p┤ & ├───┘     var p: *const u8 = &a;
 └───┘
               p.* = 2; // incorrect
               p = &c; // correct

A pointer to a variable can be coerced to a pointer to a constant, but not the opposite.

       ┌───┐                                           ┌┰─┰┐
      a┤ 1 │   var a: u8 = 1;                         a┤┃1┃│   const a: u8 = 1;
       └─▲─┘                                           └┸▲┸┘
 ┌───┐   │                                       ┌───┐   │
p┤ & ├───┘     var p: *u8 = &a;                 p┤ & ├───┘     var p: *const u8 = &a;
 └───┘                                           └───┘
               var p2: *const u8 = p; // correct               var p2: *u8 = p; // incorrect
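
The constness rules above can be checked with a small sketch. The helper names `setTo` and `read` are hypothetical, and the line that should not compile is left as a comment.

```zig
const std = @import("std");

// Hypothetical helper: writes through a pointer to a variable.
fn setTo(p: *u8, value: u8) void {
    p.* = value;
}

// Hypothetical helper: a *const u8 only allows reading.
fn read(p: *const u8) u8 {
    return p.*;
}

test "pointer constness and coercion" {
    var a: u8 = 1;
    const p: *u8 = &a; // constant pointer to a variable
    setTo(p, 2); // changing a through p is fine
    try std.testing.expectEqual(@as(u8, 2), a);

    const p2: *const u8 = p; // *u8 coerces to *const u8
    try std.testing.expectEqual(@as(u8, 2), read(p2));
    try std.testing.expectEqual(@as(u8, 2), read(p)); // same coercion at a call site

    // const p3: *u8 = p2; // would not compile: the cast would discard const
}
```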

A pointer can point to an optional variable.

       ┌───┐
       │   │
      a┼─?─┤   var a: ?u8 = null;
       │ ∅ │
       └─▲─┘
 ┌───┐   │
p┤ & ├───┘     var p: *?u8 = &a;
 └───┘

Or a pointer can itself be optional.

       ┌───┐
      a┤ 1 │   var a: u8 = 1;
       └─▲─┘
 ┌───┐   │
 │ & ├───┘     var p: ?*u8 = &a;
p┼─?─┤
 │   │         p = null; // correct
 └───┘

A pointer can point to a constant value that is optional.

       ┌┰─┰┐
       │┃2┃│
      a┼╂?╂┤   const a: ?u8 = 2;
       │┃ ┃│
       └┸▲┸┘
 ┌───┐   │
p┤ & ├───┘     var p: *const ?u8 = &a;
 └───┘

An optional pointer can also point to a constant.

       ┌┰─┰┐
      a┤┃1┃│   const a: u8 = 1;
       └┸▲┸┘
 ┌───┐   │
 │ & ├───┘     var p: ?*const u8 = &a;
p┼─?─┤
 │   │
 └───┘
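
The position of the ? changes what can be null. Here is a sketch of the two cases, a pointer to an optional (`*?u8`) and an optional pointer (`?*u8`), assuming a reasonably recent Zig compiler.

```zig
const std = @import("std");

test "pointer to optional vs optional pointer" {
    var a: ?u8 = null;
    const p: *?u8 = &a; // the address is always valid, the pointee may be null
    p.* = 5;
    try std.testing.expect(a.? == 5);

    var b: u8 = 1;
    var q: ?*u8 = &b; // here the pointer itself may be null
    if (q) |ptr| ptr.* = 2; // unwrap before dereferencing
    try std.testing.expectEqual(@as(u8, 2), b);
    q = null;
    try std.testing.expect(q == null);
}
```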

A pointer can point to an array. This one points to an array of u8.

       ┌───┐┌───┐┌───┐
      a┤ 1 ├┤ 2 ├┤ 3 │   var a: [3]u8 = [_]u8{ 1, 2, 3 };
       └─▲─┘└───┘└───┘
 ┌───┐   │
p┤ & ├───┘               var p: *[3]u8 = &a;
 └───┘

This one points to a constant array of u8.

       ┌┰─┰┐┌┰─┰┐┌┰─┰┐
      a┤┃1┃├┤┃2┃├┤┃3┃│   const a: [3]u8 = [_]u8{ 1, 2, 3 };
       └┸▲┸┘└┸─┸┘└┸─┸┘
 ┌───┐   │
p┤ & ├───┘               var p: *const [3]u8 = &a;
 └───┘

A pointer can point to an unknown number of u8.

       ┌───┐┌───┐┌───┐┌───┐
      a┤ 1 ├┤ 2 ├┤ … ├┤ 5 │   var a: [5]u8 = [_]u8{ 1, 2, 3, 4, 5 };
       └─▲─┘└───┘└───┘└───┘
 ┌───┐   │
p┤ & ├───┘                    var p: [*]u8 = &a;
 └───┘

The advantage over a single-item pointer to u8 (*u8) is that the type says there can be many u8 values at this address. The type just does not record how many.
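
Since the pointer carries no length, the caller has to supply it. Here is a minimal sketch with a hypothetical helper `sum` that takes the count as a separate argument.

```zig
const std = @import("std");

// Hypothetical helper: [*]const u8 has no length, so the count is passed in.
fn sum(p: [*]const u8, count: usize) u32 {
    var total: u32 = 0;
    var i: usize = 0;
    while (i < count) : (i += 1) {
        total += p[i]; // indexing is allowed, but bounds are not checked
    }
    return total;
}

test "many-item pointer" {
    var a = [_]u8{ 1, 2, 3, 4, 5 };
    const p: [*]u8 = &a;
    p[0] = 10;
    try std.testing.expectEqual(@as(u32, 24), sum(p, a.len));

    // Slicing with explicit bounds brings the length back.
    const s: []u8 = p[0..a.len];
    try std.testing.expectEqual(@as(usize, 5), s.len);
}
```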

A pointer can also point to an unknown number of u8 values terminated by a zero.

       ┌───┐┌───┐┌───┐┌───┐┌───┐
      a┤ 1 ├┤ 2 ├┤ … ├┤ 5 ├┤ 0 │   var a: [5:0]u8 = [_:0]u8{ 1, 2, 3, 4, 5 };
       └─▲─┘└───┘└───┘└───┘└───┘
 ┌───┐   │
p┤ & ├───┘                         var p: [*:0]u8 = &a;
 └───┘
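
With the terminator in place, the length can be recovered by scanning for it. The sketch below uses `std.mem.len` and `std.mem.span` from the standard library. The helper name `countBytes` is mine.

```zig
const std = @import("std");

// Hypothetical helper: scans until the 0 sentinel to find the length.
fn countBytes(p: [*:0]const u8) usize {
    return std.mem.len(p);
}

test "zero-terminated many-item pointer" {
    var a: [5:0]u8 = [_:0]u8{ 1, 2, 3, 4, 5 };
    const p: [*:0]u8 = &a;
    try std.testing.expectEqual(@as(usize, 5), countBytes(p));

    // std.mem.span turns the pointer into a slice by finding the sentinel.
    const s = std.mem.span(p);
    try std.testing.expectEqual(@as(usize, 5), s.len);
    try std.testing.expectEqual(@as(u8, 5), s[4]);
}
```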

Conversely, here is an array of pointers to u8 values.

 ┌───┐  ┌───┐  ┌───┐   var a: u8 = 1;
a┤ 1 │ b┤ 2 │ c┤ 3 │   var b: u8 = 2;
 └─▲─┘  └─▲─┘  └─▲─┘   var c: u8 = 3;
   │    ┌─┘      │
   │    │    ┌───┘
 ┌─┴─┐┌─┴─┐┌─┴─┐
p┤ & ├┤ & ├┤ & │       var p: [3]*u8 = [_]*u8{ &a, &b, &c };
 └───┘└───┘└───┘

And to finish with pointers, they can also point to other pointers.

              ┌───┐
             a┤ 1 │   var a: u8 = 1;
              └─▲─┘
        ┌───┐   │
      p1┤ & ├───┘     var p1: *u8 = &a;
        └─▲─┘
  ┌───┐   │
p2┤ & ├───┘           var p2: **u8 = &p1;
  └───┘
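
Each level of indirection is followed with one `.*`. Here is a small sketch, assuming a reasonably recent Zig compiler.

```zig
const std = @import("std");

test "pointer to pointer" {
    var a: u8 = 1;
    var p1: *u8 = &a;
    const p2: **u8 = &p1;

    p2.*.* = 2; // two hops: p2 -> p1 -> a
    try std.testing.expectEqual(@as(u8, 2), a);

    var b: u8 = 7;
    p2.* = &b; // one hop: retarget p1 through p2
    try std.testing.expectEqual(@as(u8, 7), p1.*);
}
```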

Slices

A slice is a pointer to an array with a length known at runtime. In the slice box, there is the address of the first element of the array represented by & and the length of the slice after the colon character.

A slice can be initialized from a pointer to the backing array (the compiler knows how to coerce it), and its length is set to the length of that array. A pointer to an array can therefore always be coerced to a slice, but not the other way around, because the compiler cannot know the length of the slice at compile time.

         ┌───┐┌───┐┌───┐
        a┤ a ├┤ b ├┤ c │   var a: [3]u8 = [_]u8{ 'a', 'b', 'c' };
         └─[─┘└───┘└─]─┘
 ┌───┐     │
p┤ & ├─────┤               var p: *[3]u8 = &a;
 └───┘     │
 ┌─────┐   │
s┤ &:3 ├───┘               var s: []u8 = &a;      // directly pointing to a
 └─────┘                   var s: []u8 = p;       // or assigned to p
                           var s: []u8 = a[0..3]; // or from a range of the array
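
Because the length travels with the slice, one function can handle backing arrays of any size. Here is a minimal sketch. The helper name `fill` is hypothetical.

```zig
const std = @import("std");

// Hypothetical helper: works for any length, since s.len is known at runtime.
fn fill(s: []u8, value: u8) void {
    for (s) |*item| item.* = value;
}

test "slices share memory with the backing array" {
    var a = [_]u8{ 'a', 'b', 'c' };
    const s: []u8 = &a; // *[3]u8 coerces to []u8
    try std.testing.expectEqual(@as(usize, 3), s.len);

    fill(s[1..], 'z'); // writes go through to the array
    try std.testing.expectEqualStrings("azz", &a);
}
```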

A zero-terminated slice guarantees that a zero value exists at the element indexed by the length.

         ┌───┐┌───┐┌───┐┌───┐
        a┤ a ├┤ b ├┤ c ├┤ 0 │   var a: [3:0]u8 = [_:0]u8{ 'a', 'b', 'c' };
         └─[─┘└───┘└─]─┘└───┘
 ┌─────┐   │
s┤ &:3 ├───┘                    var s: [:0]u8 = &a;
 └─────┘

A slice can be optional.

         ┌───┐┌───┐┌───┐┌───┐
        a┤ a ├┤ b ├┤ c ├┤ d │   var a: [4]u8 = [_]u8{ 'a', 'b', 'c', 'd' };
         └───┘└─[─┘└───┘└─]─┘
 ┌─────┐        │
 │ &:3 ├────────┘               var s: ?[]u8 = a[1..4];
s┼──?──┤
 │     │                        s = null; // correct
 └─────┘

Now, here is a slice of constant u8 values.

         ┌┰─┰┐┌┰─┰┐┌┰─┰┐
        a┤┃1┃├┤┃2┃├┤┃3┃│   const a: [3]u8 = [_]u8{ 1, 2, 3 };
         └┸[┸┘└┸─┸┘└┸]┸┘
 ┌─────┐   │
s┤ &:3 ├───┘               var s: []const u8 = &a;
 └─────┘

A string literal is a constant, zero-terminated array of bytes that is known at compile time and stored in the binary.

         ┌┰─┰┐┌┰─┰┐┌┰─┰┐┌┰─┰┐┌┰─┰┐┌┰─┰┐
         │┃h┃├┤┃e┃├┤┃l┃├┤┃l┃├┤┃o┃├┤┃0┃│
         └┸[┸┘└┸─┸┘└┸─┸┘└┸─┸┘└┸]┸┘└┸─┸┘
 ┌───┐     │
p┤ & ├─────┤        var p: *const [5:0]u8 = "hello";
 └───┘     │
 ┌─────┐   │
s┤ &:5 ├───┘        var s: []const u8 = p; // correct
 └─────┘
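
The type of a string literal can be inspected with `@TypeOf`. Here is a sketch of the coercions shown in the diagram, plus the C-style `[*:0]const u8` view, assuming a reasonably recent Zig compiler.

```zig
const std = @import("std");

test "string literal type and coercions" {
    const p = "hello";
    try std.testing.expect(@TypeOf(p) == *const [5:0]u8);
    try std.testing.expectEqual(@as(u8, 0), p[5]); // the terminator is accessible

    const s: []const u8 = p; // pointer to array coerces to a slice
    try std.testing.expectEqual(@as(usize, 5), s.len);

    const c: [*:0]const u8 = p; // what a C function expecting a string would take
    try std.testing.expectEqual(@as(usize, 5), std.mem.len(c));
}
```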

And to add a level of indirection, here is a slice of pointers pointing to constant u8 values.

         ┌┰─┰┐  ┌┰─┰┐  ┌┰─┰┐   const a: u8 = 1;
        a┤┃1┃│ b┤┃2┃│ c┤┃3┃│   const b: u8 = 2;
         └┸▲┸┘  └┸▲┸┘  └┸▲┸┘   const c: u8 = 3;
           │    ┌─┘      │
           │    │    ┌───┘
         ┌─┴─┐┌─┴─┐┌─┴─┐
        p┤ & ├┤ & ├┤ & │       var p: [3]*const u8 = [_]*const u8{ &a, &b, &c };
         └─[─┘└───┘└─]─┘
 ┌─────┐   │
s┤ &:3 ├───┘                   var s: []*const u8 = p[0..p.len]; // s.len is 3
 └─────┘

Composability

In the end, the Zig type system is just a composition of the previous concepts, and drawing the representation helps to understand what is going on behind the scenes. I cannot draw every possible combination because there are too many of them, but here is one last, more complex example for fun.

               ┌┰─┰┐  ┌┰─┰┐  ┌┰─┰┐
               │┃1┃│  │┃ ┃│  │┃3┃│   const a: ?u8 = 1;
              a┼╂?╂┤ b┼╂?╂┤ c┼╂?╂┤   const b: ?u8 = null;
               │┃ ┃│  │┃∅┃│  │┃ ┃│   const c: ?u8 = 3;
               └┸▲┸┘  └┸▲┸┘  └┸▲┸┘
                 │    ┌─┘  ┌───┘
               ┌─┴─┐┌─┴─┐┌─┴─┐
              d┤ & ├┤ & ├┤ & │       var d: [3]*const ?u8 = [_]*const ?u8{ &a, &b, &c };
               └─[─┘└───┘└─]─┘
       ┌─────┐   │
      s┤ &:3 ├───┘                   var s: []*const ?u8 = d[0..3];
       └──▲──┘
 ┌───┐    │
p┤ & ├────┘                          var p: *[]*const ?u8 = &s;
 └───┘

In case you did not guess, this is a pointer to a slice of pointers to constant optional u8 values.
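
To see that this type is usable in practice, here is a sketch that walks through every layer. The helper name `countPresent` is an assumption of mine.

```zig
const std = @import("std");

// Hypothetical helper: counts the non-null values reachable through p.
fn countPresent(p: *[]*const ?u8) usize {
    var n: usize = 0;
    for (p.*) |item| { // p.* is the slice, each item is a *const ?u8
        if (item.* != null) n += 1; // item.* is the optional u8
    }
    return n;
}

test "pointer to slice of pointers to constant optional u8" {
    const a: ?u8 = 1;
    const b: ?u8 = null;
    const c: ?u8 = 3;
    var d = [_]*const ?u8{ &a, &b, &c };
    var s: []*const ?u8 = d[0..3];
    const p: *[]*const ?u8 = &s;

    try std.testing.expectEqual(@as(usize, 3), p.*.len);
    try std.testing.expectEqual(@as(usize, 2), countPresent(p));
}
```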

Feel free to find other combinations and try to draw them to improve your knowledge of the Zig type system!
