@Qix-
Last active December 11, 2020 16:21
6 years later and I still need to sit down and make this programming language
# Self-contained example for writing to stdout and to a file.
#
# Relies only on the standard assembler module (std.asm), which
# cannot be stubbed out here (as its contents are intrinsics). std.asm
# allows direct, architecture-specific emission of machine code
# instructions and access to registers.
#
# Also relies on the @platform, @arch, @effect, and @force_inline
# intrinsics, which cannot be (reasonably) stubbed out here.
#
# - @platform: conditionally enable the immediate element
# based on the target platform (some identifiers
# match multiple platforms, e.g. `posix` is
# enabled whenever `linux` is).
#
# - @arch: conditionally enable the immediate element
# based on CPU architecture.
#
# - @effect: specifies that the immediate element produces
# one or more side effects. The special `unknown`
# effect requires the caller to specify the
# effect at the call site (meant for e.g. syscall
# wrappers, as demonstrated below).
#
# - @force_inline: force-"pastes" the body of the immediate block
# where the callsite occurs instead of invoking a
# function.
#
# Lastly, it relies on the amorphous `str` (string) class, which
# has yet to be specified.
#
# Apart from these intrinsics, no other compiler feature or
# piece of the standard library is used.
#
# A few notes about syntax to better understand the following
# code:
#
# - @foo is a directive, which will either attach information
# to the element before which it is placed, or will modify
# the AST directly at compile time (including, but not limited
# to, removing the AST node altogether). All directives must
# be immediately and statically solvable, and (at least for now)
# directive declarations cannot themselves have directives applied
# to them.
#
# Directives may take parameters - each being passed to the directive
# as an AST node (which, in turn, includes a set of tokens). This allows
# arbitrary identifier passing to the directive instead of having to declare
# values beforehand - thus solving the problem of "new platform X" not otherwise
# being supported by the compiler until the compiler is modified.
#
# - `error FOO` raises an error, where `FOO` is an arbitrary constant value,
# scoped directly to the function that emits the error. Further, an error constant
# can be pre-declared for use among many methods using `use error FOO`.
#
# Error constants are referred to by name; the underlying numeric value associated
# with them is not meant to be exposed to the programmer. A list of all possible
# error codes can be enumerated for each function via `some_function.errors`, which
# is of type `error[]`. (`error` is a rare case of a keyword that contextually behaves as
# a typename or an operator, depending on its usage - e.g. `use error` is treated
# as a special case of `use`.)
#
# - To handle errors, there is the oneshot version using `<statement> else <statement|block>`
# and the block version using `try <block> else <block>`.
#
# Within an error handler (`else <statement|block>`), using the keyword `error` as a statement,
# by itself with no argument, re-throws the existing error. This means that the union of all
# error constants the handled statement can produce is added to the current function's error set, too.
#
# The language would be able to determine frontiers where specific (or all) error codes are
# handled and would not propagate upward, even if another codepath re-throws the error.
# Such errors are NOT included in the function's error set.
#
# Finally, `else assert` can be used to terminate the program (see notes about `assert`
# below).
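#
# As a rough sketch of how these forms might combine (`check_port`, `load`, and
# `EBADPORT` are hypothetical names, and the layout of the `try` block is an
# assumption, since only its general shape is given above):
#
#     use error EBADPORT
#     fn check_port(s str&)
#         if s.length == 0
#             error EBADPORT
#     fn load(s str&)
#         # oneshot form; bare `error` re-throws, so EBADPORT joins `load.errors`
#         check_port(s) else error
#     fn main()
#         # block form; the error is handled here and does not propagate further
#         try
#             load(`8080`)
#         else
#             sys.io.out.print(`bad port\n`)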
#
# - `assert` is a keyword. Used by itself (with no argument), it is the equivalent of C's
# `abort()`. With either one or two arguments, it checks a `bool` condition and either
# continues (if true) or aborts (if false). An optional second argument can be given a
# string to be emitted when the assertion fails.
#
# `assert` has the added benefit of being selectively static, meaning that it can be
# used to assert traits about types (especially when type variables (see below) are used).
# The example uses this feature to specify valid types of generic functions; similar to
# C++ templates, in cases where two overloads of a generic function have the same signature
# and one of the types is a type variable, assertions can be used to specify traits about
# the type that the overload allows. At resolve time, the single remaining overload (the one
# whose assertions all hold) is used; zero remaining overloads yields a compiler error about no valid overloads
# for the specified types, and more than one remaining overload yields a compiler error about
# ambiguous overloads for the specified types.
#
# - `$T` denotes a type variable called `T`. All usages of type variables aside from
# their declarations must retain the `$` prefix. Type variables can be used multiple
# times in a function declaration to imply that the types must match.
#
# As mentioned previously, `assert` can be used to further specify the characteristics
# of the type. For example, if a method should take two parameters, where the first
# parameter can be of any sized signed integer, and the second parameter must be the
# unsigned equivalent of the first (same size, just unsigned), then it would be
# expressed using an `assert` like so:
#
#     fn foo(a $A, b $B)
#         assert $A.is_signed
#         assert $B == $A.unsigned
#
# More trivial examples include making sure the integer is of sufficient size
# (`assert $A.width >= 32`) or to enforce other compositions of the type -
# for example, checking that $B is an array of $A (`assert $B == $A[]`).
#
# - `switch` statements have no `case` keyword. Further, cases do NOT fall-through
# unless:
#
# - The case comes directly after another case (the prior case having no body)
# - The `continue` statement is used, which jumps to the next case statement.
#
# Further, `else` is used in lieu of a "default" case.
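#
# A sketch of these rules, assuming a case body may also be written as an
# indented block (that layout, and the variable `n`, are assumptions):
#
#     switch n
#         1:
#         2: sys.io.out.print(`one or two\n`)     # 1 has no body, so it shares 2's
#         3:
#             sys.io.out.print(`three, then `)
#             continue                            # jumps into the next case (4)
#         4: sys.io.out.print(`four\n`)           # does not fall into `else`
#         else
#             sys.io.out.print(`something else\n`)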
#
# - By default, all declarations are module-private. Export them for visibility
# outside the module using the `pub` qualifier.
#
# - `for` is the only looping mechanism. It can be broken out of using `break`
# and forcibly iterated using `continue`. There are a number of syntaxes
# it accepts for both iteration as well as control flow (e.g. what a `while`
# loop is classically for). These syntaxes have not been solidified yet, so
# a bit of creativity was used in this example.
#
# - `as` performs type conversion; there is no implicit coercion in this language.
# If an expression is not already type $T, but you want to treat it as such,
# `<expr> as $T` must be used. Further, a conversion must exist for the type;
# many conversions are intrinsic, and conversions to an aliased type are
# automatic.
#
# There is an example of a conversion function below for the custom `Ostream`
# type and `str`. It isn't particularly useful for this example,
# but I wanted to explore how it'd be achieved.
#
# `as` also has a special meaning when used with `use`, which creates a symbol
# alias (similar to C++'s `using`, though without the need to forward type
# variables).
#
#     use str as my_string_class
#
# - Variable declarations (including parameters) are reversed relative to C; the identifier
# comes first, followed by the type. For example, `foo str` specifies a variable
# "foo" with type "str".
#
# - Variables are immutable by default. Their types can be marked mutable using a
# bang postfix - e.g. `foo str!`. For implicit declarations, the bang postfix comes
# directly after the identifier - e.g. `foo! = some_string`.
#
# - Values are passed by-value by default. References to values can be made using
# the amp postfix - e.g. `foo str&`.
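#
# Putting the last three points together (a sketch; `greet` and the variable
# names are hypothetical, and passing a value where a reference is expected
# is assumed to work as it does in `main` below):
#
#     fn greet(name str&)              # reference to an immutable string
#         sys.io.out.print(name)
#     fn main()
#         title str = `hello\n`        # explicit type, immutable
#         current! = title             # implicit type, mutable
#         current = `goodbye\n`        # allowed: `current` is mutable
#         greet(current)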
#
# - Arrays are declared as either fixed size or lazily sized using the `[n]` and `[]`
# type postfixes, respectively. For example, a variable that is an array of 5
# 32-bit unsigned integers would be declared as `foo u32[5]`. An array that
# takes the size of its immediate initialization value omits the size specifier -
# e.g. `foo u32[] = [ 1, 2, 3, 4, 5 ]`.
#
# - Combinations of arrays, references and bangs (mutable specifiers) can be used,
# though note that, as in most languages, the order matters.
#
# - `str!&` is a reference to a mutable string; the reference cannot be re-assigned.
# - `str&!` is a reference to an immutable string; the reference can be re-assigned.
# - `str&` is a reference to an immutable string; the reference cannot be re-assigned.
# - `str!&!` is a reference to a mutable string; the reference can be re-assigned.
#
# The postfix notation was chosen as it is the most readable given the terseness of the
# language and the possible composed types - there are no prefix tokens to confuse the
# order of reading, and each successive token further specifies the existing type.
# This design was on purpose.
#
# As a case study:
#
# `str![]&!&` is a non-reassignable reference (&) to a re-assignable reference (&!)
# to an immutable array ([]) of mutable strings (str!).
#
# To allow the individual array elements to be re-assigned (e.g. `foo[2] = new_string`),
# the array itself would need to be marked as mutable: `str![]!&!&`.
#
# To disallow modification of the individual strings, remove the bang from the `str`: `str[]&!&`.
#
# To re-assign the second-level reference (since the first-level reference is immutable),
# lower the reference and assign it: `&foo = some_array_of_strings`.
#
# - Mutable types can be demoted to immutable types; immutable types cannot be promoted to mutable
# types. This is one of the few cases of implicit type coercion.
#
# - (Not demonstrated in this example) blocks can be marked `@pure`, indicating that they are not
# allowed to emit side effects. Many functions are automatically detected as pure and are optimized
# as such, but the directive can be used to enforce this.
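#
# For instance (a sketch; `add` is a hypothetical function, and the placement
# of the directive is assumed to mirror `@force_inline` below):
#
#     @pure
#     fn add(a usize, b usize) usize
#         ret a + b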
#
# - `self` is a special identifier that may optionally be used to name the first parameter of a
# function. When it is, any expression of `type(self)` can be dot-accessed to retrieve that function,
# automatically binding the L-value as the `self` parameter. This works similarly to Python's
# bound methods, but the functions can be declared anywhere.
#
#     type Foo
#     fn do_bar(self Foo)
#         sys.io.out.print(`hello bar\n`)
#     fn main()
#         f = Foo{}
#         f.do_bar() # prints "hello bar"
#         do_bar(f)  # prints "hello bar" (equivalent)
#
# - `...` is a special token that can be used as the LAST parameter in a function
# declaration. It denotes a parameter pack - a statically determined pack of
# zero or more parameters of arbitrary types.
#
# The pack token can be used in many places, including in invocation call sites
# as parameters (seen in this example in the `print` function to forward parameters
# to `try_print`).
#
# Certain syntaxes involving `...` as a postfix operator are planned but not yet
# solidified. Such cases will behave similarly to C++'s parameter pack postfix
# operator within templates.
#
# - Along with `str`, a few other type aliases are automatically included into the root
# module scope: `isize`, `usize`, and `bool`.
#
# The first two are aliases to signed and unsigned integral types matching
# the size of the CPU's general purpose registers or address size (whatever
# is fitting and most performant for the platform/architecture in question).
# They behave similarly to C's `int`.
#
# `bool` is its own type equivalent to the statement `type bool u1`. The
# language constants `true` and `false` are defined as `false bool = 0u1`
# and `true bool = 1u1`. Since it is its own type, functions can be
# overloaded on unsigned integral types and on boolean types separately.
#
# - Overloads of similarly classed integral types (e.g. `u16` and `u32`)
# do NOT conflict - passing a value of type `u64` in this case would
# result in a compilation error. Using `as` to widen or narrow an integer
# is acceptable here, or a type variable + assertion can be used in cases
# where variable integral widths are allowed (especially useful in cases
# of serializers):
#
#     fn write(n $T)
#         assert $T.width <= 16
#         n16 = n as u16
#     fn write(n $T)
#         assert $T.width > 16 and $T.width <= 32
#         n32 = n as u32
@platform(posix)
fn errno(r $T)
    assert $T.is_unsigned
    switch r
        1: error EPERM
        2: error ENOENT
        3: error ESRCH
        4: error EINTR
        # ... ad nauseam.
        else
            assert
@arch(amd64)
@platform(linux)
@effect(unknown) # force caller to specify effect
pub fn syscall(nr usize, ...) usize
    use std.asm.amd64 as X
    X.util.set_args(...)
    X.syscall(nr)
    if (X.rax as i64) < 0
        errno(-(X.rax as i64) as usize)
    ret X.rax
@arch(x86)
@platform(linux)
@effect(unknown) # force caller to specify effect
pub fn syscall(nr usize, ...) usize
    use std.asm.x86 as X
    X.util.set_args(nr, ...)
    X.int(0x80)
    if (X.eax as i32) < 0
        errno(-(X.eax as i32) as usize)
    ret X.eax
@platform(linux)
pub fn exit(status isize)
    @arch(x86) nr = 1
    @arch(amd64) nr = 60
    # pass the exit status through to the kernel
    @effect(terminate) syscall(nr, status)
    # tell compiler we'll never reach here.
    assert
@platform(linux)
pub fn open(filename str&, flags usize, mode usize) isize
    @arch(x86) nr = 5
    @arch(amd64) nr = 2
    r = @effect(file_open) syscall(nr, filename.c_str(), flags, mode)
    ret r as isize
@platform(linux)
pub fn write(fd isize, buf u8[]&, count usize) usize
    @arch(x86) nr = 4
    @arch(amd64) nr = 1
    r = @effect(file_write) syscall(nr, fd, buf.addr, count)
    ret r
@platform(linux)
pub fn close(fd isize)
    @arch(x86) nr = 6
    @arch(amd64) nr = 3
    @effect(file_close) syscall(nr, fd)
type Ostream
    @platform(linux)
    fd isize
    @platform(linux)
    enum Flags
        WRITE = 1
        CREATE = 64
@platform(linux)
pub fn Ostream.open(filename str&, flags Ostream.Flags, mode usize) Ostream
    fd = open(filename, 1 | flags, mode) else error
    ret Ostream { fd }
@platform(linux)
pub fn try_print(self Ostream&, v $T) Ostream&
    buf = to_string(v)
    write(self.fd, buf, buf.length)
    ret self
@force_inline
pub fn print(...) $T
    # Given the use-case here that we should assume
    # the success path, especially in CLI applications,
    # we perform an assert on the success of try_print
    # to keep code cleaner when using .print().
    #
    # Applications that want to actually test for, and
    # handle/recover from, a failure to write to the stream
    # should call .try_print().
    ret try_print(...) else assert
pub fn to_string(v $T) str
    assert $T.is_signed
    ret [
        v < 0 then `-` else ``,
        to_string(abs(v) as $T.unsigned)
    ].join()
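pub fn to_string(v $T!) str
    assert $T.is_unsigned
    if v == 0
        ret `0`
    # number of decimal digits, assuming log10 floors for integral arguments
    digits = std.math.log10(v) + 1
    # the result is written to below, so it must be mutable
    result str! = str.with_capacity(digits)
    for (digits-1)...0 as i
        result[i] = 0x30 + (v % 10)
        v //= 10
    assert v == 0
    ret result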
@force_inline
pub fn to_string(v str&) str&
    ret v
pub fn main() u8
    out = Ostream{ 1 }
    file = Ostream.open(`/tmp/foo`, 0, 8x755) else
        out.print(`failed to open file: {error}\n`)
        ret 1
    defer file.close() else
        out.print(`warning: failed to close file: {error}\n`)
    out.print(`opened file /tmp/foo for writing\n`) else assert
    for 0..10 as i
        file.print(`loop iteration {i}\n`) else
            out.print(`failed to write to file: {error}\n`)
            ret 1
    out.print(`done!\n`)
    ret 0