Skip to content

Instantly share code, notes, and snippets.

@Asmageddon
Last active August 29, 2015 14:21
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Asmageddon/f738824444bf20c83e6c to your computer and use it in GitHub Desktop.
Save Asmageddon/f738824444bf20c83e6c to your computer and use it in GitHub Desktop.
The type system of h3

Type system of h3(working name)

  1. Basic types:

Integer family:

Integer numbers. Includes:

  • Int - largest native signed integer type available on current architecture
  • Differently sized signed and unsigned integers
  • BigNum - infinite size signed integral number

All integral literals are BigNum, but are by default inferred to Int

Float family:

Real numbers. Includes:

  • Float - largest native floating point type available on current architecture
  • Differently sized finite-precision floating point types
  • Rational - an infinite precision real number

All floating point literals are Rational, but by default inferred to Float

Array family:

Dynamically and statically sized arrays. Includes:

  • Array<T: Type>
  • Array<T: Type, N: Int>

Pointers family:

A pointer represent a memory address of a piece of typed or untyped data.

Non-basic standard types:

  • String, WString, ByteString - They represent text, in various formats.
  • Map<K: Type, V: Type> - Represents a key-value mapping. It's actually a trait(roughly like interfaces) with a default implementation(a hash map)
  • Tuple<...> - Anonymous structures of arbitrary size, fields can be named.
  • Smart pointers, including unique, reference-counted, and garbage-collected ones.
  • Other types included in standard library, including various maps, sets, graphs, etc.

NOTE: Syntactic sugar exists for maps and arrays contained inside tuples. E.g. t: Tuple<Int, Array<Int>> = (1, 2, 3, 4, 5) is valid code. Primarily used for varargs and keyargs in procedure calls

  1. Procedures:

A procedure is a possibly-anonymous function that maps an input value(tuples for multiple params), into an output value

Procedures have access to the scope where they are defined, allowing nested functions. Trying to return a procedure accessing a local scope is an error, instead to pass closures around, you need to explicitly capture variables from the surrounding scope(s) via compiler directive macros

On its own, a procedure is simply a block of code: {}, the rest is achieved via macros, for example:

  • lambda(x: Int) { return x+1}
  • do { a = getValue(); print(a); }
  • for(i in range(1, 10)) { print(i); }
  1. Functions:

A non-anonymous identifier for a group of procedures with the same name and parameters, they represent ad-hoc polymorphism or in other words, overloading. For example:

  • The + function can have separate procedures for +(Int, Int) and +(Int, Float)
  • A hash(value) function might have different implementations for different types of value

Assertions can impose constraints on accepted values, and will produce compile-time errors when possible, and otherwise error out at runtime.

Value-based dispatching is contemplated, e.g. being able to define behavior for not just different types, but also specific values or ranges of values, such as hash(s: String) { definedFor(s:isUpper()); return hash(s:lower()) }.

The behavior would be enabled when using binary modules as well, forcing calls to be dispatched between the local and remote version(s)

  1. Structures:

A layout of named fields of specific types, essentially translates to a memory layout.

Will presumably be implemented as a macro that works on {} blocks, to allow reusing an existing mechanics of scope lookup, but with restrictions applied and modified semantics.

  1. Typed unions:

Composition of a type tag and alternatively typed entries. Size is equivalent to the size of the biggest option.

Unions can be switched on, and assigned values of types they account for, and otherwise are treated like other types.

Examples:

  • Built-in Option<T> can be either Some(val: T), or None
  • value: Option = 23 will infer Option<Int>, and assign Option<Int>.Some(23) to it
  1. Traits:

Similar to interfaces, but can provide implementations of functions, including those it requires. Can include further constraints on types it can apply to, including "inheriting"/requiring other traits

All traits implicitly require destructors, if such exist for a given type.

The implementation is equivalent to:

struct MyTrait {
    vtable: &MyTrait_vtable;
    object: &Void;
}

Traits can be empty, and can be marked as implicit, e.g. implemented for any type for which required functions are implemented.

  • For example, with a hypothetical ImplicitlyHashable, requring hash(T): Int, a printHash(v: ImplicitlyHashable) { print(hash(v):hexDigest()) } function will be usable with any type T for which hash(T): Int exists.
  • Empty traits can be used as tags
  • An implicit empty trait Any can be used to pass around values, and dynamically typecheck or cast them. Note that a Dynamic type exists which wraps this into a more convenient package.
  1. Classes and OOP:

They are not a built-in feature, and instead an OOP model is provided by the standard library and implemented using macros.

Functionality includes:

  • Multiple Inheritance, including fields, vtables,
  • User-defined vtables, containing lists of functions to be dynamically-dispatched
  • Automatic generation of:
    • Cast function implementations for casting between classes in hierarchies.
    • Functions that dynamically dispatch analogous vtable entries
    • Specialization of base pointer types to handle (potentially virtual) destructors
    • Classification functions, e.g. isPOD, inherits, etc.

Aim is to be roughly equivalent to C++ classes.

  1. Generics:

Generics are compile-time procedures that take one or more parameters, and produce types(trait/struct/union, by extension classes), functions, or procedures.

Generic arguments are inferrable, e.g.:

  • str = "hello world!" will be inferred to String
  • ids = ["John": 42] will be inferred to Map<String, Int>
  • arr = [1, 2, 3] will be inferred to Array<Int>, or Array<Int, 3>, depending on its usage. For example, if arr:append(4), or other procedure changing its size is called, arr will be inferred to be a dynamically-sized array
  • arr = [] will be inferred to Array<Unknown>, and it will remain partially-typed until the full type can be inferred, e.g. via arr:append("hi!"). Usage of partially-typed generics is limited, as they are not complete types.

Generics can include macros, which will be executed after the types are applied, for example:

  • A macro inside a generic could construct N fields based on an N: Int parameter
  • Or define the type signature of a printf<Format: String> procedure
  • Or do anything else that macros are capable of

Partial and full generic specialization is allowed.

It's possible to disable type inference for a specific generic.

  1. Macros and related functionality:

Macros are functions executed at compile time, capable of manipulating the parser, compiler, and other concepts that make no sense at runtime.

9.1. Usable outside macros

Data types and names that help leverage the power that macro functions offer exist, and some of them are available at runtime and for non-macro usage as well.

  • ExprOf<T> can be used as a parameter type to facilitate lazy evaluation, and will be passed as a procedure returning T. As it can contain references to the context in which it is defined, limits might be imposed on what can be done with them in non-macro functions. To avoid that, closures must be used.
    • For example, log(ExprOf<String>) and log("[%s] %i: %s" % (system, code, name)) can be used to only perform the costly string formatting operation if logging is currently enabled.
  • Identifier can be used to prevent identifier resolution, e.g.: f(id: Identifier) { return id:getName() } Arguably a feature not very useful outside macros.
  • Reflection can be used at runtime to facilitate metaprogramming, but when used at compile-time will instead be optimized into fast, static code, as well as be allowed to do things impossible at runtime(e.g. add fields to types)

9.2. Strictly-macro functionality:

There are two main things that macros can do that is impossible at runtime, access core macros, and the compile-time context objects used for symbol(and operator/custom literal) resolution

9.2.1. Core macros:

Core macros are macro functions that directly map to the final AST, or otherwise interact with the process of parsing and compilation.

Core macros include:

  • Core language constructs such as loop(), branch(), call(), return(), ...
  • Parsing-related macros such as defineInfixOp, defineOpAssociativity, and other (for-now) secret sauce of my parser ;)

9.2.2. Context objects:

Context objects are objects that the parser/compiler uses to resolve symbols, operators, custom literals(marked with backticks), etc.

Can be leveraged to allow:

  • Affect symbol resolution, e.g. by providing an alternative symbol lookup function that makes all symbols default to 0

  • Inject new expressions into scope, either at call-point, beginning, end, or arbitrarily chosen point.

  • As scopes are all actually functions, macros let you execute them, e.g do { print "hello" } is a macro that executes the anonymous function contained inside {}

  • That involves "joining" scopes, e.g. mapping the inner scope's return function to the surrounding scope's one. Without it, do { return 4 } would only return into ether, in other words, the value would be discarded.

  • Execute expressions as if written in callee's scope, e.g.:

     macro incrementA(call_scope :Scope) {
         inScope(call_scope) { a += 1 } 
     }
     a = 3;
     incrementA(@); print a // prints 4.
    

    The @ symbol specifies the current scope, and is by default passed to all macro functions, it is only marked here for demonstration purposes.

Most of the language is implemented using a combination of core macros and context manipulation, this includes for-in, do and while blocks, if-else, pattern matching, ...

NOTE: Macros can be "plugged" into the generics system as type factories, and be otherwise treated like any other generic, except with disabled inference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment