paniq/plan.md Secret

## plan.md

      
    Raw
  

              plan.md
            
          
    NOWHERE Devlog

Thanks to all patrons who are supporting us so we can keep the Duangle ship
flying.
You are no patron yet but would like to support our work? Please visit our
Patreon and check out your options! Every
little contribution that moves us closer to our monthly goal helps.
2.2.2024

Wow, three years later! I have since decended into a deeper and deeper rabbit hole of compiler development to meet the technical requirements of the game, a yak shaving procedure that felt like I was going from strides to inches to picometers, all the while the game itself was patiently waiting for me to get on with it.
After conversing with a fan on reddit I decided I had to put a stop to this and claw my way back out of development hell towards the purgatory of progress.
Feeling I had to rubberduck my design & direction problems with someone competent, I cobbled together a Shigeru Myamoto LLM for myself and had a surprisingly productive conversation with its game designer emulation. There is now a very specific sense of what core system to focus on initially, which has never been done before, but fortunately happens not to be too hard to do with the skillset that I have built in the past years, so I'm excited about it. I am going to use the opportunity to learn Godot a bit by building a prototype for that system in it, which should also help me in deciding if we should build further upon Godot. The prototype will be very ugly, but it will answer my most pressing questions regarding the intelligence of our NPCs and many other procedurally driven mechanics, and if it all works out, we can later develop the rest of the game mechanics around that, which are rather conservative in comparison.
The immediate step after the core system works is to start thinking about presentation again, and build tooling for procedural asset design; I don't expect Godot to be very helpful in that area, but I'm open for a positive surprise. Once the results no longer look like complete ass (like, 75% ass-looking is fine), we can finally publish the first alpha that demos the actual core loop of the game rather than just the audiovisual vibe of it.
26.3.2021

Spring is nigh, COVID-19 is high! While the world is still trying to get the pandemic under control, my mood has improved (after it got even worse in December and January, thanks, Seasonal Affective Disorder), and I try not to think too much of the amount of work our little project has cost me so far and will still cost me. One step at a time.
I am continuing to work on Tukan's compiler, whose intermediate representation is shaping up to become pretty good. As you know, Tukan programs are what our entire game will run on, from procedural asset generation to game logic, NPC logic and physics simulation. They won't be edited in text, but assembled and maintained within a graphical user interface.
Heavily inspired by Functional reactive programming, a Tukan program is described as a pure functional graph that synchronizes asynchroneously received input with asynchroneously sent output, that is, it acts as a switchboard for System and Memory I/O that we can take stateful snapshots of at any point, as if we were running it in a VM.
What makes programming Tukan a particular treat is that it supports push logic, i.e. source nodes in the tree can control whether sink nodes are active, which allows event based programming. Because the program is pure, we can freely reorder instructions to achieve optimal parallelization and performance, and I discovered a wonderful logical principle, based on Binary decision diagrams, that allows to optimize all conditional logic down to a reasonable minimum, which means programmers won't have to think about grouping conditional code for efficiency, because the compiler does it for them; instead, they can organize code semantically, in a way that benefits maintenance, decoupling and understanding.
In addition, I discovered a way to optimally store any DAG (short for Directed Acyclic Graph) in a way that preserves topological order and supports content-agnostic garbage collection, deduplication/compression and serialization. Tukan programs will use this representation in-memory and on disk (possibly encapsulated in a LMDB database for process synchronization), and will also be able to manipulate it, as well as generate and run new programs. I don't have a name for this representation yet.
For a more in-depth, technical documentation of my progress, I've started leading a gist of ad-hoc devnotes in January, which allows me to reflect on my thoughts and thus enjoy the benefits of a rubberducking process. If you're interested in the particulars, have a look there.
27.11.2020

This is probably the longest time I went without a devlog update, as this year has been an incredible distraction in itself, both outside and inside our personal lives. It's my personal ambition to only post a new entry when I have positive news, and while there were some gleams of hope, and there were good things to talk about, somehow I still put it off. I spent too much time on social media trying to keep together a center that cannot hold, and in the end, it didn't even matter. There were so many worries about whether I was wasting my time with "childish things", whether it was worth it to go on with our creative project in the face of the challenges that our world is facing, and it took a further slip n slide into depression culminating in a full-on mental breakdown on the dancefloor of life to, as a person with a little more grit would say, get a grip, pull myself together and decide that, yes, I DO have answers, I DO have gifts; that, once the logical-creative bridge that Tukan represents is operational, anyone who likes and dares may explore its rich generative space and simple freedom of expression, and Sylvia and I will no longer have to be the only travellers on the road to NowHere.
That's the dream as it's wanin' and waxin'. Now for the facts.
All Duangle projects have found a new home at sourcehut, a plucky project hosting site that still supports Mercurial repositories. I'm very happy with the service so far and we're also paying for a subscription.
Scopes 0.16 has since been released and enjoys many improvements thanks to the help of our contributors.
In my last blog post, I described a visual programming language named Conspire, whose goals have now been merged into the Tukan project. We also applied for a grant at the prototype fund, but are still waiting for the results.
The Tukan data model and LMDB serialization layer have been designed and implemented, and a p2p protocol will be added later. As it stands, users can describe programs as pure functional directed acyclic graphs that unify the concept of pipeline processing, object document model and functional programming, with the help of a homoiconic data structure that only requires seven types: none, bool, number, text, blob, symbol and cell.
I am now working on the specializer which translates Tukan functions to compiled code. A program built in Tukan can be understood as the kernel of a message loop that receives the system state as it presents itself in this moment of time (state, hardware inputs, file system), and returns the next system state (state, framebuffer, file system). Parallels can be drawn to ShaderToy, in which processing is also pure and entirely self-contained, or Pure Data, with the exception that we also allow snapshotting execution state, reversing time, branching data, and sharing programs with state over the network. The goal here is to afford artists the same luxurious project management tools as every programmer enjoys: crash safety, efficient use of resources, source control, live introspection, and full control over the inner workings of every function employed.
We want to ease the cognitive load of programming as much as possible, without giving up any of its professional features.
The challenge is making it efficient and fast. Tukan is supposed to pick GPU paths where they can help accellerate processing loops, and to transparently take care of all the pesky vertex, texture and framebuffer resource management that a graphics programmer needs to bother with these days. I lack some algorithmic insight in this area, which manifest as roadblocks, but am steadily learning how to tame the beast. Sylvia is helping me focus, and her trouble with the tools she has to use is additional incentive to Hurry The Hell Up And Save The Computer Workflow From Infinite Complications.
And this is where we are. In case I'm not making another post this year, I wish you happy holidays and a happy new year, wherever you are, and let us pray that, despite all the suffering and the injustice that's still taking place, with the power of art, science, medicine and engineering combined we'll be on the mend, so that we can all find the courage again to move on from coping to hoping.
Cake is eternal.
6.11.2019

This summer hasn't been very productive. Because of an illness in the family
that I would like to keep private, we have had to spend a lot of time organizing
things that have nothing to do with game development. In addition I'm still
defending myself in a lawsuit that keeps sticking like shit on a heel. This, our
waning funds, and our incapability to market ourselves successfully have
exhausted me spiritually, and it has once again become difficult to be hopeful
and optimistic in the face of ever-growing obstacles.
But I'm keeping on trucking, even if that truck is stuck in the mud half of the
time. The only way out is through, right? Hashtag adulting.
Bitbucket has announced that they're ending Mercurial
support, which I can only respond with in unkind words, as I have dozens and
dozens of repositories on Bitbucket depending on it. Frankly I don't understand
the decision. The distinguishing features of Bitbucket are fading. With every
passing day, they become more like Github, without ever becoming Github, so
what is the point? Anyway, we will have to move everything before time runs out,
and I don't know yet where. I would rather shoot myself than use git.
We haven't done the version 0.16 release of Scopes yet
because there are still so many bugs to fix and stuff to polish, and I want it
to be a good release that lasts for a while. Besides all present users work
from the trunk anyway, which has reduced the pressure to release.
By the way, if you are dabbling with Scopes and I don't know about you, please
let me know. I only count four users so far, myself included, but I suspect
there are more out there who never bothered to establish contact. I want to hear
from you! We are all hanging out on Freenode IRC,
#scopes if you'd like to drop by, but I'm also highly active on
Twitter.
Tukan, which is supposed to become Nowhere's game engine,
has grown quite a bit, but the absence of additional infrastructure pains me.
We are in dire need of a package manager that can help register and deploy
libraries and dependencies. Ideally this package manager would be able to pull
from Mercurial, GIT and Subversion repositories, as well as unpack tarballs and
zipfiles, not to mention building and moving files. It should also support
multiple package sources, deploy into user directories and be cross-platform so
people can use it for their personal projects. Oh, and it should be really lean,
like, one executable that does everything. The
Premake / Genie
of dependency management.
I have begun to think in earnest about the visual programming language I have
mentioned in the last post. The goal is to build a new Conspire, as described
in this blog post
from 2015 (scroll down a bit). It is a structured editor-editor that takes its
user input & control cues from text editors, but also allows the user to
piecewise elevate the simple structure of the editor to become a complex IDE
for any imaginable content, and in reverse, take apart that IDE to understand
how it is built and turn parts of it into something else.
Originally I wanted the file format to be text based (human readable), so we can
reuse existing tooling, namely diff, patch and source control, and users can
also make changes by hand. But text invites issues with parsing and loose
coupling, diffs can merge in unfortunate ways, our data is more graph-ish than
linear, and symbols have no human readable names which renders editing by hand
unwieldy. It is likely that users would be frequently forced to regress to
text editing to fix issues, which ruins the experience.
I'm therefore considering using LMDB,
a library implementing a fast transactional key-value store to host modules as
database files in binary format. Here's where I see some advantages for users:

Definitions have no ordering in the file that dictates ordering in the
presentation, as with text files. We can arrange relevant definitions
according to the editing context.
Binary blobs can be embedded and streamed, which means modules can
host media like images, fonts, videos, textures, etc, as well as keep cache
in-file.
The file can be monolithic, containing the entire app image, but does not
have to.
The database can version its resources on a per-item level, not requiring
additional source control. Undo buffers can persist independent of the
application running.
Multiple users can edit the same file simultaneously over the network.
The file format can be updated transparently without the user needing to
bother.

These ideas overlap somewhat with the virtual filesystem of Smalltalk,
as well as the DNA system from Blender.
There are pitfalls to this mode of working that I don't want us to fall into.
These are the challenges most obvious to me:

Missing tooling. We need answers to all common operations that users do:
copy/pasting, modularization, refactoring, rollbacks, branching,
linking external dependencies, comparing versions, using pastebins, hosting
source code, deploying source releases, publishing binaries. Do not be fooled
by this short paragraph, this is a huge field that must be tackled
holistically in order to keep the solution complexity minimal.
Maintaining loose coupling between separate modules, each file typically
managed by someone else. I'm gravitating towards modules importing
and syncing copies of dependencies so that they can profit from updates, but
also keep functioning in case of the sync failing. Loose coupling can also
produce dangling references that need to be resolved by the user, but must
not disrupt other functionality in the same module as long as they exist.
Garbage. it is relatively easy to manually scan a text file and see things
that are no longer required. As we don't have that with a database, we need to
be able to find unreachable resources and point them out to the user, who can
then archive or bulk delete them.

As with Scopes, I once again want to follow the tenet that the program should
be self-hosting. The binary providing the infrastructure should
only supply the smallest workable implementation required to allow bootstrapping
a comfortable working environment. This allows users to almost completely
dismantle and replace functionality as it serves their purposes.
The target audience of Conspire is artists with a strong programming
background. I want to make it easy to ad-hoc build small applications from parts
without introducing barriers to expressivity. Sentiments you can often hear
when programmers have to build logic in a GUI is "I could have typed this down
in half the time" or "if this were source code, we wouldn't be stuck now".
This preconception of visual programming is what the design must overcome.
31.5.2019

Scopes is functional again and open for business. We
released version 0.15 a few weeks back and 0.16 is coming soon. At this point
updates are mostly fixes and small extensions, although I have plans for new
generic features in the future. I'm trying to raise more interest in the
language. At the moment I know of two active users, and I hope over time it
will be more, and we'll have commercial users as well. I'm continuously
expanding on documentation when I have time. The chapter I'm currently working
on is Scopes for C/C++ Users,
although I suspect that we will have greater success with script language users.
I'm back to working on our game engine for procedural games,
Tukan, which Nowhere is running on. The next big step
is writing the procedural asset editor, particularly the visual programming
part, which allows us to quickly scaffold editing contexts without having to
invent new GUI's all the time. From within Tukan, we build and test rules for
generative music, sound effects, 3D models, visual FX, NPC AI etc.
Like Scopes, Tukan is open source, so it's free and available to everyone. At
the moment it is very much a work in progress and largely broken, so I don't
recommend trying it.
We are two months away from running out of funds again. I'm going to integrate
Scopes and Tukan into our Patreon, and
advertise for our project more heavily. Perhaps the best way to do that is a
new video?
8.3.2019

Sweet mercy, I still can't believe it. Yesterday I fixed the last remaining test
for the rewrite of Scopes, and with this, the language
is functional again, after 10 months of being stuck in rewrite hell.
The next step is writing tests for the new borrow checker. Yes, Scopes has
a borrow checker now! I
figured it out
after all, somewhere in November, and wrote a new pass for it.
The borrow checker works a little different from the one in Rust (see link
above for details). First of all it works not by annotation but by inference,
tracking how variables move through a function and altering the function
signature accordingly. It doesn't only apply to pointers, but also to handles.
You could say that pointers are just another kind of handle. Scopes' borrow
inference currently doesn't discern mutable values and put up barriers for them,
this will be added in a future version.
Variables are properly lifetime checked, meaning their lifetime ends with their
last point of use, not at the end of their lexical scope. This is a feature that
is currently being worked into Rust, but Scopes has it already.
When the tests are written, I will fix the game engine and then finally do the
Scopes 0.14 release. Then we're back on track to the next NOWHERE alpha release.
23.9.2018

Every time I think I'm out, they drag me back in. Yes, I'm once again occupied
with rewriting a part of the compiler, but first: more info on the voxel
renderer.
I got hierarchical raytracing working, as well as physically based shading.
Here is a gif animation
of what it looks like in action, demonstrated on a test model. Since the data
is hierarchical, LOD support is also working.
The next step would be improving the SDF to voxel converter, as well as
implementing a GPU-side bounding volume hierarchy to accellerate rendering
many objects sharing a single scene, but the absence of basic memory management
in scopes made itself painfully clear, and compile times were shooting up,
so I had to take care of that first.
Unfortunately the solution to both problems was to completely rewrite the
intermediate language and update associated systems.
A user reported that his simple project, which made heavy use of generics and
compile time folding, had its start up time shoot up to 28 seconds. It became
evident that heavy reliance on constant propagation and constant expression
folding had been a bad call. We were practically running into a magnified
version of the problem that C++ already has, with its usage of templates and
constexprs slowing down compilation, which resulted in veteran gamedevs trying to
avoid those features where they could.
Why are constant expressions slowing down the compiler? It's simple: having to
solve expressions at compile time, the compiler has to resort to a mode of
interpretation. So compile time expressions aren't being evaluated at
machine code speeds.
Scopes made the problem worse by introducing constant loops which too fold at
compile time, and have to be processed by an interpreter. This was what
ultimately drove up compile times. The whole approach doesn't scale.
It is theoretically possible to put a JIT in the compiler whose only job it is
to fold expressions, but even the JIT can't abstract away the fact that
intermediate language datastructures have to be continuously updated.
How did I fix the problem? First, I disabled constant propagation & expression
folding completely: templates would now only be instantiated for argument types,
branches would not be folded, loops would not be unrolled, and any operation
on constants would now generate code. That also removed the need for unconst,
as well as made it easier to predict what code would actually be generated.
Then, to reintroduce controlled constant propagation at function boundaries,
I added the concept of "inlines", which are functions that are guaranteed to
be inlined into the calling context. Since all arguments are inlined into the
function body, that would effectively constitute constant propagation. Scopes
did implicitly inline functions before to resolve compile time closures, but
now the process is nicely explicit, and users can reason easier about what
the compiler will do.
Lastly, one question remained: now that the compiler no longer folds constant
expressions, how are we going to get that feature back where we actually
need it, and how can we use it in a way that is faster than interpreting
constant expressions atomically?
The answer was Label macros, which quickly got renamed to AST macros, and I'll
tell you more about them after we talked about the next issue, and how it
was solved:
I wanted C++ style RAII support that would run destructors of variables going
out of scope, but the CFF representation, which I used up to then, made it
costly to analyze scoping. I figured that an AST, which I didn't have until
that point, would retain the information that is already user-authored through
the usage of nested scoped constructions, so I moved away from labels towards
a tree of operators and control structures. This also made it easier to
generate SPIR-V bytecode, which requires scope annotations for branches and
loops.
I also had to throw away the Any type, as it became rendundant, as well as
the Syntax type, as anchors are now directly retained by values, which made
syntax macro related code generally easier to read.
The whole code processing pipeline now roughly works like this: the parser takes
a byte stream and outputs an S-expression tree. The syntax expander takes
the S-expression tree and generates an untyped AST (which is more of an
Abstract Semantic Tree). The prover (formerly called the specializer), which
I had to rewrite completely, takes the untyped AST and outputs a typed SSAT
(Single Static Assignment Tree), where all expression trees have been
flattened and only control structures are nested. The code generator takes the
typed SSAT and generates LLVM IR or SPIR-V, which are then translated to
machine code by either LLVM or the graphics driver.
So, with the label graph now being an AST, our Label macros were now AST macros.
How do they work? You call an AST macro like any other function, but it is
invoked by the prover at compile time, receiving the arguments as typed AST
nodes. The AST macro is a compiled function generated at an earlier compile
stage, so it runs with the same performance profile as the compiler itself. You
could say that it operates very much like a compiler plugin.
The idea is not new, but a staple of multistage programming. I discovered its
first implementation in the Terra project, and truth be told, one of the
earliest prototypes of Scopes had such an implementation, but I opted for
constant expression folding instead, which was, I admit it, a huge mistake.
An AST macro can analyze the types of incoming arguments as well as extract
possible constant arguments, and then return a new piece of untyped AST that
the prover then types and expands to SSAT form. If that form contains more
AST macro calls, these get expanded again, and so on.
AST macros can abbreviate a lot of constexpr boilerplate into a single function.
Need to go over every argument, extract keyed attributes and construct a
new type? For constexprs, each step would be interpreted individually. But
AST macros use machine registers and API calls for this. The overhead is
much smaller, and compile time goes down.
So, the AST and AST macros were the two big changes. Because of AST macros, I
also had to rewrite the core bootstrap file, and I'm still busy finishing that
up. The REPL console started working again yesterday, but none of the tests are
working yet.
As soon as the core rewrite is complete and the tests are working again, I'll
release version 0.14 of Scopes. If you are using Scopes, be prepared to make
changes to your codebase. Your code should end up becoming smaller rather than
bigger.
Once version 0.14 of Scopes is released, I'm continuing work on the engine and
the raytracer, along with adding the RAII mechanism for Scopes 0.15.
Addendum: Oh did I mention, Scopes 0.14 will also have proper support for
exceptions. There are now builtin constructs for try/except and raise,
and they are backwards compatible to C.
How are they implemented? Each function can have two return paths, a regular
one, entered with return, and the exception path, entered with raise.
On the C level, a function that can return an exception returns a struct
with three hidden members: a boolean, an except value and a return value.
The boolean indicates the index of which of the two value fields is valid. If
it's 0, the exception field is valid, otherwise the return value is valid.
When such a function is called, the calling function implicitly generates
a conditional check and either resumes execution with the return value or
enters the currently active catch block with the exception value.
To allow type inference for exception types, a function can only throw a single
exception type, and likewise a try block can only catch a single exception
type. If two different exception types are thrown in the same block, a
compile time error is generated.
How do we get around the limitation of single exception types? The Scopes core
will implement support for sum types, which makes sum exception types possible
as well. This way, the basic building blocks for exceptions can be leveraged
to raise different exception types within one sum type, and to dispatch
for different exception types in the catch section of try blocks.
All in all, this brings us up to 1:1 feature parity with Rust's Result type,
while providing basic convenience for enforced and typesafe error handling.
18.5.2018

Four months later! What can I say. I don't like to write devlogs when I
have nothing interesting to report, and the first quarter of 2018 was
indeed rather sluggish and eventless, with me mostly working on our
programming language
Scopes and our game
engine Tukan -- oh did I mention our game engine is named
Tukan now, and it has a
fancy logo too.
I was brooding over the fact that I had to regress to voxel raytracing,
which would mean that I wouldn't get every feature that I wanted, but my
mood has considerably lightened over the fact that the list of things
that I could get with sparse voxel octrees grew longer and longer,
and it might even be possible to get rid of the jaggy voxel look by
using good magnification filters for the surface and/or raytracing
operators directly. There are many options.
I just reached an important milestone in development. The language is
mature enough to write complex programs and shaders with it. I'm still
not happy with memory management and declaring mutable values, but it
works, and it has to do for now. I invested a bit of time in borrow
checking and still haven't found a way to do it that doesn't hurt. The
engine is also coming along nicely and steadily growing in tool
functions and programming sugar.
But that's not the important milestone. The important milestone is that
the sparse voxel octree raytracer works! (click for
gif) Considering
that the game will render all models this way, that's a big deal. I'm
currently using a variation of Perfect Spatial
Hashing to store models of up to
1024^3 voxel resolution in two one-dimensional texture buffers
currently about 3.2% the size of the full volume. A query only requires
two sample fetches, which speeds up raytracing considerably. Cache
coherence of the buffers isn't great tho, which is why I want to try
Coherent Parallel
Hashing
later on (thanks to mmalex for pointing me to it).
Next up is clean up, optimizations, then cone tracing to improve
performance and fidelity further, as well as first primitive attempts at
realtime GI. The model is currently generated from a fixed distance
field, which is supposed to become a data structure (list of operators),
and finally, models are supposed to be generated on the GPU so the
preprocessing step when the game first loads doesn't take forever.
As a final note, I should mention that while it took almost two years
(!) to develop the new language, the work is paying off. I prototyped
the SVO raytracer in two days from scratch, yielding excellent
performance from the start while making coding nearly as easy as
whipping up something in Python (unlike Python, Scopes forces you to
think a little more about data structures, but not to an obnoxious
degree). I still have to halt the Scopes-car every few kilometers and
pop open the hood to make a correction, but it's coming together. I
don't need C, C++, Python or Lua! Coding is not awful anymore!
I just noticed that the entire lifetime of Scopes, formerly Bangra,
formerly Bang, is documented in the devlog! The inception begins
18.7.2016
29.1.2018

Happy Halloween! Happy Thanksgiving! Happy Holydays! Happy New Year!
For the past three months I've worked on something called Frustum
Tracing, which was an attempt to adaptively render CSG products of
implicit surfaces in viewspace. I had severe problems with figuring out
many details of the implementation, so I wrote and published a 2D
version first, called
RTCSG2D (the 3D
project was just called RTCSG), from which I learned quite a bunch of
things, but it wasn't enough. In order to efficiently render a series of
subtractions for a tile in viewspace, it is necessary to sort all
surfaces first in ray direction, which has a complexity of at least O(n
log n), for each pass of a process that is already O(n log n), and that
killed the idea. It's still possible to solve the problem in worldspace
by raytracing a sparse voxel
octree
of CSG products, but the idea is still too new and I decided I have sunk
enough time into this.
Over the holidays and in the weeks after I've sunk into a bit of a
depression - not the medical kind, more the one you get when bad things
happen to you, you don't see a way out and thus you're paralyzed. The
new renderer didn't work out. The project is taking forever. I couldn't
tell what comes next. I felt like a failure.
I worked a bit on the game design document, trying to figure out what
communication of players with Nowherians would work like (an open
problem since project inception), and didn't progress much there either,
save for an interesting list of feature requirements for it. Intuitively
I feel that the solution will be non-systemic: look at the individual
needs for communication, devise a unique and fitting solution for each
one, and then maybe some common attribute will present itself that
could be turned into a system. But right now, it's impossible to say
what that would look like.
And then Staxel got released, I farmed the
hell out of it and am generally in love with the style. I toyed with the
idea of doing a small parody farming game as a one-off, where you would
start with a perfectly running farm, and then slowly sell everything off
in order to build a cryptomining operation. I would only be able to do
it if most of the code were already written and asset creation wasn't
too difficult, so I figured, maybe UE4 + some voxel engine. I went
through the UE4 marketplace and found nothing that would permit a
Staxel-ish look or workflow. There are a lot of Minecraft engines, but
nothing at a 1/16 voxel granularity. In desperation, I even checked the
Unity marketplace, and nothing presented itself. Bummer.
I remembered
Voxelnauts,
which is a game that sadly never got released, but has a similar style
as Staxel, and peculiar enough, is rendered by raytracing a sparse
voxel octree. Hmm, I
thought, a sparse voxel octree? I would have had to write one anyway.
And they're really useful. You can do all nice sorts of rendering
effects with them, thanks to a technique called voxel cone
tracing.
Maybe, I reasoned, if I used the same technique for the main game, I
could rationalize putting time into this. Yes, voxel graphics aren't as
smooth and perfect as implicit surfaces. But they can easily be
procedurally generated from the same functions. They can also be
hand-authored with apps like MagicaVoxel,
which is great for one-offs and auxilliary assets! They're well
documented, and with a modern technique like
VXGI, they can support
physically-based shading and look beautiful. And later on, I could
possibly continue building on SVO to get the implicit surface CSG
representation that I originally wanted, for a Nowhere 2.0, perhaps.
So then I had a bit of a good
morning moment and then
went back to work. I'm still reading papers on the subject of
implementing SVO's, and I'm writing shadertoys again to understand some
of the details better. I expect to progress much faster this time. Let's
see how it goes.
16.9.2017

Let's get right to it. Last post, Bangra was still called Bangra, but
now it's called Scopes! And it
has a logo! I changed the name because I wanted something that fits in a
single syllable, and a name that better reflects what the compiler is
about. Scopes are a datatype within the language, and it's very much
about domain specific solutions, depending on the scope of projects,
so this is what I settled with. It's meant as sort of a meta-name;
something that you would implement, for example, a markup language with,
and then say something like "YAMAHAML (Yet Another Mildly Amazing
Hypertext Annotation Markup Language) is a library for Scopes."
Where are we on the Roadmap?
1. The few Scopes tests that I have need to be fixed to match the
modified parsing rules and the reduced type system.
All tests run (in fact I added quite a bunch of new ones) and the
bootstrap part is pretty complete.
3. A lifetime tracker needs to be written (in Scopes). (...)
I'm unfortunately too stupid to think of a good approach. Every time I
try to think about it I realize that I know nothing. I try to google
something about it but nothing useful comes up. To be honest at this
point I'm hoping that someone's going to look at the IL and figure it
out for me. The problem is that when a value goes out of scope, a
destructor must be called, which means new labels must be injected into
the existing code, and those labels need to be specialized. But we can
only do the lifetime analysis after the labels have been specialized,
and destructor functions also need to be specialized and then tracked. I
don't look forward to any of this because it's going to make the
pipeline ugly, but I also want that feature, but I want it in a clean
form that fits the existing principles.
5. Analog to 4., a SPIR-V generator needs to be written that translates
specialized IL into SPIR-V. (...) Now, with the power of Vulkan, we can
magically run Scopes functions on the GPU as well as the CPU, so shader
functions and gameplay functions can source the same math library.
It's all done and embedded in Scopes directly. It turns out that
implementing a Vulkan loader wasn't needed at all (by god I tried, but
the bureaucracy is insufferable). There exists a wonderful small library
called SPIRV-Cross which
translates SPIR-V binary back to GLSL. I embedded it directly into
Scopes so code can be transpiled to GLSL on-the-fly.
The SPIR-V support has forced me to make many shader features
intrinsically supported on the CPU as well; so functions like sin, cos,
log2, normalize, length are now builtins; and the arithmetic operators
all support vectors. This allows me to run shader code on the CPU and to
write unit tests for it. It's delicious. Also, compile time closures and
type inference are just as well supported for shaders as for native.
I'm now working on porting Liminal, the game engine for NOWHERE, over
from None to Scopes. Many essential modules have been already ported,
and there's support for SDL 2, ImGui, GL 4.5 and GLM-style vector and
matrix types (not completely done, but working). Thanks to Scopes'
GLSL-backend I can target CPU and GPU with a single language, so there
are no more duplicate code bases for interval arithmetic, automatic
derivatives, perspective projection, HSL/YCoCg color conversion, versor
rotation, and so on. It's what I always wanted.
The biggest part of Liminal is the implicit volume renderer that I
originally just wanted to port from the original but then realized that
most of it is stuff that I will throw away anyway for the big
refactoring to support realtime CSG. In preparation of that I dug out my
one and a half year
old
notes and
source
code
to illuminate some of these tricky detail corner cases that the devil
proverbially likes to hide in.
The first part was figuring out how to do better frustum testing for
primitives. I couldn't think of a good non-iterative analytical test for
the sdSuperprim, so I decided
I'd try it for the basic set of Box, Ellipsoid, Cone, Cylinder instead,
and then compose scenes from those. I wasn't sure it would work at all,
so I first wrote a implementation
guide
for Ellipsoids. Then I carefully approached the problem by doing
shadertoy prototypes, first the 2D case for
spheres, then
ellipses. This looked all very
promising, so I implemented the sphere 3D
case next, then
boxes. Doing cones and
cylinders is too hard for me right now and non-obvious so I'll put that
off until later.
I also had to verify that my CSG evaluation idea would actually work so
I implemented a prototype for that one too, the Interval CSG Register
Machine. There's a summary in
the header that explains how this works in principle.
All this done, it is time to actually implement all this in the game
engine, and this is where we are now.
16.7.2017

Oh my god. Four months later, and there's no end in si-- just kidding. I
have had a breakthrough a few weeks ago when I did one more rewrite of
Bangra in C++ and this time
skipped the interpreter completely and went straight for type solving &
compilation in the first step. It works! The worst is over, the
result is fun to use, and most of the work is in userland now. Bangra
even got its REPL console back :-)
Along with a working compiler, I have also written a Visual Studio Code
extension for Bangra and a pygments script for Bangra's documentation. I
would like to submit the pygments tokenizer to Bitbucket and Github at
some point so my code previews and gists look okay, but right now I
don't think they're going to accept it.
Let's revise the roadmap I laid out in the last post:
1. The few Bangra tests that I have need to be fixed to match the
modified parsing rules and the reduced type system. The FFI needs to be
ported before step 4, along with the clang parser (Both still exists
only as C++ implementation).
The clang bridge has been ported from the old C++ version to the new
one. The FFI got a rewrite and is part of the solver now, allowing to
fold constant expressions involving "pure" C functions. The bootstrap
part isn't complete and the language has changed, so most of the tests
don't work (including the test runner itself).
2. A partial evaluator / specializer needs to be written (in Bangra)
before Bangra functions can be trivially compiled. (...)
This is all done, but written in C++, not Bangra - which is fine.
normalize() solves all argument and return types in a graph of labels,
inlines constants, folds expressions and lowers closures. The result is
passed to a trivial IL->IR transpiler.
3. A lifetime tracker needs to be written (in Bangra). It annotates the
IL with information on what values are required in each continuation,
how they should be allocated and when they are dead. (...)
None of this exists. Bangra doesn't have any memory management
whatsoever, and I'm not sure if I want to change it. I want to do
lifetime tracking but I don't yet understand well how it would work,
what the syntax would look like in a type inferred language (as opposed
to Rust), and if it couldn't just be implemented on the user side. I
also want to add RAII support; The simplest version I can think of is to
implement a defer keyword that allows to queue functions to be
executed when the scope is exited. As always, the design goal is to
provide the fewest building blocks necessary to give users the power to
express whatever idea they can think of.
4. A LLVM IR generator needs to be written (...) that takes specialized
IL and turns it into LLVM IR (using the LLVM C library). (...)
All done (this is the IL->IR transpiler), including basic debug
symbols to support GDB and GDB frontends.
5. Analog to 4., a SPIR-V generator needs to be written that translates
specialized IL into SPIR-V. The format is very similar to LLVM IR, and I
know at least one helper library designed to assist with generating
SPIR-V. Now, with the power of Vulkan, we can magically run Bangra
functions on the GPU as well as the CPU, so shader functions and
gameplay functions can source the same math library.
Is not in the scope of Bangra directly, but will be part of the engine,
and written in Bangra. I can't really start this work until the Vulkan
framework is set up. This is where the actual game engine work
continues.
Inbetween 6. to 8., there's going to be a problem of a rapidly growing
codebase that takes longer and longer to load. The interpreter already
takes 1.5 seconds to bootstrap Bangra.
...and at times, it got even worse. But the problem is gone now, because
Bangra compiles all code directly. Bangra enters the REPL within 400ms
in debug and 150ms in release on my machine. I would say anything up to
one second is acceptable.
10. The interpreter needs to be ported from Lua to Bangra so Bangra can
run and most importantly specialize itself. (...)
That is all over. Points 10 to 13 are off the agenda. I'm not going to
rewrite Bangra in Bangra, there's nothing to be gained by that.
Alright. Back to work.
29.3.2017

I was hoping the work on Bangra wouldn't take much longer, but I was
wrong. Bootstrapping a static compiler is really really hard.
Absolutely none of my original ideas for it have panned out. All ideas
that came instead where actually better. In the process of building
Bangra, I gained even more useful insights:


Your hoist interpreter doesn't have to be written in a performant
language, but should instead be written in a language that makes
writing this interpreter particularly comfortable and reduces the
work you have to bring into it, while still offering a foreign
function interface. Compactness and maintainability matter over
performance.
I was struggling to support continuation passing style with garbage
collection in C++ in a compact, comfortable and debuggable way. The
line count had swollen to about 8000 lines with no end in sight.
So, after doing some tests, I rewrote the interpreter in Lua within
a week, to make use of LuaJIT's tail call optimization (permitting
endless forward calling) and GC. The result has roughly 5000 lines.
The language I worked on before, "None", also runs on LuaJIT, but
the difference is that None compiled to LuaJIT bytecode. As a
result, many Lua-specific idiosyncracies bled into None, such as the
type system, Lua tables, API functions, the 60 upvalue/200 locals
limit, or indexing starting at 1 instead of 0 by convention. Bangra
runs in a bootstrap interpreter written in Lua to isolate it from
Lua internals and keep control over the design.
What I have now is an interpreter running in an interpreter. That
means Bangra runs about 4x times as slow where x is an overhead
constant incurred every time a process is virtualized. But it's
fine, because we'll
Futamura
the shit out of it.


Writing a Cheney on the
MTA interpreter
by hand is laborious and renders the resulting code unmaintainable.
I don't want to get too much into this right now, but there are a
bunch of devils in the details. Suffice to say, once you have
allocated some memory, you don't ever want to move it again,
especially when other memory depends on it. Also, CMTA has no way of
tracking dead memory, so a special solution is required here too if
you need to invoke custom destructors. I put a week of work aside to
explore this avenue and it was a waste of time.


Church encoding
helps reducing your type system to the absolute minimum required to
permit the language to bootstrap itself.
The C++ implementation made extensive use of tuples (unnamed
structs) to pass values between functions. I removed tuple types
from the Lua port as soon as I realized that when you have closures,
you already have tuples. In continuation-passing style, you can
encode any tuple (x0, x1, ..., xn) as a closure
fn (f) (f x0 x1 ... xn) where f is a function that receives all
tuple values. If f doesn't know or care how many elements the
tuple stores, it can take a variable number of arguments.
Furthermore, if the language supports multiple return values (and
Bangra does), the closure can be simplified to
fn () (return x0 x1 ... xn), where return is an implicit return
function provided by the caller.
The same goes for enums and the inevitable switch cases that use
them, which leads us to the next point.


Intrinsic compare functions in conjunction with continuations
provide a great way to decimate the number of cases to consider when
doing comparisons between two values.
This is an original solution I haven't seen elsewhere yet, and which
I'm particularly proud of. The challenge is to allow types to
overload relational operators, preferably through a single function,
while also keeping the dispatcher to these operators simple.
C++, Python and Lua for example require a minimum number of
overloaded operators to consider the interface complete. If you
specify operators for == (equal) and > (greater), you can use
negation to deduce != (not-equal), <= (less-equal), and through
deduction, all other operators from the set == != > >= < <=.
Because some types offer fast operations for different cases, or
have special values that don't compare (such as NaN for real
types), the dispatcher that translates what the interface provides
to what the user expects has to consider every permutation.
Composed from these operators, C (like many other languages) offers
another approach to comparisons, mostly used in conjunction with
sorting algorithms, a function whose C signature resembles
Ordering compare(T a, T b), where Ordering is an enumerator
(told you this would come up) with one of the outcomes Less,
Equal or Greater. Here we have a single function dispatching
each relational operator to an equivalent enumerator, and on the
user side a switch case that is able to turn each enumerator value
back into code flow. compare is composable; we can comfortably
build a compare for tuples from compare functions for its
elements.
So why not attempt to invert this relationship and, rather than
composing compare functions from relational operators, compose our
relational operators from intrinsic type-dependent compare
functions instead? There are however two obstacles to pass before
this solution works.
The first obstacle is that compare only covers ordered values, but
we also need to compare unordered values such as structs (and values
of unrelated type), or partially ordered values such as types (which
can be unrelated or supertypes of each other) and reals (where a
comparison with the NaN type is always unordered). To that end we
add a fourth result to Ordering, which is Unordered, and covers
all these cases where the result is neither Equal, Less or
Greater. Special thanks here to
@korenchkin, who provided me with
the mathematical idea of ordering and its accompanying technical
terms.
The second obstacle is the paradox of comparing the result of a
comparison. In order to compute (compare a b) == Equal, we need to
expand it to (compare (compare a b) Equal) == Equal, which expands
to (compare (compare (compare a b) Equal) Equal) == Equal and I
think you get the picture.
We solve Zeno's Paradox of Comparisons by reaching once more for
church encoding and turning compare into a four-way branch
function of the format
fn branch-compare (a b f-equal f-unordered f-less f-greater) that
compares a and b and, depending on the outcome, invokes one of
the four continuations. Then the relational operators can be
implemented with one-liners. a >= b for example expands to
branch-compare a b return-true raise-error return-false return-true.
There are additional upsides to this method. Relational operators
often appear in conjunction with branches. Since >= already
created four branches, any branch that depends on the returned
booleans can be folded into a direct invocation of branch-compare.
branch-compare has no direct equivalent on hardware (as far as I
know), but can be lowered without much difficulty depending on
atomic type and the number of unique outgoing paths.


I wanted to be done with the compiler by February so we could prepare
the March release of NOWHERE, but was struggling with winter depression
and the fact that things took way too long to move forward in C++.
Really, every time I touch that damn language, pain and misery become my
permanent companions. Now that the interpreter runs in Lua, I progress
much faster with my agenda, but there are still a few things to do
before I can assemble the next alpha. Here's a laundry list:


The few Bangra tests that I have need to be fixed to match the
modified parsing rules and the reduced type system. The FFI needs to
be ported before step 4, along with the clang parser (Both still
exists only as C++ implementation).


A partial evaluator / specializer needs to be written (in Bangra)
before Bangra functions can be trivially compiled. It takes a
closure and optional parameters to pass to that closure's entry
point (of which each parameter can either be constant, typed-value
or any-value) and returns either a constant (in case all parameters
are constant and the function has no side effects) or a new function
with all constants and closure values inlined and evaluated, all
captured closures eliminated, and all parameters typed (where
possible). The result has to be clean enough to use it for the next
steps.
All Bangra functions are already stored in IL, so I don't foresee
any logistic problems here. I have never written a partial evaluator
before but conceptually, I understand the problem well. Nevertheless
I expect surprises.


A lifetime tracker needs to be written (in Bangra). It annotates the
IL with information on what values are required in each
continuation, how they should be allocated and when they are dead. I
know next to nothing about lifetime tracking and don't even know my
requirements here. I just know the generated machine code won't
depend on a GC, so we'll probably need metainformation that tells us
where and when to use alloca, malloc and free.
Up until a few weeks ago I had my doubts that steps 2. and 3. were
even possible, considering that I hadn't yet seen a project based on
aggressive deep type inference which did this, but then I discovered
Carp and that lifted my
spirits. I had productive talks with Erik Svedäng, head dev of Carp
and also game developer (in fact, like Bangra, Carp is written with
game development in mind), and I hope I can learn more from him on
these subjects.


A LLVM IR generator needs to be written (as described in this
paper)
that takes specialized IL and turns it into LLVM IR (using the LLVM
C library). LLVM can then compile the IR and we have successfully
translated Bangra code to an executable format. This is where
execution of Bangra code speeds up to 1x for the first time!


Analog to 4., a SPIR-V generator needs to be written that translates
specialized IL into SPIR-V. The format is very similar to LLVM IR,
and I know at least one helper library designed to assist with
generating SPIR-V. Now, with the power of Vulkan, we can magically
run Bangra functions on the GPU as well as the CPU, so shader
functions and gameplay functions can source the same math library.


The Liminal game engine needs to be ported from None to Bangra. The
languages are largely similar so this is mostly typing work.


Now the codebase is big enough to port NOWHERE's implicit function
renderer and add the new CSG features. Then the world renderer works
again.


From here, gamedev can commence and I can work on procedurally
generating game assets and wiring them up with logic. Textures,
models and scenery, sound, audio sequencing.


Inbetween 6. to 8., there's going to be a problem of a rapidly growing
codebase that takes longer and longer to load. The interpreter already
takes 1.5 seconds to bootstrap Bangra. Macro expansion in particular is
very costly. A quick fix is to cache the generated IL on disk (which is
what None did), and that takes the writing of some serialization
routines. IL caching can't however cover for the LLVM compiler which I'm
told is not the fastest one on the block. I can also try to devise some
caching here as well, but these are not pretty solutions.
What follows is the pretty solution as outlined by the Futamura
projections,
and I'll try to restrain myself to implementing it after the alpha has
been released:


The interpreter needs to be ported from Lua to Bangra so Bangra can
run and most importantly specialize itself. Since both languages are
very similar, this will be relatively straightforward, but some
types internal to Lua will need to be reimplemented in Bangra at
this point up until the required functionality, prominently strings
and hashtables (Liminal has a great hashtable implementation).
Now we have an interpreter (Bangra in Bangra) running in an
interpreter (Bangra in Lua) running in an interpreter (LuaJIT).
Would we actually run anything of value here, execution would slow
down to 8x times as slow as a native execution. (We could also run
the interpreter repeatedly recursively and bring everything to a
crawl, which is more limited by available memory than computing
power.)


Using the functionality developed in 2-4, we can now specialize the
Bangra interpreter for a given bit of Bangra source code, and yield
a binary (which embodies the interpreter running the code processing
arbitrary user input into arbitrary output). e.g. specializing the
interpreter for NOWHERE source code would yield a NOWHERE
executable. The translation process is rather slow as it still runs
through the interpreter. But we can do even one better.


We proceed to specialize the Bangra interpreter for the specializer
and yield an executable (embodying the interpreter running the
specializer processing arbitrary user code into binaries that work
as described in 11), yielding the specialization process as an
executable that can take NOWHERE source code and yield a NOWHERE
executable. This is also known as a compiler. Now compilation is
also fast.


Not necessarily required, but a nice bonus: we can specialize the
specializer specializing the specializer. You read that correctly.
You get an executable embodying the specializer specializing
arbitrary interpreters into binaries that work as described in 12.
In simple words: you put in an interpreter for a new language (where
the interpreter itself is written in Bangra), and a compiler for
that language falls out. There is a great step by step explanation
of how that works on this
blog.


And that's it for now.
25.10.2016

Hooboy, this Bangra business is taking way longer than anticipated.
Bootstrapping a static compiler is hard. Nearly none of my
original ideas for it have panned out. On the upside, the ideas that
came instead were actually better. In the process of building Bangra, I
gained several useful insights:


If you want your hoist language part (in this instance, C++) as
small as possible, with all important bits written directly in your
new language (in this instance, Bangra), the smallest thing you can
write to get there is an interpreter. Going straight to any form of
static compilation without a dynamic language stage is too hard.


Don't bother with memory management when you are bootstrapping. Just
allocate and never free anything. It will be fine. You can always
write a
GC
later.


Your bootstrap language has to be comfortable before you write
anything of value in it; Your actual compiler can only be written
once the language is convenient. If you write it in a minimal
IR,
like I originally planned, you'll end up with unserviceable bloat.
This is where the interpreter stage helps.


The compiler will need to interface with libc and
LLVM, so a C
FFI is
most opportune if you don't want to blow up the hoist part further;
Doing FFI through LLVM itself is too bulky.
libffi is a better choice: it's
tiny in both interface and implementation and comes with alignment
info for all primitive C types.


LLVM IR is not a good intermediate representation for your own
language. You need a mid-level IR that fits your model better. The
Swift developers published an interesting
talk
last year to that end, and Rust announced their mid-level
IR only this April.
For the interpreter, I settled for something far more minimalistic.
See next point.


It is possible to compile closures as zero-cost abstractions. I was
made aware of a fantastic
paper
from last year, which introduces a new intermediate form that unites
concepts from both
CPS and
SSA
forms, called CFF (control flow form). CFF formulates each statement
as a continuation (a lambda function that never returns) with a
single function application as its body. Continuations passed as
arguments turn into closures. The paper then introduces lambda
mangling, particularly lambda lifting and lambda dropping, to
eleminate closures. A final transform can then translate CFF to LLVM
IR for machine code generation.
To this end, I picked CFF as intermediate form. It turned out CFF is
also ridiculously easy to interpret: the eval-apply loop for CFF
requires less than 100 lines of C++.


If Bangra is supposed to interface with C without friction, and the
compiler written in Bangra is supposed to compile itself, the data
model of the language should accomodate those goals. That means the
interpreter should not deal in dynamic object types (CPython uses
the PyObject type for this), that require conversion to/from ffi
compatible forms when interfacing with C functions. Instead, the
interpreter should deal with C values, structs and arrays directly,
so that no conversion is necessary.
But where do we store the types that an interpreter requires in
order to make binary blobs inspectable? The answer is: fat pointers.
Bangra's fat pointer is a dynamic pointer that stores a type
reference along with its target in order to support interpreter
features that depend on inspection: type checking, overloaded
operators, bounds checking, element access and garbage collection.


The more of your hoist language data structures are written in C,
the less work you'll have to put into bindings. Using libffi and fat
pointers, the interpreter can access all memory directly.


Fat pointers turned out to be a fantastic solution for several of my
problems, and my only regret is that I had not learned about them
sooner. I am still in the process of transitioning the interpreter to
this new format, along with the introduction of libffi for FFI calls
instead of LLVM.
This is by the way the densest problem I've worked on so far. The
interpreter hovers around only 6k lines of C++, but I've agonized over
these lines in a way I never had to before.
I hope that the interpreter work is completed somewhere in the next
week. Then I'll start work on the compiler, which will be needed for
what's next: the new version of the engine uses Vulkan instead of GL by
default, and the Bangra compiler will be able to target SPIR-V directly;
it's just a matter of translating CFF to SPIR instead of LLVM IR. Once
that's done, I hope I'll never have to deal with generating GLSL code
again (except for the mac port), and that will significantly speed up
the next development phase.
The realtime CSG renderer is practically done and waiting in a drawer to
be taken out and put into use once the Vulkan backend stands. I can't
wait for the moment when the plan finally comes together.
18.7.2016

Looks like I've been slacking on my habit of regularly updating this
devlog, probably mostly because I was ashamed that in the past three
months I've been procrastinating on game engine work, and instead worked
on perfecting our programming environment after I had a hard time
writing a new visual editor for live programming, which was supposed to
help with procedural audio design for NOWHERE, for which the ability to
tweak parameters in real time is a must in order to be able to,
ironically, progress swiftly. And there I was, shaving the yak again.
And what a beautifully naked yak it has become.
My trouble started when I began to realize that keeping source in Lua
tables was at odds with needing to render the UI as fast as possible,
using dear imgui (Oh - I also wrote comfy bindings for imgui. So there's
that.), as Lua doesn't really give you a high performance interface to
its data structures. So it would have been either calling dynamically
scripted functions from statically compiled code, or using the Lua C API
to iterate and edit tables, which turned out to be a pretty awful
experience. Once again, abstraction becomes the enemy. There's no direct
access to the data structures, instead you're supposed to communicate
with the VM using an awkward stack machine interface that renders all
code unreadable, and doesn't lend itself well to abstraction due to
constantly moving stack indices and a significant overhead associated
with parameter checking.
I can't remember how I got to the conclusion so I must have intuitively
felt that I had to fork Terra (as
LLLua) in order to be able to fix
a few long standing issues, primarily the way too difficult process of
building win32 versions (didn't manage to build a single working
version) but also the inability to annotate struct alignment for better
auto-vectorized code generation, hooking initialization mechanisms in
the AST, support for latest LLVM, throwing out code I didn't need (like
the Lua language extensions).
In the process of maintaining LLLua, I had to concede that I don't
understand LLVM and Clang very well; both C++ API's are incredibly
complex, and one had to spend some time with the architecture to get a
better understanding of how it worked.
So I put a week aside for an experiment, a small toy project which I
felt would be an ideal way to get to know the LLVM infrastructure. The
goal was to write a cousin of
None, a minimalistic
statically typed Scheme-like, implemented in a single C++ source file,
that would leverage the low-level MCJIT capabilities of LLVM in order to
create a self-expanding language. I nicknamed it "Bang", after the big
bang which created a universe from the smallest possible point.
Based on my experiences I expected the experiment to fail; There was a
reason Scheme was a dynamic language, so I argued, and good
metaprogramming would therefore need some kind of Terra-like two tier
system: a dynamic VM like Lua would bootstrap a compiler using a
language toolkit like LLVM.
But the cul-de-sac moment never came. Instead, the undertaking proved to
be frustratingly fruitful, and it became impossible to quit working on
it because I kept succeeding, and the system exposed attributes early on
that None lacked.
With the help of MSYS2, Bang proved to be incredibly easy to build on
Windows. So easy, actually, that I almost look forward to booting
Windows to make a new binary. I don't even use a build system; Just a
Bash script for Linux and a batch file for Windows. Bang uses the
LLVM/clang distribution on Linux both as its sole dependency, and as its
compiler. On Windows, compiling and linking had to be done with gcc.
Bang has recently been renamed to
Bangra. Its release number is
now at 0.2, there are binaries and documentation online and I wrote an
introduction that
gives a rationale for its existence. The only language Bangra supports
at the moment is the more low level IR language, which has a restrictive
type system, but due to its hygienic macro support, even coding at this
level is beginning to become convenient.
I'm currently in the process of implementing a new version of Nonelang
on top of it, as Bangra's default high level language. When I'm done,
the Liminal game engine will be ported and the Nonelang runtime retired.
Then it will be possible again to do Windows builds on a casual level,
and a big hurdle to releasing the next NOWHERE alpha is overcome.
6.3.2016

It's time to document what I've learned so far, before I dive in and
tackle the next topic (which is diffuse indirect lighting). Let's start
with a video of what the
renderer looks like right now. And here's a
video with slightly more
geometry.
It turned out that the point renderer as I envisioned it was simply not
feasible for dynamic scenes. Trying to solve geometry over multiple
frames produced all sorts of reprojection woes that I don't want to get
into right now.
The current renderer, which I'll stick to, is able to dynamically render
up to 64K solid convex superprimitives per frame per viewport,
raytracing only, no triangles. Everything is movable and morphable,
there are no static elements, which is perfect for outer space and
highly dynamic scenes with rearranging geometry.
The primitives (which in the engine are called brushes) are of a single
type that I engineered for this application,
sdUberprim. Uberprims can
represent boxes, spheres, cylinders, capped cones, capsules, pellets,
and smoothly morph between those shapes. (My ideal superprimitive would
be a tetrahedron or hexahedron with individually weightable edge
smoothness, but I couldn't figure out a distance function for those. Any
takers?)
On my GTX 460, the rendering time ranges from 4-10ms for a single
1280x720 viewport. Depth complexity is handled reasonably well. These
two worst case test
scenes give a good
sense of where the visual fidelity limit is. There's no material system
yet, and the timing is for a scene without shadow mapping, so 30 FPS
seems a reasonable target for this GPU. Current gen GPU's with three
times the compute capability of the GTX 460 should have considerably
less trouble.
I had to let go of smooth and subtractive CSG to get to this performance
level, and the primitives must be convex, because the abysmally slow
sphere tracing has been swapped for Newton-Raphson iteration, which can
iterate towards convex surfaces in less than 8 steps, and even more
importantly, is able to provide correct conservative
bounds (the inner bound
dissolves for features thinner in depth than the cone width, which is
not so desirable. Any takers?).
(smooth) CSG using intersections is still possible with Newton-Raphson,
as the intersections of two or more convex primitives are also convex,
but these operations can not be performed directly as viewport objects
are accumulated, but have to be performed in separate registers first,
and I couldn't figure out a good data model for that yet.
If you would like to implement a similar system, here's roughly how the
renderer works:

Conservatively rasterize all brushes as oriented bounding boxes into
a Z-buffer of 1/16 viewport resolution, and raymarch each fragment
to cull the surface further and provide a relatively accurate
max-estimate on Z distance.
Rasterize the same brushes again, with identical parameters, but
this time the brush index of each raymarched fragment is written
into a per-pixel linked list (I call it an OBB texture), again at
1/16 viewport resolution. To avoid writing too many fragments per
tile, the generated Z-Buffer texture from step 1 is queried. If the
raymarched distance lies behind the Z-Buffer distance, the fragment
is occluded and will not be written.
Initialize a brush index texture from all tiles in the OBB texture
that only have one fragment. Those tiles are ultimately resolved and
will not change again. The work remains to refine the remaining
tiles, as those contain overlapping contours of two or more brushes.
Initialize another sparse linked list (I call it the contour buffer)
from all tiles in the OBB texture that store more than one fragment.
The contour buffer is an array of linked lists, one array entry per
multi-fragment tile, pointing to a list of brush indices and min-ray
distances (see implementation notes) for this tile.
Increase the brush index texture resolution to 1/4 size of the
viewport. We're now going to fill in some of the remaining empty
tiles at the contours.
Using a compute shader, subdivide every multi-fragment tile in the
1/16 contour buffer into 4x4 smaller tiles of which each tile is now
4x the viewport pixel size, and store these tiles in a new 1/4
contour buffer. For each of those tiles two raymarching passes are
needed: first, find the nearest interior bound of all overlapping
brushes, then discard all brushes behind that bound. If only one
brush remains, write the brush index directly to the brush index
texture. Otherwise, add the tile to the 1/4 contour buffer for
further refinement.
Increase the brush index texture resolution to full viewport size.
We're now going to fill in the remaining empty contour pixels.
Refine the contour buffer once more (as in step 6), from 1/4 to full
resolution, with some modifications. Only a single pass is required,
finding the brush with the nearest exterior. The result is written
to the brush index texture. We now have a completed brush index
texture.
For each fragment of the brush index texture, raymarch the brush at
that pixel coordinate and use central differences or automatic
differentiation (which is only slightly more efficient) to compute a
high quality normal. The normal is written to a normal buffer, and
the depth written to a depth texture.
Apply deferred lighting to your hearts content. Season to taste with
salt and pepper. Serve hot.

The efficiency of this method stems from the implied early occlusion
culling and the refinement of contours only, which approximate curves
with infinitely small thickness. N contour tiles are refined to 4N tiles
on average, while the number of overall pixels increases by factor 16.
Some implementation notes:

When rasterizing the bounding boxes, all vertices are shared. Each
vertex of a box is connected to three edges. If a vertex has two
contour edges (a contour edge joins a forward- and a backfacing
triangle, thanks Marco Salvi for pointing that out), it will need to
be adjusted. The edges are shifted outside and the vertex is mapped
to the intersection of those edges. See Conservative
Rasterization
for more details. The second method is relevant here.
The distance function is solved towards
f(t) - bias*t / (min(screenwidth,screenheight) * sqrt(2)) = 0
where f(t) is a distance function, t is the ray scalar, and bias is
either 1 or -1 depending on whether we want the outer or inner shell
of the primitive. Subtracing the linear term effectively narrows the
distance to the ray as it progresses, turning the ray into a
widening cone with a spherical cap. See this
shadertoy for a reference
implementation.
The per-pixel linked list is very well described in the presentation
OIT and Indirect Illumination using DX11 Linked Lists -
AMD.
Our fragments are not sorted, though the implementation would
probably benefit from it. Cyril Crassin wrote about a version that
reduces bandwidth pressure in his
blog,
but for our use case that made no difference, as we were primarily
compute bound.
When refining the contour buffer, fragments have to be added and
linked up before knowing how many brushes would actually pass the
depth test, and another contour tile would have to be added. If it
turns out that the contour tile is not needed (as only one fragment
has been added), then the fragment is orphaned. This is in practice
no problem and carries very little overhead.
Convergence can be sped up by storing the computed ray distance with
each fragment in the OBB texture and contour buffer, and passing
that ray distance down in the pipeline to the final depth resolve to
reduce the number of raymarching iterations.

16.2.2016

As mentioned I tried raytracing again, RevAA, then back to SDF and then
developed a piecewise polynomial
evaluator for rays to speed up
convergence, but it's simply too slow. I wrote a bit about this
here. To sum it
up, when evaluating complex distance fields, every iteration hurts. Just
one to two iterations per frame are okay.
So I tried the next thing, based on an idea that Timothy Lottes had,
about using particles to represent the scene, and am experimenting with
points converging on a distance field scene along rays. Here is a
video of the technique in
action. Performance on my GTX 460 is fairly good (a steady 1-2ms,
because the work is completely uniform, regardless of scene complexity),
while image quality is terrible, but now I'm optimizing the picture
rather than performance (could be worse: I used to work on both at the
same time), which feels like the right domain to think about.
Several avenues of improvement remain: a) Attempt to no longer limit
lifetime of points and instead do a perfect detection of when points are
truly dead. b) Seed points not at random locations but only to fill
cracks. c) Seed points not at near plane, but closer to depth at
surface.
Some more details to these directions:
a)  These points are recycled so far: backfacing points, points that
leave screenspace. These points need to be recycled as well: points
covered by the Z-Buffer, points in regions with too high point
density. It all hints towards maintaining points not as a list of
coordinates, but directly on the framebuffer (on a grid), because
this way the grid automatically culls backfacing points, points out
of screenspace, points covered by zbuffer and surfaces with high
point density. Storing a point per pixel implies all those things.
(However it is a big problem that I can only write 32-bit atomic-min
values in GLSL, which limits the amount of information I can store
with a single pixel. I will need two passes: one where points
scatter their new depth values, and one where the same points
evaluate if they've won and if so, write additional information for
that pixel. Ho hum.)
b)  Seeding new points to fill cracks requires either a histopyramid on
the Z-Buffer to enumerate empty pixels, or, when points are stored
in the framebuffer, this can be part of the iteration pass.
c)  Seeding new points closer to depth at surface to reduce convergence
time: not sure what to do here. Using the (unprojected) previous
Z-Buffer seems okay most of the time unless stuff fades in from the
sides, or new objects appear. The danger lies in picking a starting
depth that is behind a distance field volume, effectively clipping
that volume from view.
30.1.2016

Happy new year everyone! We just whooshed past our original deadline,
which was the end of 2015, aaand I'm still doing R&D.
What transpired? Reducing my tetrahedral mesh to a triangle mesh in 2D
still left me with an approach that felt icky to work with. I still
didn't know how to do painlessly procedurally generate structures with
triangles. What I could work with, however, were lumps of classical game
objects as I used in the last 2D prototype (a small demo movie
here), but I
thought that would bring back the old object vs. terrain dichtonomy, and
I always wanted a single system to do everything, as the world is also
not split into living and dead things, but everything can fluidly
transition from resting to animate and back. Splitting a game into world
and objects was a performance hack that ended up with the creation of
two systems that are incompatible to each other, and make transitions
hard.
Then Media Molecule started demoing their Dreams project (just look at
this awesome demo and how
easy they can create complex topology with it). I had occasional talks
with Alex Evans about the technology they were using, and we shared our
R&D woes in the past years. A few years back I had dabbled with similar
systems for NOWHERE. I had thought that their approach would not work
for the huge continuous planetary landscapes that I was envisioning. But
as my dream of tetmesh backed giant realtime transforming asteroids was
collapsing under the weight of what computer consumer hardware can
realistically do in 2015 (it's still a great idea - for the computers of
2030!), and admitting that I was unable to explain to myself how I would
actually teach a program to generate and alter shapes this way, I took
another look at Alex' SIGGRAPH
slides and
repondered on the intersections of our projects.
Then it dawned on me.
The first thought was that since composition of operations was trivial
with Media Molecule's smooth CSG approach, it would also be trivial to
write procedural generators for it: smooth-add and -subtract a bunch of
boxes and ellipsoids, and evaluate the resulting shape as a point cloud.
Then compose the whole world out of these objects. An asteroid for
example would then be simply a huge blob made from many medium-sized
rocks clumped together. Digging a tunnel would be removing rocks from
the structure. It would have the added benefit of keeping surface
irregularity; any dug tunnel would still look rocky. And because we have
smooth CSG, we can blend structures and make them seem more continuous
instead of mashing up discrete elements.
That brought me to the second, and even more important realization: I
was approaching the abandonment of the terrain/object dichtonomy from
the wrong angle. All this time I thought that the way forward would be
to understand objects as a special case of terrain, that any game model
is a sort of mini-terrain - one volumetric body geometrically built out
of a connected graph of tetrahedral cells with different attributes. As
this was how the physical world worked: as a fluid laplacian field where
everything was connected. The wave-like way of seeing the world.
Sidenote: It turns out that while graphs are the lowest common
denominator of all data structures (you can describe any data structure
as a graph), computers are exceptionally awful at parallelizing any form
of graph work. It takes a lot of research to make these things fast.
It's much better to work with trees, as trees only branch - they don't
merge again. It's even better to work with linked lists, as lists don't
branch - they just skip. It's absolutely excellent to work with arrays,
as arrays neither branch or skip, and can just be traversed linearly, a
job that parallelizes easily. GPUs want arrays. It is possible to
describe a tree in form of a list, and a list in form of an array, and
most certainly can a graph be described as a tree, but these reduced
data sets are lossy and therefore immutable. Triangle meshes make
editing bearable by keeping a few connectivities invariant (e.g. every
triangle edge is adjacent to only two triangles), but graph editing
incurs a constant chasing of pointers (implying random access all over
memory) that is detrimental to performance of contemporary computer
architectures.
Anyway. It did not occur to me that the terrain / object dichtonomy
could also be approached from the other side: that terrains can be
described as superset of objects. Our smallest atom is no longer a
vertex, a singularity, but a geometric primitive that has surface and
volume - the particle-like way of seeing the world. Any terrain can be
composed from primitives, any game model is a composite of primitives
anyway. We've been doing this for years already, in our rigid body
physics systems. Assuming that we can find a simple way to smooth the
surface gradient to get rid of hard corners that invariably form where
primitives overlap, the result can look just as convincing - and
Smooth-CSG offers just this approach.
Philosophically, it makes no sense for a computer game to attempt to do
a full physical simulation. This way, we end up with an abstraction that
is so occupied with every atom of a tree, we lose all semantic
understanding of the tree. Traditionally, games prefer to organize their
world semantically - the world not as we experience it, but as we make
sense of it, and how we communicate with it. Every game object's
appearance exists merely to convey its semantical points: the color of
leaves of a tree may communicate the tree's genetic heritage, the season
we are in, the quality of the soil it is embedded in, the cleanliness of
the air that it breathes, the presence of parasites and how much
sunlight it is getting. A physical simulation attempts to recreate a
system where the semantic understanding is an emergent effect. A game
takes shortcuts to explicitly recreate emergent effects without the
systems that produce them, and so discards a wide range of possible
emergent side effects for the efficient mapping of the few relationships
it cares about. It is an edited, condensed version of reality forced by
the evil ever-present constraints of computer performance :-)
"That is all nice and well", you may interject impatiently, "but what
have you actually done in the past weeks?". How rude of you - but also
how right you are!
The first thing I did was attempt to recreate the distance field
evaluator that Alex described in his slides, first on the CPU. I started
by assembling some of the algorithms I would need, such as a 3D hilbert
curve encoder / decoder (here's a
video of the 2D
version). Then with the help of @rianflo (who runs the truenext forum on
Slack, a place where developers interested in writing the next
generation of dynamic world games meet to talk business). I wrote a few
max-norm distance functions for popular primitives such as the box and
the ellipsoid, and did some initial spatial bisection tests in 2D
(picture). Then I
wrote and experimented with a GPU backed 3D point renderer (a beautiful
picture here) and
added hilbert curve ordering
(picture).
The last CPU-based version of the evaluator randomly spammed space with
objects
(picture), and
then I.. I concluded that the whole thing was already getting too
complex for dynamic scenes and threw it away.
That's not entirely how it happened. In the middle of it it occurred to
me that my old friend Interval
Arithmetic (IA)
could help describing more primitives than max-norm, which has no
composable arithmetic rules, would allow (in fact they often lead to
identical
results). Then it
transpired that, since IA operates well in distorted domains, I could
perhaps do the evaluation directly in window space and use a Z-buffer to
cull hidden and occluded surfaces early (2D prototype
picture). I
prototyped a few more IA operators
(picture) and
rendered a beautiful testing scene
here, which
convinced me of the practicality of working with special arithmetic
rather than distance norms.
It turned out that Alex had also written about a frustum renderer in his
slides, but this one operated on trees of volume textures, while I went
straight to dynamic surface evaluation. Alex helped me a lot by
explaining the minutiae of his approach, which allowed me to make more
informed decisions. I had to learn how to write compute shaders at the
same time, and I'm still not entirely fluid in it, and am forced to
figure a lot of things for the first time.
I wrote a first prototype of a GPU version (without CSG evaluator, just
a simple implicit surface for testing). Here's a
video of how the
implicit surface frustum rasterizer (ISFR) converges towards a solution.
A problem that I wrestled with was the behavior of evaluation close to
grazing surfaces.
Many surface samples can become relevant for the same pixel, and the
evaluator tends to do 3-10 times more work than necessary on the finest
resolution. With optimal scenes, the frame time can go as low as 1.5ms,
while it climbs to 20-30ms at grazing angles. Anyway, here's a picture
of some beautiful smooth-min
CSG rendered with
the ISFR.
I also did some ground work on a unifying function for many different
primitives (sdSuperprim
shadertoy) and smooth/hard CSG operations (Soft CSG
Superlogic shadertoy).
Another issue with the ISFR is the required log2(N) stream expansion
steps, as the frustum space has to be recursively bisected, and, to be
honest, Interval Arithmetic is not very good at dealing with rotations
and affine (shearing) transformations in general. Raytracing with IA has
been done before but seemed particularly wasteful, as this effectively
constructs a box around the ray, and reports positive for any surface
inside that box, often neccessitating a bisection even when the ray is
not even close to the surface.
So I bit the bullet and looked into the more complex and confounding
successor to Interval Arithmetic, Affine Arithmetic (AA). That is where
I am today. There is a beautiful
paper
on raytracing and voxelizing implicit surfaces with revised affine
arithmetic (RevAA) that I am following currently. I have written a
prototype on shadertoy that
demonstrates how effective RevAA can be in finding zero-plane
intersections between a ray (which remains error-free under rotation)
and an implicit function.
Classic intervals describe a box around the range (effectively offering
us a binary choice between "interval contains surface" and "interval
does not contain surface"), and there is no other way to refine the
result further but to bisect it into two parts and continue bisecting
until the interval reaches our desired resolution. Not so the revised
affine interval. Because each affine interval is a parallelogram (see
shadertoy prototype), not only can it tell us whether it potentially
intersects the surface, it can also prune the range further in the same
step, giving us a far smaller search range to bisect. In the ideal case
where the function is linear (as when evaluating the faces of a box),
the resulting intersection width can even be zero, terminating our
search after a single step!
This means that when evaluating surfaces with RevAA, we only pay for
surface complexity (high curvature) and depth complexity (many layers of
surface behind each other). Unlike with IA, grazing angles of near-flat
surfaces are "free" because of the affine component and the resulting
pruning efficiency. Each affine computation is costlier than classic IA,
but we need less of them, netting a win (see the benchmarks in the
paper).
I am now inclined to reconsider raytracing, as using RevAA to do spatial
refinement doesn't offer the same ease of pruning, and on top requires
two more affine components. In 1D ray evaluation (where the ray travels
on f(x)=0), the range of intersection describes a simple interval (a
line). In 2D evaluation we measure the intersection of a zonohedron with
the f(x,y) = Z = 0 plane (which is our ray lifted by one dimension), so
the range of intersection describes anything from a line to a
parallelogram to a hexagon. Limited to searching a rectangular domain,
we can only refine a bounding rectangle, greatly diminishing the benefit
of pruning. You can imagine how bad the 3D case looks, where we do the
intersection in 4D.
These are the problems I'm currently facing, and which I attempt to
tackle in the next prototype:

Raytracing has notoriously bad performance even on GPU, because of
data and execution divergence. On the upside, we're evaluating
primitives instead of triangles (good ol' POVray style), so we get
to cover much more surface with much less data. There is a
paper on raytracing
with affine arithmetic on the GPU which I intend to study more
closely.
Without a refinement hierarchy for rays, we can only refine depth
information at the target resolution, with a linked list of ranges
and culled CSG operators per pixel. Starting with a lower resolution
as in this
paper and
refining when close enough to surface should help, but needs
testing.
I'm still lacking a bunch of affine approximation operators
necessary for CSG, particularly min / max / abs. min/max are easy to
do once an approximation for abs() exists, as
min(a,b) = (a + b - abs(a - b))/2. Fryazinov's paper gives a step
by step explanation on how to construct new operators, and that's
what I'm studying right now.

27.11.2015

Many additions to the 2D prototype, primarily figuring out the basic
mode of movement, which is now a sort of flat version of first person
controls; WSAD control lateral movement, while horizontal mouse movement
rotates. I'm planning to test a depth-adjustable crosshair for vertical
mouse movement. I started out with fish-like swimming controls, and
while that was interesting, I think the basic mode of transport should
offer high precision, as most of your movement in early life has high
locality. There will be many other combinable forms of movement suitable
for different activities later on, including the tether-based
transportation we had in the last alpha.
The next bit to do is one of the most important mechanics in the game;
In NOWHERE, your avatar is identical
with your inventory. What you have is what you are. Therefore a way to
manage your appearance is needed. I call it "self-assembly mode". In
this mode, bits and pieces floating by are held by your avatars
electromagnetic telekinesis abilities, can be rotated and attached to
your body, and assigned to keyboard keys and controller buttons. It's a
sort of diegetic version of
the Besiege interface. This
way, editing appearance, special abilities, loadout, role, character
stats, inventory is all folded into a single, infinitely expandable
system. There will be no additional skill tree. One look at a Nowherian
and you can tell what their primary profession is, based on the bits and
pieces they are made of.
I also need to reintroduce the GL rendering pipeline so the screenshots
become more presentable again (and the next alpha doesn't look too
terrible). Right now I mostly use NanoVG to render debug graphics.
Lots of improvements to the engine
codebase as of late,
primarily obvious modernizations and ports from the old Python-based
codebase:

Found two bugs in Terra,
#134 and
#132, and received a
fix within a few days. @zdevito is awesome.
A new entity component system (in Liminal, it's called an Actor Role
system) that facilitates adding new properties and behaviors to
actors in the game using object composition.
A massive parallelization infrastructure based on the concept of
Fibers,
inspired by this GDC
talk
by Naughty Dog developer Christian Gyrling. Kinks are not entirely
ironed out, but what I see so far is very promising.
A full implementation of the PCG
family of random number generators.
A new PCG-based Random class inspired by Python's Random that
produces random values with different distribution levels in 1D. 2D
and 3D.
Hash maps now attempt to maintain a 25%-75% load for improved cache
coherence and faster iteration (great esp if you only have a few
entries), and do in-place rehashing, avoiding memory reallocations.
Extensions and performance improvements to the vector math library.
Beginnings of a rotation math library with support for versors,
anglevectors and axis angle computations.
A port of the color math shader library to static, with various
color space conversion routines.
New implementations for Queue and Stack (Queue also works as a
ringbuffer).
Alt+Enter toggles fullscreen in glmain-based apps.

Business wise, in order to avoid confusion among our founders, we talked
about splitting off the 2D part into a separate prequel, working title
"NOWHERE 2D" or "NOWHERE Flatland" ;-) The 3D version will still be
completed as promised, but will have to be delayed because, as it turns
out, directly developing in 3D without solving most of your gameplay in
2D first is so hard, I would wage that with what we know now, a 3D
version takes the time of doing a 2D version, squared, if the 2D version
isn't worked out first. I was more than frustrated with that
realization, but then remembered that even GTA started out as a top-down
2D clone of the APB arcade game. Knowing how to do small iterations on
your work is key, and sometimes you have to go three steps back in order
to go six steps forward. Or something like that.
Anyway, the idea is that each founder gets a NOWHERE 2D download channel
in his humble bundle account, with a separate alpha, and eventually both
versions will be developed alongside, with the same engine. I hope that
this way we can make up to our founders for the extended duration of
development, and have a more productive time doing it. At this point I'd
like to say that we have amazing founders who have so far been more than
patient and understanding with how things have been going, and I'd love
nothing more than make all of them happy. One thing we'll never do is
abandon the project. To Sylvia and me, it's still the most interesting
thing to do in the world. As the saying goes, "It does not matter how
slowly you go as long as you do not stop."
11.11.2015

I finished the new input mapping system today, which allows to bind
arbitrary input events to logical events. A declaration of a mapping is
as easy as
input-mapping <name> <option> ...
An example declaration is:
input-mapping left
    ; if input is an axis, negate the value
    flip
    ; set an event handler for button presses
    on-press on-activate
    ; specify a list of default bindings
    defaults
        $
            Input.axis-leftx
            Input.key-left
            Input.key-a
            Input.button-dpad-left

The system allows to register handlers for released buttons and value
changes, and to set press/release threshold limits and controller id
filters. The bindings can at any point be cleared and re-assigned. The
generated event handling code is completely inlined.
I'm happy with how it works now, as the basic setup is open to future
extensions that don't force big changes in the interface.
6.11.2015

After struggling to implement the position based dynamics system I bit
the bullet and postponed most world physics features in favor of working
on gameplay, controls and AI. I went back to a simpler 2D physics
system, Chipmunk 2D, and am now reimplementing player movement controls
and the body editor. At this point, considering the pressure we're under
to deliver, I lost all interest in doing more world engine research and
just want to get it done. 3D and malleable worlds will have to be done
at a later point when our funding level is better and we have regained
the trust of our supporters.
The None programming language is now reasonably complete and I work with
it every day.
16.10.2015

Mesh refinement is working nicely now, and I wrote an
article on how it
works exactly for anyone who wants to implement it as well.
We were away four days to help my father-in-law move. I thought a lot
about how to do overloaded names in None in a generic way and arrived at
this
solution
(this is the test case for this feature), but could still not figure out
how to deal with the corner cases of declaring method names this way.
In the end, I unified the way methods are called in None (both dynamic
and static part). Before, there were three ways to do it:

(--> object (method-name ...) ...) which allows to chain method
calls, but it turned out I hardly ever used the chaining feature.
(object --> (method-name ...)), the infix version, which was also
cumbersome to use.
(object.method-name ...) which only works with specially rigged
classes and descriptors in the dynamic language, and specially
declared struct functions in the static language. That means all
imported Lua objects, including strings, can't be method-called with
this notation.

There was a bit of Twitter conversation about this topic with Michael
Zinn, and this was why I tried the
overloaded name solution (comparable to how Haskell and similar
functional languages do it), which effectively would allow something
like (method-name object ...) and make methods indistinguishable from
regular functions. Alas, this approach doesn't work well with dynamic
languages (or rather creates more confusion than it's worth), and so I
ultimately settled for this notation:
(.method-name object ...)
The notation is recognized by both the static and dynamic list hook and
translated into method calls. I removed the old notations and fixed all
tests and dependent code, and now it's all neatly uniform. Should have
done it like this from the start, good that I caught it in time.
Everything is now fun again. I also experimented with building a
Voronoi diagram
from the self-optimizing delaunay mesh. It seems that six neighbors
(three inner, three outer vertices) allow to build a reasonable map, but
there are often situations where this is not the case, because the mesh
is not fully conforming. Considering the set of the triangle vertices
full 1-ring neighborhood would give a complete map, but be more
computationally expensive.
I also began work on the Position Based
Dynamics
physics system for the self-optimizing mesh, but in order to be able to
play around with it more and devise interesting test cases, I need to
cut down turn-around times and work on an editor for it, which brings me
to a painful old problem, designing a good UI system.
I already wrote about this earlier in the devlog (7.9.2015), but I
haven't progressed very well on this item so far. I guess it's time now
to get it done, so I can move on and have more fun playing with physics.
4.10.2015
Earliest footage
of the position based dynamics physics system animating a bunch of
points. Mesh refinement struggles keeping up when there are more than
four points on screen, eventually a flip fails and the system collapses,
don't know why yet.
29.9.2015
Short update: I went back to the trimesh prototype and fixed all the
mesh operations that broke by porting to newer syntax. Next up is
finally implementing that purely gather-based approach I talked about
last month. It was
useful to devise a template for handle types (EdgeId vs EdgeIndex)
to detect parameter mistakes early. I also wrote a general
Mesh.validate function that checks if all links within the mesh are
coherent and no triangle is degenerate, that allowed to fix functions
very quickly.
Some things were added as part of fixing the prototype:

None has a float== comparison function now for doubles and floats
that checks equality within half epsilon (the precision limit of
each data type). Linear Algebra programmers know how important that
one is.
overloaded dot and cross functions for Liminal's GLSL-style
vector types (it's growing more GLM-like by
the day)
liminal.hash module that provides a generics-based wrapper around
CityHash64 for both static and lua (hashes nearly anything you throw
at it).

And two small things about None:


The globals fix that I proposed has been
implemented (with a few changes in the concept) and works
beautifully. __global can be used to register new global names in
the translated module scope-wise, and also redirect global names to
a name that holds a table within the file. using has been changed
to directly require a module name. The macro loads the module during
expansion and pastes __global statements for the exported symbols
into the source code. That means that using has no further runtime
cost than retrieving a name from the dictionary now, and therefore
performs much better.
It could still be possible to devise a using that resolves table
attributes in local scopes, either by passing the attributes you
want to access explicitly, or the name of an attribute list
(effectively a sort of type) that is available at compile time.


I didn't change if to ? yet, I might never do it or do it on a
lazy afternoon. I can imagine that the Lisp-style if might trip up
a lot of people (it still trips me up), so I still feel that would
be the right move. What do you think?


27.9.2015
What happened! Another 10 days gone!
Some work on Liminal (the game engine):

A new finite state machine
syntax that
makes it easier to tackle tasks like AI logic and game flow.
A function
list construct
to allow modules to hook events using the observer pattern,
facilitating separation of interest. Function lists are inlined
statically and so need to be complete before the first call. That
makes them fast, but also disallows runtime addition/removal of
hooks.

Lots of features and fixes for None (the programming language):

lots of language bugfixes
rudimentary online
documentation!
a comfortable naked
notation
that allows to write S-expressions entirely without nested
parentheses.
speeded up loading time from ~2 seconds to ~200ms (on my machine
at least) by caching generated bytecode. None needs the lfs module
(LuaFileSystem) in its path to be able to generate the necessary
timestamps, otherwise cached bytecode will never be loaded.
To allow for bytecode caching, all occurences of meta-eval and
meta-do that have side effects other than altering the active
translation context had to be removed because they can't be saved
with the bytecode; That means all syntax-*-global functions
operate in the script context, and newly registered syntax only
becomes available after execution of the script. Also,
syntax-export no longer allows to export the active syntax table,
but has to be called in the script context with a table created by
that script. That fix is usually quickly applied, but syntax tests
typically need to be moved out of the module that defines them.
nil is now named null, because None lends other keywords from C
(NULL) and Javascript (function, var), and therefore null
just fits better.
further improvements to tagging expressions with origin, and
traceback information
the parser stores columns for each expression now.
added ast-list-unpack for unpacking expressions with patterns
syntax-rules no longer takes extra arguments for literals and
hygienic variables. Instead, a literal can be defined inline using
the _:literal prefix, and hygienic names can be prefixed as
$:tempvar the first time they appear in the template.
syntax-rules also uses pseudoquotes now, allowing to unquote and
unquote-splice dynamic expressions into patterns and templates.
a switch macro as an abbreviation to cond without fall-through
an enum macro working similar to C's enum that also stores the
reverse lookup in the generated table and the enumerator count (last
enumerator + 1)
try has an alternate notation friendlier for naked mode

Two more changes to None that I want to see through next:

if works like its Scheme and Lisp counterparts, but is in a modern
sense much closer to C's ternary conditional operator cond?a:b.
cond has an unsual syntax for people used to classic if.
Therefore I'll experiment with renaming if to ? and adding an
if syntax that is convenient to use in naked mode and follows the
structure
(if <condition> (then ...) (elseif <condition>) (else ...) to
avoid surprising novices who come from other languages (it also
trips me up when I switch between None and other languages).


As part of getting bytecode caching to work, I separated core.n
into stages and found that it was difficult to figure out what
macros weren't active yet because None interprets non-macro
expressions as function calls, and symbols that it doesn't see in
the scope as lookups into the global namespace -- but failing
lookups lie dormant in functions and only become visible when they
are executed. What to do? It occurred to me that global symbols
typically don't change in scripts and are all expected to be present
before the script is loaded, so for the debugging session I taught
the translator to check unknown symbols against the active global
table and report unknowns. It was only a problem where the module
itself introduced new global symbols and used them immediately, so I
hardcoded a whitelist for that purpose.
This worked incredibly well, so here's my idea for a new rule: the
translator registers all global symbols in the tree its working on
and prepends a presence check to the generated bytecode. This way,
even cached bytecode can detect changes in the global table early
(minus changes in macros, because these are already expanded). A
script can use the new __ignore special form at any point in the
source file to add symbols to the whitelist of globals that should
not be checked. function-global, var-global and similar macros
are upgraded to implicitly use __ignore, because that is always
implied.
This should fix the typo errors at runtime that Python, Lua and many
other script languages are notorious for (an example
here).
Of course this means that __set-scope must no longer be possible
(as it locally hooks the global table) and the application of
using must be changed to a static approach. using must directly
reference the module to be imported, the translator then checks this
module for additional global symbols and implicitly resolves them.


17.9.2015
Went on another yak shaving binge for the past ten days and woke up in a
puddle of my own commits. Lots of exciting things happened for None's
infrastructure, where by exciting I mean lots of bugfixing and red tape
removal, which heightens my enthusiasm for using None to write more game
related code.
At the end of this post I also write something about our current
financial situation.
Let's see if I can retrace my steps, in chronological order:

Found that most of the static DSL expressions skipped nil arguments
thanks to Lua not being able to store nil values in tables. Fixed.
Made it so that GDB actually displays the proper function filename
and line number for anonymous terra-generated functions in stack
traces, but then found out that this crashes LLVM because it relies
on the anonymous name for ... something? Argh.

Shviller from the #nonelang IRC channel (we are on freenode, feel
invited if you'd like to program in None) suggested that None could use
https://github.com/franko/luajit-lang-toolkit to directly generate
bytecode, allowing to skip one extra parsing step and to directly
annotate bytecode elements with correct line numbers. Until then, None
scanned backwards through the generated code to find the associated line
number from the original source, and that was not only slow and memory
hungry (as each translated source needed to be kept in memory), it was
also faulty and imprecise. I wanted the static DSL and the GLSL DSL AST
nodes to be annotated from the lua stack using debug.getinfo(), but
with the current method, the result was slow and unreliable.
I went through a list of what needed to be done first before the
bytecode generator could work. I describe the steps in chronological
order, in present tense:


The None translator eval_none.n is written in S-Lua, which is my
version of Lua in symbolic expressions. S-Lua is the intermediate
form the None translator uses for code generation, which is then
translated to a Lua AST tree (by another generator, eval_lua.lua),
which is then translated to actual Lua by yet another generator,
format_lua_ast.lua. I thought back then that it would be smart to
write the None translator in S-Lua, to have a test case that proved
S-Lua works, but S-Lua ended up really not fun to write, and is yet
another language someone has to remember and think about.
Therefore I decide that before I do any more work on the translator
to do the bytecode translation, S-Lua will have to go. I do one last
translation of S-Lua back to regular Lua, and end up with a nicely
formatted eval_none.lua.


In the next step, I make all fundamental macro expanders in the
translator part of the-language-bootstrapping-itself and translate
them to core.n. A bit tricky because they need to be written
before the actual macro system is in place -- in a way, they are
the macro system. That part comes now first in core.n, and it uses
the macro-less verbose special forms that the translator
understands.
The outcome is that eval_none.lua is now a lot smaller than
before, and therefore there's less to refactor for the eventual
bytecode transition, which I expect to be annoying and cumbersome,
because I know nothing about the library I'm about to use.


For some reason I get irked by the fact that method accessors
(object --> (method ...)) are annoying to use in None. I "fixed"
this issue for dynamic classes before, by introducing Python's
descriptor classes to customize member access. Members can now bind
self to a function on access, and that allows to call methods as
(object.method ...) as one would expect.
Can I do the same thing for static structs? Thanks to terra's
metamethods, I can indeed, and now structs can also define methods
that use the simple accessor; LLVM's code generator is very
efficient and completely optimizes out the intermediate construction
of a struct that holds the instance reference.


I work on better error reporting while I translate the TriMesh
prototype to use the new method syntax. At that point I realize that
static AST nodes will never get annotated properly with the present
system, and I return back to the task of realizing the bytecode
translator.


It turns out that luajit-lang-toolkit is relatively simple, almost
fun to use and I refactor eval_none.lua to output luajit AST nodes
instead of S-Lua in a day. It's amazing how quickly things are
working again.


I also find out that the LuaJIT bytecode has no problem with
symbolic names that are considered illegal by Lua's parser.
my-f*ing-amazing-variable is a legal name to use there, and will
even be honored by LuaJIT's debug output. Lovely, but I'm not going
to make use of that fact yet.


The S-Lua -> Lua AST generator is no longer required, neither is
the AST -> Lua code generator, so I remove them both from the
project. Good riddance.


Lots of fixes to get coherent line number annotation all the way
from the parsed symbolic expressions to generated bytecode and
machine code.


A new __pragma special form that allows to turn compiler features
on and off for debugging. Very helpful for figuring out why new
features (in this case it's inlining pure Lua expressions to avoid
too many temporary variables) break production code. __pragma also
allows to dump generated Lua AST nodes, equivalent lua code,
expanded symbolic expressions, and the linenumber and filename that
AST nodes will be tagged with.


The static DSL tags each and every expression with a filename and
linenumber, which I don't have to do anymore now. I add a rather
elaborate visitor pattern that patches each and every terra AST node
with the correct anchor, and that turns out to be pretty elaborate
work. Terra has too many AST nodes.


One odd bug is that line numbers passed to the bytecode seem to be
modulated by 256. I guess that this is because LuaJIT uses the
firstline and lastline bytecode tags to pack linenumbers into
8-, 16- and 32-bit numbers, and indeed, once I generally define
lastline = firstline + 65536, the problem is gone. It's a hack but
I'm not going to expand the parser now so that it always supplies a
valid lastline. I hate the parser.


10 new tests have now been added as part of debugging new issues in
the prototype's codebase.


So there we go, bytecode translator all done. While I'm at it, I do some
more cleanup:


When I wrote the GLSL translator, I refactored None's macro expander
to serve any domain specific compiler (swap out the syntax table
completely, use __escape to stop the macro expander from expanding
any further, etc.). The static translator however still uses its own
macro expander, and that means all linenumber propagation work that
I did, I would have to do here again.
So I refactor static.n to use the internal macro expander, and now
it's all splendid. It still has its own "compiler" stage that
translates the expanded forms to terra AST node constructors, but
given that the static "compiler" does next to nothing, I wonder if I
couldn't completely merge that stage with the macro expander.


Which brings us to the present day. The TriMesh prototype currently
renders no content (but at least the hit testing works), so I'll have to
fix that next up, and then I hope I can finally work on the game editor.
Now on to financial matters.
It appears we are running out of money again. The long waiting time for
new content has eroded the number of new founders, and one of our long
time patreons had to reduce his monthly support because he's also in a
financial pickle right now. We also used to borrow a small complementary
amount of money from my mum each month and are about to reach the cap
next month, so we will also run out of that funding pot by end of
October. Sylvia is still waiting on payment from one of her commissions
(this was before Sylvia began to require advances), but it is very
likely they screwed us and we're not going to see a cent. There are
currently no new commisions in sight for her, but of course that can
change any day.
That means we have to think about how to get fresh funds before
September runs out. I've considered starting a Patreon for None/game
engine development exclusively, so we are at least getting support for
that part of the work (which is not what Nowhere players are necessarily
interested in, although they damn well should ;-)), but Patreon only
allows one page per account, and managing that one account is already
hard work for Sylvia, so we'll have to merge the tool development part
with the page and make it about more things than one. We'll have to
think about how to best present this.
We also need something new to show so we can justify doing a new video
and invite everyone to check out the new alpha. Sylvia has done lots of
new concept art that could go into a new video, we can finally exchange
that old dreadful logo, and replace "Psychedelic RPG" with a much more
fitting "Alien Life Simulator". The fundraising video is nearly 2 years
old. Back then we gave away the end game without mentioning many of the
intermediate steps, and I believe people would be more inclined to trust
us if there were a more detailed plan of action, and we would be more
clearly on our game being developed as a service, not as a one trick
pony.
It also becomes clear that we will not be able to deliver on our full
promise by the end of 2015, but I want to do a release anyway with what
we get to work by that date. We don't want anyone to feel disappointed
or exploited. The plan is to ship what's there, then continue on a
"sequel" (which is really just a new major version) and keep our
founders on board until they get what they paid for.
I can not believe how fresh our concept for Nowhere still is. The pitch
has been out for several years and no one has stolen it from us, and I
guess that's because it really is that ambitious and forward thinking.
We have ideas for art, sound and gameplay design as well as programming,
modding and player service which are quite different from how most game
development is done today, and I would love to see all our goals
realized in exactly the way we envision them.
I don't mind if it takes ten years to get there, but we need to
compromise on the progress on the way and reintroduce regular releases
of alphas, betas and "masters" to keep the company operational without
incurring additional debt.
I wanted to hold off new alphas until we had a game model we could
definitely stick to, but I see now that I have to get over my
perfectionism and, as the mantra goes, release early, release often.
7.9.2015
Lots of optimizations on the trimesh prototype. Streamlined the structs
and interfaces, translated the remaining shaders to the new GLSL
compiler (that went remarkably well, just had to add a missing discard
node). Now I'm back to thinking how I can add more map editing features
to the prototype, and the challenge of writing UI annoys the hell out of
me. I currently have no GUI framework I'm happy with. I considered
embedding ocornuts C-wrapped IMGUI, but I don't like how it's set up.
IMGUI attempts to provide a ready-to-use library of widgets, very much
the classical approach of UI frameworks. The problem with that system
is, that while it makes it easy for users to slap together a bunch of
preset widgets, they're typically completely on their own when they want
custom widgets - and they always want custom widgets. My OUI library
has a minimalistic abstraction, which I like a lot, because it is
agnostic to theming and allows to build any sort of widget from a
handful of configurable rectangles, a very Scheme-y design.
One could say: if IMGUI is the GL 2.0 (with lots of extensions) of
immediate UI, then OUI is Vulkan (less presets, more combinatory
opportunities) ;-) but OUI has a design flaw, which is that the way
items are cached between frames (to maintain coherence during stateful
operations) leads to the UI losing track if the order in the UI layout
changes, and all in all, I had heavy optimization issues with this
touch-everything-in-one-frame approach. It's awfully difficult to both
track and cache operations when your UI objects are all supposed to be
ephemeral.
Therefore I have begun work on a OUI2 (implemented in None), which is
going to use a retained mode again. The vision I have for it is to
become a minimalistic system to facilitate a browser-like DOM (something
that I tried to write with OUI before). I enjoyed designing webpages in
the browser more than working with any UI library, but I despised the
shit stacking and the legacy baggage. Considering that symbolic
expressions drive everything in my engine, I believe this is the perfect
abstraction for GUI / data visualization / websites etc as well. Where
Chrome has HTML/CSS/JS syntax, I'm eventually going to have
S-expressions with just three different domain specific semantics (of
which the JS part is already done).
The principle of items (the lowest abstraction of a widget) and the
layout algorithms (box model, word wrapping) are kept, but the items
will remain persistent in a pool. I'm also going to make changes to
event handling. OUI will have an outgoing event queue (& associated
callbacks) for abstract item events (hot/cold, clicked, focus/blur;
drag, drop; etc.), and there will be no more polling of item state, as
that technique is frame-dependent and typically ends up being lossy. The
modifications open the door for implementing future optimizations like
partial layout refreshes and draw surface caching.
The trimesh map editor is OUI2's first test case. Considering how many
GUI systems I've used and written before, I hope that I'll end up with a
lean minimal UI system that remains expressive and resourceful for years
to come.
31.8.2015
I managed to get the GLSL compiler to a working state, and the trimesh
subdivision shader (which is fairly complex and uses three stages) is
now working perfectly;
https://gist.github.com/paniq/aff6395684c22aa978d1 is translated to
https://gist.github.com/paniq/ad0783d4c61a5ba3f72a. It's now going to
be comfortable to write and interface with new shaders. I still need to
add a few small tweaks and improvements, plus a new testing mode that
allows to directly test shader functions on the GPU and read out their
return values. After that, I can start working on a faster Delaunay
transform shader.
13.8.2015
Not so much happened the past 10 days; We visited Sylvia's dad and then
fought against the looming heatwave. Our studio has a mobile climate
unit that eats a lot of power and makes the room halfway bearable. The
heat is distracting and it's difficult for me to think about the current
problem that I have, which is transforming a triangle mesh into a
constrained Delaunay triangulation on the GPU, but without compute or
atomics. I rubberducked the problem within this article, but the
write-up got so long that I saved it here instead:
https://gist.github.com/paniq/e09a9f6c5b8b9aa1b835
3.8.2015
Lots of things happened the past two weeks, along with my birthday. I
took three days off to catch up on recent games.
Conspire had matured enough that it was time to single it out into its
own repository. Oh, and I also renamed it. It's now called "None", which
is a backronym. The website (which currently just redirects to
Bitbucket) is at http://nonelang.org. The language should run on
Linux, OSX and Windows. There are a dozen test cases now that demo
aspects of None.
There are too many small changes to None to mention here, best check the
commit log: https://bitbucket.org/duangle/none/commits/all - The most
striking improvement is that the largest parts of macro expansion are
now accessible as hooks from within the language, scoped, so that the
DSL can be completely swapped, and global macros overridden. The static
language still uses its own expression parser, but that is no longer
necessary. Using the built-in macro expension has the benefit that the
error reporting facilities can be reused. Expressions that cause trouble
are included in the traceback.
After all these upgrades, I thought that it might finally be doable to
try a GLSL backend, and that's what I'm sitting on now, the shader DSL.
The general idea here is that you can compose your shader functions
along with your dynamic and static functions (semantically,
syntactically it's almost identical).
The GLSL code generator implicitly figures out the necessary
dependencies to generate vertex, fragment and geometry shaders (compute
to be added later). This becomes most convenient when dealing with
shader in/out variables that connect the stages with each other,
vertex streams and framebuffers; In the shader DSL, they're called
shared variables. The code generator tags these using a simple
heuristic: when a function sets a shared variable, it's an out for
this shader stage, otherwise it's an in. If the shared variable is
neither read nor written to, it is not declared. This way, a shared
variable only needs to be declared once, and is then included in
generated code where needed.
There's also support for declaring overloaded external functions that
are either built-in or residing in native GLSL source code. The code
generator does a type inference pass, matches applicable overloads,
enables the extensions and includes the source code required to make it
work. To avoid notation fatigue (a condition that I just made up), type
inference is also used to figure out when a numerical constant is a
float, and when it's an int, with strict implicit conversion rules: an
int can be upgraded to float, but not the other way around. Overloads
first try a perfect match, then an implicitly casted one.
Then there's the implicit goodness of None's powerful macro system:
dynamic functions can build permutated shader functions from templates
and quasiquotation. Shader functions can directly inline constants
declared in dynamic code. The system is no longer hermetically sealed
off from the rest of the codebase.
I hope these extensions will make writing shaders frictionless enough to
make me want to write more of them. I have the largest part of the code
generator done, and am finishing up on the top level declarations,
namely uniforms and uniform blocks. My first test case is the shader
program I use in my 2D remeshing prototype, which is a fairly complex
shader with three stages (the code is here
https://gist.github.com/paniq/e5b5ebd4797065f33a57)
Once that works (should take this week at most), all hard language work
is done and I can return to continue work on the prototype.
16.7.2015
More features for Conspire!

Removed the Conspire 1.0 language from the loader; It's no longer
supported.
The rmlc- prefix has been changed to static-. The RMLC DSL is
now simply called the static DSL.
assert is now a macro; The second argument is only evaluated when
the assertion fails. There's also an assert-except for unit
testing against code that is expected to raise an error.
using now also works in the static DSL.
There is now initial support for transparent struct-of-array types,
inspired by Jonathan Blow's SOA feature in Jai, via the
column-array array type constructor.
arithmetic with GLSL-style vector types only worked in the static
DSL, now also works in dynamic code.
vector types now support GLSL-style compound constructors in the
static DSL, and can be initialized from arrays and SSE vector types.
The symmetrical GLSL-style matrix types are now implemented.
added Python-style is equality testing
dprint is like print, but outputting the module name in which it
was called.
Because the dot notation makes member access convenient, built-in
Lua modules are now in their own namespaces.
Added =# operator comparable to dict.setdefault in Python
help and repr allow introspection into tables and types in form
of formatted symbolic expressions.
Better support for handling Lua VM varargs: The ... argument is
now passed unfiltered and argument unpacking is now a special form
(__unpack). Lua's select() function is available as __select,
allowing to enumerate varargs.
The $ operator now serves both as table constructor and namespace
for table operations. It's a little jQuery-like.
Keeping naming Schemes more familiar for programmers of other
languages: length-of got renamed to a Pythonesque len. Likewise,
array-of, offset-of, vector-of and size-of are now
arrayof, offsetof, vectorof and sizeof.
To make pointer arithmetic more readable to people coming from
C/C++, & and * operators will expand to ^& and ^* if used as
unary operators.

10.7.2015
What... what happened? I spent all week writing all sorts of
improvements and conveniences for the language and forgot the time
because it's too much fun. Let's see.

Updated Terra to latest build and LLVM 3.5. A few fixes were needed
to deal with changes in the interface.
Ported the gl, vecmath and pool modules to the new conspire.
Added try/except syntax sugar for xpcall.
Added GLSL-style vector types with support for permutated accessors,
arithmetic operators and SIMD code generation. It's really quite
convenient to work with, but the matrix classes aren't completely
done yet.
& and * can be used as symbol prefixes, as shortcuts for (^&)
and (^*). Makes de-/referencing more C-like.
alias-cast syntax sugar, similar to C++'s reinterpret_cast
a very generic using mechanism that allows to hook scoped lookups
for variables, inspired by Jonathan Blow's extension to C++'s
using in his Jai language. I believe my implementation is generic
enough that it allows extending the language with new lookup
strategies from within.

3.7.2015
The dynamic GLSL front-end was fun until I ran into a tough problem: how
to import the function declarations contained in existing GLSL shaders
(and I have a lot of those). So I'll stick with the previous GLSL shader
generator for now.
Therefore: the yak is shaved! I ported the trimesh prototype I talked
about on 10.6.2015 to conspire2, everything is ready for a thorough
refactoring now, and work on the main game can continue.
What are the features of conspire2 so far?

Scriptable compiler, can optionally be deployed to users.
Runs on top of Terra, so provides all the same runtime services.
Dynamic code runs on LuaJIT runtime.
Statically typed code is compiled and optimized by LLVM.
Native support for importing C libraries and Lua modules.
Can export standalone executables.
Memory management is explicit for static code, garbage collected for
dynamic code.
Syntax inspired by Scheme.
AST uses S-expressions exclusively.
AST is designed to be symmetric: code can be read, automatically
altered and saved back to disk.
Interpreter can be extended with new keywords and language
constructs from within the language, up to completely new domain
specific languages.
Macros can be hygienic or non-hygienic.
Dynamic and static functions use the same syntactic building blocks.
Semantics inspired by C, Python, Javascript.
Support for infix operators in S-expressions.
Support for symbolic prefixes (0x, 0b, etc.)
All sorts of nice syntax forms to make life easier.

27.6.2015
So, my branching problem got fixed with an alternative solution
suggested on the mailing list
(https://mailman.stanford.edu/pipermail/terralang/2015-June/000441.html
). I'm happy to report that the RMLC (runtime machine language compiler)
works without much further changes, and is completely expression based.
Writing some fast assembly now got super convenient.
I also added a bunch of macros for struct declarations and dynamic class
definitions, based on the 30log library.
I'm currently sitting on a GLSL compiler with a syntax analog to the
RMLC compiler, and have just completed a markup tree -> C-like code
generator implementation to that end. The markup tree will be generated
from a GLSL AST which in turn is constructed from a semantic parser.
This way it will be possible to mix dynamic functions, static functions
and shader functions in the same context, which all use more or less the
same syntax. Different shader main functions can use the same dynamic
declarations yet generate completely different material shaders from
them. The GLSL system will then also be able to use Conspire's module
import system, convert struct declarations into uniform structs and
other niceties. I hope writing shaders will then no longer be a total
PITA.
22.6.2015
Turns out the bug report was necessary, I submitted it here:
terralang/terra#99
As long as this isn't fixed, I'm not quite sure how to proceed with
branching support. I can't code the ternary conditional as a regular
if-block because those are statements; I would have to first declare a
variable, then set this variable in the then or the else block, but
the type of the variable has to be negotiated between what the returning
expression expects and what the two branches contain. Also, what to do
with situations where the final result is thrown away? void can be
passed as a value for these situations, but it would help to know
beforehand that these expressions only matter as statements.
It looks like I'll have to revert the RMLC code generator to the same
putdest structure I use for the Lua translator. Sad, because I thought
terra's expression quoting system was flexible enough. Great, because
this gives me more control. If generators know what an expressions
destination will be (return value, variable or discard), and more
importantly, the destinations type, they can annotate temporary
variables correctly and generate different code.
For example, (if cond-expr then-expr else-expr) can produce a ternary
conditional expression when the target is a variable or return value,
enforcing both branches to return types compatible to the target, or a
regular if statement block when the return value is discarded.
Terra and the LLVM compiler both optimize out excessive use of temporary
variables, so generating code in this fashion should be fine, except for
immutable struct types declared on the stack, where temporary values
might evoke unnecessary copy constructors. I'll have to do some testing
first to know what happens here.
21.6.2015
I added the new RMLC DSL (RMLC = runtime machine language compiler),
which is designed to be a full analog of the dynamic language DSL, only
with static typing added. This way, it is easy to switch a static for a
dynamic function or vice versa. Terra makes it easy to write code that
is completely expression-based, thanks to the quote a in b end form,
but I've been running into an evaluation issue with terralib.select,
the ternary conditional operator, where quoted statements get executed
regardless. I'm currently building a minimum reproduce to test with the
latest version and submit to the Terra project if necessary.
19.6.2015
Before I could start re-adding the LLVM DSL I had to re-implement a
bunch of syntax forms to make the subsequent task a bit more
comfortable. In the process a few more special forms also got added. I
added (backquote) (supporting (unquote) and (unquote-splice),
which makes writing syntax rules easier), (cond) (Scheme's variant of
if-elseif-else blocks), (import-from) (which automates the task of
declaring local variables and initializing them with values read from a
table, sort of the counterpiece of (locals)), (var-unpack) which
imports variables from arrays, and (foreach), with support for
auto-unpacking iterator results.
18.6.2015
The first thing I did today was write an infix parser that rewrites
expressions like (func 1 + 2 * 3 arg2 4 + 5 arg3) as
(func (+ 1 (* 2 3)) arg2 (+ 4 5) arg3). While the most popular
operators are already bootstrapped, code can register new infix
operators that will be translated to regular s-expressions. Infix
operators are always treated as binary. Unary prefix and postfix
operators are not supported.
Unlike in Lua or C, operators and arguments need to be properly spaced
apart because the s-expression parser doesn't know anything about
operators, so e.g. a+4 is read as a single token, a + 4 is read as
three tokens.
After running into some semantic difficulties (it is simply easier for
syntax rewriters to assume that an expression list guarantees one
expression per token), I decided that infix operators always need to be
declared in dedicated parentheses; So (func 1 + 2 3 + 4 + 5) must be
written as (func (1 + 2) (3 + 4 + 5)).
Variables are now running on the stack again, and (locals) generates
an ad-hoc table. cond and backquote have been ported.
The next step is to re-add the LLVM / terra DSL. I'm thinking of a style
that is transparent to the main language, with types added e.g.
(var name 0) becomes (var name : int 0),
(function name (arg1 arg2) expr) becomes
(function name (arg1 : int arg2 : int) : int expr) or
(function name (arg1 arg2) : (int <- int int) expr).
17.6.2015
The past few days I continued work on the new Conspire interpreter,
which is coming along nicely. The special forms expanded to 12, a number
that is continuously in flux. I'm still in the process of bootstrapping
all language features in the core header, and today I managed to move
syntax expansion to its own pre-pass before compiling.
There's one central language feature I'm having trouble implementing,
which is hygienic macros for syntax-rules. I tried parsing and rewriting
the templates, but it's difficult doing this without replicating the
syntax expander.
The other feature I'll need to change is the way scope variables are
captured; I'll move this back to the Lua stack, mapping generated
symbols to the actual names to support free naming. Currently, the
values are directly written and read from lua tables, which complicates
the code and may cause problems with garbage collection; I'll implement
the feature I did this for (locals) differently; (locals) will simply
expand to a table mapping all variables captured in the current scope.
14.6.2015
I took a week off from Twitter and that seems to have helped with the
productivity. I forced myself to look at the code I have to refactor and
realized that the Scheme-inspired language that I developed for game
programming needed a few fixes in its semantics so it's fun again.
The first version was very closely built along the principles of Lisp
and Scheme, and I made some trade-offs to be able to translate code to
Lua, and it turns out a) I don't like (let) and (letrec) much, because
they make variable declaration in functions semantically different from
variable declaration in modules b) piggybacking variables on Lua's stack
(with the drawback of not being able to use the full Lisp character set
for variables) for speed wasn't really needed, because I write nearly
all game code in the terra DSL, and so I can make scopes a little more
Pythonic instead (supporting easier injection and extraction of
variables) and c) The terra DSL part has become the heart of the
implementation but runs on C-like semantics that are sufficiently
different from the Scheme one that switching between both languages
becomes jarring, so they need to meet in the middle.
I wrote a list of changes I want to do in the implementation (it's at
https://gist.github.com/paniq/34525b55cd4ae8744961), then made a new
translation unit to fit the specs. I'm very happy with how it turned
out, it only needs 8 special forms, 6 constants and 5 builtin functions
to bootstrap the rest of the language.
Conspire supports multiple translation units. The previous revision
still works without any changes, so I don't have to port anything right
away, and because transforming the AST is simple, it's going to be easy
to translate some of the code I already have.
That should make it a lot less annoying to refactor the triangle mesh
implementation, and I can think about more new language feature like
support for SOA (struct of array).
10.6.2015
It appears that many of my problems stem from the inability to
rubberduck them with a colleague; When I was still working at a game
studio, I could meet my colleagues in the hallway and bounce my ideas
off them - someone I'd deem competent who would listen to my stream of
thoughts and give it a seal of approval, giving me enough confidence to
go down that road -- or enlighten me with an alternative approach that
leads to the same end.
Nowadays, I mostly work alone, and while Twitter has been helpful many
many times in providing a rubberduck surrogate, it appears that often my
problems are too complex to describe in few tweets in such a way that
they invite others for collaboration or judgement. When everyone who is
competent believes that I already know what I'm doing but does not tell
me so, I feel like I'm on the wrong path. I get the sense that my
friends have their doubts but keep them to themselves, and I
procrastinate, sitting in a mental waiting room, waiting for my next
shot of enthusiasm.
Therefore, in the hopes of catalyzing thought and confidence, I decided
to start a devlog within a gist, which I will attempt to update each
day. The devlog is to be seen as a precursor to a possible blog article;
thoughts are less ordered and focused, documented approaches can be
completely pointless in the end.
It is possible that some of the things written here might ultimately
turn into a blog post; I usually begin planning one when enough people
express their interest for a topic.
So, with that out of the way, what's going on? I'm currently in the
process of writing a 2D prototype of the NOWHERE terrain and physics
system as I imagine it to work; following the ideas of this paper,
http://matthias-mueller-fischer.ch/publications/airMeshesPreprint.pdf
the world will be completely embedded within a huge square 2D triangle
mesh, tesselating both air and solid materials. The mesh will host
collision detection, rigid and softbody physics, most likely using new
position based dynamics techniques as documented in
http://www.researchgate.net/profile/Jan_Bender/publication/274940214_Position-Based_Simulation_Methods_in_Computer_Graphics/links/552cc4a40cf29b22c9c466df.pdf
and I also want to experiment with 2D light propagation and wind current
simulation later on.
What I have so far is a 2D mesh class with support for generating the
beginning quad that bounds the simulation space, splitting faces,
splitting edges, flipping edges and an optimization pass that flips all
edges where it improves the quality of the mesh.
There's also a testing and editing front-end that renders the mesh
straight from a buffer texture using a GL geometry shader, allowing me
to dynamically split faces and edges, flip edges and initiate an
optimization pass -- all this happens on the CPU right now and will
continue to do so until a move to GPU is really necessary.
What is missing right now is the physics and collision system and the
associated auto-mesh-optimization step. For this to happen, I believe I
need a double-buffering scheme for the mesh. Currently, vertices and
faces are stored in two separate pools, and I suppose those would then
become four pools. Each pool would store the individual attributes in
separate arrays (struct-of-array), so buffers could be flipped
array-wise, not pool-wise; this also allows me to upload only the
absolutely relevant data to the GPU.
The methods for the mesh would all have to support reading from a front-
and writing to a backbuffer target which can be freely set.
For the opimization via edge flips, it appears I'll have to operate on
the backbuffer completely, as operations update opposite half-edges (I
call them "twins"), and the operation can't be parallelized. It probably
can, but I can't think of or seem to find an efficient way to do this.
So it will have to do for now.
When the double-buffering is implemented, I can probably start writing a
primitive physics loop with some colliding bodies.