Skip to content

Instantly share code, notes, and snippets.

@andrewarchi andrewarchi/

Last active Jun 14, 2020
What would you like to do?
An accumulation of notes and references gathered from my reading

Bit Log

This is a sequence of notes on software and its theory. Frequent topics include programming languages, compiler design, algorithms, and optimization.


Concurnas programming language

Concurnas is an open source JVM programming language designed for building reliable, scalable, high performance concurrent, distributed and parallel systems


The 10,000,000,000th prime number


Quines (self-replicating programs)

Whitespace has no string literals or meta programming features, so a Whitespace quine would probably need to implement an interpreter for all the utilized instructions. The quines on the Whitespace website are very long and I wonder what techniques are used in those.

Quines always print the same output, so they could be reduced by the compiler to a single print statement - a fantastic test of a partial evaluation implementation.


Windows Terminal v1 release

Windows Terminal released v1.0.1401.0, the first stable version. It ships with Cascadia Code, a new monospaced typeface with programming ligatures.


Specializing string operations that only use length

r := 0
for j := 0; j < 20; j++ {
  for i := 0; i < 1000000; i++ {
    r += len(strconv.Itoa(i))

The inner loop is independent of the outer loop's index, so the loops can be interchanged and the length multiplied by 20 before adding. In fact, the j loop can be removed entirely and the sum simply multiplied by 20 when done.

More interestingly would be optimizations that would specialize Itoa to produce the length directly rather than allocating any strings. Append, concat, slice, byte assign, and string assign are can be simply be mapped to operations on the length. Rune assign (unicode code point) is not so simple as the rune replaced is not necessarily the same length as the rune assigned. Additionally, this would not work for cases that are conditional on a value in a string.

var s, s2 string
var i, j int // indices
var c byte   // character
var l int    // length of s

s[i] = c   // does not change l
s += s2    // l += len(s2)
s = s[i:j] // l = j - i

sr := []rune(s)
sr[i] = r      // rune widths may vary, so
s = string(sr) // finding l is not trivial
  • Inline calls to FormatInt, small, and formatBits and specialize on constant parameters. This eliminates branches for bases other than 10.
  • Replace cast to uint with cast to uint64. The argument to Itoa is int rather than int64 as in FormatInt, so there is no longer any need to increase the bit width.
  • Reduce operations on a string to operations on its length to remove all string allocations.

The fast paths for small integers and 32-bit architectures are now no longer necessary, but those paths could only be safely removed if the optimizer was paired with an SMT solver to prove that it is equivalent without those paths. Probably not worth the effort.

const fastSmalls = true
const host32bit = ^uint(0)>>32 == 0

func ItoaLen(i int) int {
  if fastSmalls && 0 <= i && i < 100 {
    if i < 10 {
      return 1
    return 2

  u := uint(i)
  l := 1
  if i < 0 {
    u = -u

  if host32bit {
    for u >= 1e9 {
      u /= 1e9
      l += 9

  for u >= 100 {
    u /= 100
    l += 2

  if u >= 10 {
  return l

func main() {
  r := 0
  for i := 0; i < 1000000; i++ {
    r += ItoaLen(i)
  fmt.Println(20 * r)

Of course, ItoaLen could have been implemented using the ceiling of an integer log10, but specializing string operations when only length is needed is more general. Plus it's more fun.


Signal and GIPHY privacy

Facebook is to be purchasing GIPHY for $400 million which will more tightly couple the two platforms and create more privacy concerns for users of GIPHY.

Signal supports GIPHY via a proxy. A client connects to the Signal service which forwards the request to GIPHY. Signal can't see the contents of the request and GIPHY can't see the originator of the request. Additionally, the requests are blocked into ranges to add padding and hide information from the Signal service.


Quarantine side projects

Ask HN: What's your quarantine side project?

Official Mongo Go driver

Kali Linux

Kali can disguise itself as Windows and Powershell is open source, so can be installed:

Kali on Android (applicable for Kali in general):

Set Up a Headless Raspberry Pi Hacking Platform Running Kali Linux:

Change your Kali default SSH keys:


What's coming in Go 1.15


A hands-on introduction to static code analysis

Rich - terminal formatting in Python

Tmux bindings and configuration


C++ explicit keyword

apt-cyg Cygwin package manager

Awk in 20 minutes

Why you hate Comic Sans


SSH tips and tricks

Enigma - Erlang VM implementation in Rust

Sparse set

Fork of Google Code Search

Rob Pike interview on Go

IOCCC submissions


Malware spread through Google Play

Restoring decade-old Picroma Plasma editor without patching it

Loop widening


A Graduate Course in Applied Cryptography


Strength reduction


Exact bitwidth integer types with _ExtInt in Clang


SQL vs GraphQL

Embedding binary objects in C


GitHub command line tool

GitHub is now free for teams

Charles Moore: From Forth to Stack Processors and Beyond


Ternary computer emulator

Decompiled Stuxnet shown in Star Trek

Unik - Go unikernel!topic/golang-nuts/4cDIL5Vr_es


Why OPSEC Is for Everyone

Moving away from Gmail


Impressions of Rust as a Swift Developer: Memory Management


Fast float64 string parser

Daniel Lemire and Michael Eisel developed a parser in C for float64 strings that is both fast and accurate. A Hacker News commenter notes that this approach likely has poor cache performance because it makes heavy usage of large lookup tables. This would be a fun exercise to translate to another language with the lookup tables generated programmatically at compile-time.


Design patterns

Peter Norvig details design patterns in dynamic languages that are invisible to the developer, informal, and formal.

Pattern: Partial Evaluation

Intent: Write literate code, compile to efficient code


define function eval-polynomial(x, coefs)
  let sum = 0;
  for (i from 0, c in coefs)
    sum := sum + c * x ^ i;

such that eval-polynomial(x, #[1, 2, 3]) compiles to 0 + 1 + 2 * x + 3 * x * x or better yet 1 + x * (2 + 3 * x)


Pattern: Rule-Based Translator

Intent: For each pattern detected in input, apply a translation rule

Special case of Interpreter or Compiler


define rule-based-translator simplify ()
  (x + 0) => x;
  (x * 1) => x;
  (x + x) => 2 * x;
  (x - x) => 0;

Pattern: Relation

Intent: Represent that x is related to y by R

Motivation: Often, this is done by making a R slot in the class of x and filling it with y.


  • May be no common superclass for x's
  • y may take less than a word (say, 1 bit)
  • Don't want to waste space if most y's are void
  • Don't want to page if cycling over R's

Solution: Consider a range of implementations, from slot to bit vector to table to data base. Provide a common interface to the implementations.


Deno - JavaScript runtime

Deno is a runtime for JavaScript developed by Ryan Dahl, the creator of Node.js. It is built with Rust, runs on V8, and supports TypeScript out of the box.


Git switch command

Git 2.23 added the command git switch to simplify some checkout use cases.


Postfix regexp dialect

In the challenge "Complement of a Regex" on Code Golf Stack Exchange, a simple postfix regular expression dialect was proposed.

For the purposes of this challenge, we define a postfix regex dialect with alphabet {1, 0} and the following operations:

  • 1 and 0 match themselves literally.
  • _ matches the empty string.
  • ! always fails (i.e. it does not match anything).
  • ab; matches a, followed by b.
  • ab| matches both a and b.
  • a+ matches one or more instances of a.

The regex should always start matching from the start of the string, and stop matching at the end of a string (in other word, it is implicitly wrapped in ^$).


Go suffixarray

Ian Lance Taylor proposes removing the seldom used index/suffixarray package when modules support works with the standard library.


Epigram programming language

Epigram is a dependently typed language that influenced Idris and Coq.

Hypothetical introductory compilers course

2020-02-16 to be open-sourced

The upcoming replacement to,, will have its package documentation feature open-sourced and more licenses supported. Quite a bit of controversy was incited on golang-dev upon the initial announcement due to it being closed source. Russ Cox announced on golang-dev:

  • The next rev of the site will display docs for CC0- and Unlicense-licensed modules. That will happen ideally next week. More generally, we continue to work down the list of most common licenses in the Go ecosystem and accommodate what we can. At some point I hope to have a blog post talking about why licensing is important generally and what we can and can't use, but for now just please understand that we are doing what we can to make work as well as possible for as much of the Go ecosystem as possible. We're not done yet.

  • I can now confirm that we will open-source the source code. There's more to doing that than just a "git push", and again I'm out next week, so it will still be at least a few weeks before the code goes out. But rest assured that it is coming and please accept my apologies for us not doing all the work to make that happen before itself launched. That's on me, not the team.!topic/golang-dev/mfiPCtJ1BGU


Simple bitset membership


contains(1<<_Break|1<<_Continue|1<<_Fallthrough|1<<_Return, tok)

func contains(tokset uint64, tok token) bool {
	return tokset&(1<<tok) != 0

Flags in Go tests



llgo removed from LLVM repo


gollvm is a replacement for llgo:


Getting started with Forth



Zig lang


Zig is a system programming language intended to be an alternative to C. It provides high level features such as generics, compile time function execution, and partial evaluation, while exposing low level LLVM IR features such as aliases and intrinsics. Zig uses Clang to provide automatic import of .h symbols, including inline functions and simple macros. Zig uses LLD combined with lazily building compiler-rt to provide out-of-the-box cross-compiling for all supported targets.


Xoc, an Extension-Oriented Compiler by Russ Cox


Posit representation of real numbers


Go nil errors


nil can be a case in an interface's type switch!

Go semilattice with respect to _ on RHS


Forth Wikipedia


Modern Forth compilers perform "peephole optimization". Peephole optimization is typically performed by pattern matching by modern compilers.

Languages similar to Forth:

My reactions to Go generics/contracts proposal


  • More powerful than interfaces (can express relationships between types, i.e. graph/node references)

  • No boxing is good

  • Type arguments can be inferred with unification when each type param is used as a func param:

    This can only be done when all the function‘s type parameters are used for the types of the function’s (non-type) input parameters. If there are some type parameters that are used only for the function‘s result parameter types, or only in the body of the function, then it is not possible to infer the type arguments for the function, since there is no value from which to infer the types.

  • Syntax gripes:

    • Using parens for type params reduces clarity. Consider the following examples:

      func Slice(type From, To)(s []From, f func(From) To) { ... }
      func Slice(from, to Typ) (s []From, f func(From) To) { ... }

      When there's type params and no retval list, then it becomes harder to comprehend. On the other hand, the following is easier to differentiate at first glance:

      func Slice<type From, To>(s []From, f func(From) To) { ... }
      func Slice(from, to Typ) (s []From, f func(From) To) { ... }


      Why not use the syntax F like C++ and Java? When parsing code within a function, such as v := F, at the point of seeing the < it's ambiguous whether we are seeing a type instantiation or an expression using the < operator. Resolving that requires effectively unbounded lookahead. In general we strive to keep the Go parser simple.

  • Type assertions work!

    contract reader(T) {
    	T Read([]byte) (int, error)
    func ReadByte(type T reader)(r T) (byte, error) {
    	if br, ok := r.(io.ByteReader); ok {
    		return br.ReadByte()
    	var b [1]byte
    	_, err := r.Read(b[:])
    	return b[0], err
  • There's an example of a closable channel for iteration (or something). It uses runtime.SetFinalizer which I have never heard of.

  • The proposal does not allow for specialization (i.e. with type switches to more specific contracts), as seen in the "Absolute difference" example.

Go 1.15 constant indices of slicing is constant (abandoned)

(stub) Were the proposal to be implemented,

const (
  a = "abc"[iota]

would become valid.

Go 1.14 maphash


  • Add a [0]func()-typed field to the Hash so that Hashes cannot be compared. (I considered doing the same for Seed but comparing seeds seems OK.)

Are the write functions guaranteed by the Go 1 promise to not fail?

It always writes all of b and never fails; the count and error result are for implementing io.Writer.


VSCode added support for semantic highlighting


Go 1.14


Experience report on a large Python-to-Go translation



Spam emails


What would a programming language designed from the ground-up for a multi-core world look like?


ParaSail is Tucker Taft’s attempt at a parallel programming language. Statements and expressions are evaluated in parallel unless there are data dependencies between them.

Wow. I’ve only read the homepage, but it already is hitting all the points I was thinking about! Particularly this bit:

Every ParaSail expression is defined to have parallel evaluation semantics. That is, given a ParaSail expression like F(X) + G(Y), the language rules ensure that it is safe to evaluate F(X) and G(Y) in parallel. The compiler makes the decision based on complexity or other criteria whether a given computation should be created as a potentially parallel activity. An underlying scheduler then maps these potentially parallel activities to particular processing resources, by default using a work-stealing approach, which provides load balancing across processors while also providing good locality of reference and minimal cache contention.

Edit: From the reference manual:

Parallelism should be built into the language to the extent that it is more natural to write parallel code than to write explicitly sequential code, and that the resulting programs can easily take advantage of as many cores as are available on the host computer.

Edit 2: here’s another goodie

parallel activities are identified automatically by the compiler, and language rules prevent data races between readers and writers of the same object, while explicitly concurrent objects can be used for safe synchronization when concurrent access from multiple readers and writers is required, without any need for explicit lock/unlock or signal/wait.

ParaSail really looks like exactly what I was thinking about.


A high-level, abstract perspective: If you write expressions as a dataflow graph, it is trivial to figure out which operations can be computed in parallel. The complicated factor is that parallelism induces some overhead. The million-dollar question becomes: When is it worthwhile to actually spin up a separate thread? If you want to compute (a + b) + (c + d), you don’t want to compute a + b and c + d in separate threads. Unless, maybe, a, b, c, and d represent numbers with a gazillion digits.

If you program in a purely functional language, the dataflow graph is easy to obtain. For imperative languages, the relation is a bit more obscure (consider a for-loop, for example).


LLVM Brainfuck example




Formatting a floppy


diskutil eraseDisk "MS-DOS FAT12" FLIPPY /dev/disk2


C++ lower_bound




Code navigation alternate commands VS Code used to do nothing when selecting Go to Definition while already at the definition. With this release, alternate commands can be executed instead. For instance, Go to Definition can be mapped to run Go to References.

This is customized via the editor.gotoLocation.alternativeDefinitionCommand setting. For example, you can make the alternate for Go to Definition be Go to Declaration and vice versa. You can leave the setting empty if you prefer the old behavior.

Wasm - s-expressions and linear form Java - decompilation of class files before opening Hexdump


Rust stack usage analysis


Jorge Aparicio writes about the process of building a static stack usage analyzer for Rust using LLVM IR. The style of his blog, Embedded in Rust, mimics Markdown with its headers and is a pleasing, simple design.




Robert Lee's fast implementations of k-opt for TSP use quadtrees.

Computed goto for efficient dispatch tables


Tech Interview Handbook


Static types and bug density


You want to reduce bugs? Use TDD. You want useful code intelligence tools? Use static types.

Rust profile-guided optimization


Michael Woerister, on the llvm-dev mailing list, fed back that as far as he's aware, profile-guided optimization now works exactly as expected with Rust.


AI-based infinite text adventures


Go cross-platform epoll

Epoller is a Go epoll interface for connections in Linux, MacOS and Windows.

Poller simplifies the cumbersome C API:

type Poller interface {
  Add(conn net.Conn) error
  Remove(conn net.Conn) error
  Wait(count int) ([]net.Conn, error)
  WaitWithBuffer() ([]net.Conn, error)
  WaitChan(count int) <-chan []net.Conn
  Close() error




  • µKanren - implementation miniKanren in 40 lines of Scheme "the essence of miniKanren"
  • miniKanren - language


Pointfree Haskell


Spotify statistics



Brodal queues


Efficient priority queue. Chris Okasaki was involved (the guy from functional red/black trees).

  • find-min: Θ(1)
  • delete-min: O(log n)
  • insert: Θ(1)
  • decrease-key: Θ(1)
  • meld: Θ(1)




DeSmuME Lua scripting:




Did not work for me:





100 blocks a day


String interning





Talk from LLVM Social Berlin

One Weekend Compiler



LLVM Stacker tutorial lang






Replays recorded runs for debugging

Robert O'Callahan

Flang using MLIR and LLVM



MLIR Toy language


MLIR keynote


Because location tracking is integral, we can also build the testsuite to depend on it, and use it to test analysis passes. ... Design for testability is an key part of our design, and we take it further than LLVM did.

Look into:

Polyhedral Compiler Techniques Widely explored in compiler research:

  • Great success in HPC and image processing kernels
  • Tensor abstraction gives full control over memory layout


Our implementation is well underway, and we’ve built a number of passes using this representation. That said, we’re still actively evolving and changing things here - this would be a great place to get involved if you are interested in contributing.

OpenMP & Other Parallelism Dialects

OpenMP is mostly orthogonal to host language: ● Common runtime + semantics ● Rich model, many optimizations are possible Model OpenMP as a dialect in MLIR: ● Share across Clang and Fortran ● Region abstraction makes optimizations easy ○ Simple SSA intra-procedural optimizations

Sub communities within Clang would also benefit. For example, OpenMP is not very well served with the current design. Very simple optimizations are difficult on LLVM IR, because function outlining has already been performed. This turns even trivial optimizations (like constant folding into parallel for loops) into interprocedural problems. Having a first class region representation makes these things trivial, based on standard SSA techniques. This would also provide a path for Fortran to reuse the same code. Right now the Flang community either has to generate Clang ASTs to get reuse (egads!) or generate LLVM IR directly and reimplement OpenMP lowering.


“Building a Compiler with MLIR” Tutorial

  • Build a new frontend/AST with MLIR, show lowering to LLVM IR
  • Introduce mid-level array IR: use it to optimize, tile, and emit efficient code
  • Tomorrow @ 12:00




Pi nodes


PiNodes encode statically proven information that may be implicitly assumed in basic blocks dominated by a given pi node. They are conceptually equivalent to the technique introduced in the paper "ABCD: Eliminating Array Bounds Checks on Demand" or the predicate info nodes in LLVM.

ABCD: Eliminating Array Bounds Checks on Demand:


LLVM pi blocks


LLVM arbitrary precision integers


  1. You may find that doing this in MLIR works better. MLIR has an open type system, so you can add a new type for your integers quite easily, though then you won't benefit from any LLVM optimisations. It's not clear that you could necessarily use the generic LLVM backend infrastructure though, so a direct translation from MLIR to your bytecode may preferable.

Go 1.14


Embedding overlapping interfaces:

Constant len and cap are untyped:

Low-cost defers:

Escape analysis rewrite:


Elixir anonymous function shorthand


iex> sum = fn (a, b) -> a + b end
iex> sum.(2, 3)

iex> sum = &(&1 + &2)
iex> sum.(2, 3)

Like Swift?

VSCode ligature stylistic sets and web UI



Stylistic sets:

There is now more fine grained control over the font features. When configuring "editor.fontLigatures": true, VS Code would turn on liga and calt. But some fonts have more settings, such as stylistic sets used by Fira Code.

We now allow these font features to be explicitly controlled, for example:

"editor.fontFamily": "Fira Code",
"editor.fontLigatures": true,
"[javascript]": {
  "editor.fontLigatures": "'ss02', 'ss19'",

Web UI:

There is now a minimal version of VS Code that can run in a browser that is available for development and testing. The Web version is still missing some features and is under active development.

In your local fork of the vscode repository, execute yarn web from the command line and access http://localhost:8080/. For more details about cloning and building the vscode repo, see the setup instructions.




Semmle CodeQL and GitHub

Semmle developed CodeQL, a query language with the vision of code as data. CodeQL can expressively query patterns of code and detect classes of vulnerabilities through variant analysis. It has extensive libraries to perform control and data flow analysis and taint tracking language agnostically.

GitHub recently acquired Semmle and its plans to integrate CodeQL to detect vulnerabilities in GitHub repos. Semmle is also hiring.

{Semmle}{GitHub}{CodeQL}{static analysis}{security}{taint analysis}


Languages using LLVM


Diagram on LLVM IRs:


LLVM has only one expansion and it is wrong/misleading. Solution: have lots of ambiguous expansions so we can change our mind later :-)


Moore's Law and programming languages

Chris Lattner talks of a new era of compilers coming with the end of Moore's Law. Chris is currently developing MLIR for high performance machine learning which is unlike earlier IRs. I hope to contribute to this changing landscape.

Increased diversity in compilers and tools is a huge opportunity - this area is exploding in importance as we stand here, gazing over the precipice at the end of Moore's Law. At the same time there is so much untapped talent, and we *need* fresh perspectives and new ideas!

{Moore's Law}{compiler}

Lisp Flavored Erlang

Lisp Flavored Erlang (LFE) is a lisp dialect designed on BEAM (Erlang VM) that combines metaprogramming and Erlang's reliable concurrency.


Alternate OCaml front-ends

The OCaml compiler is written modularly and can accept an AST from an external front-end. For example, the m17n project is a front-end that introduces Unicode identifiers to enable multilingualization. m17n carefully handles special cases as none of the normalization forms (see entry from 2019-06-04) are closed under concatenation and file systems treat Unicode differently.


Reason programming language

Reason, also known as ReasonML, is a syntax extension and toolchain for OCaml created by Jordan Walke at Facebook. Reason offers a syntax familiar to JavaScript programmers, and transpiles to OCaml. Statically typed Reason (or OCaml) code may be compiled to dynamically typed JavaScript using the BuckleScript compiler.

Reason generalizes and cleans up the syntax of OCaml. Parenthesis are consistently used and the syntax is designed to be more approachable to JavaScript developers. Automated tooling can seamlessly convert codebases between OCaml and Reason syntaxes.

As Reason is developed by Facebook, it has excellent interoperability with React by using its ReasonReact bindings.

Reason and Elm both compile to JavaScript, so it seems natural to compare the two. Reason is developed on the mature OCaml, so benefits from a large ecosystem and fast compilation. Reason also is highly interoperable with existing JavaScript libraries where Elm instead makes this more difficult in favor of pure Elm code. Elm uses Haskell-like syntax.

In a Hacker News post on Reason, Jordan Walke notes that it may be possible to remove the need for semicolon as a delimiter and reformat existing code using refmt. The F# spec documents its semicolon elision and indentation, so may be useful to reference for such a change. Users analogize Reason with Elixir, LFE, Clojure, and TypeScript as they are all languages developed on existing compilers, VMs, or languages.


Google search engine prototype anatomy

Sergey Brin and Lawrence Page, in "The Anatomy of a Large-Scale Hypertextual Web Search Engine", detail the design of an early prototype of the Google search engine and its advantages over contemporary search engines. They describe their PageRank algorithm for ranking pages by links and their scalable system for indexing.

Lists of other CS papers accessible to undergrads:


Garbage collection algorithms visualized


{garbage collection}


Rockstar programming language

Rockstar is a programming language with programs resembling song lyrics. It was designed by Dylan Beattie in response to the overuse of the "rockstar developer" phrase used by recruiters.

Mainly because if we make Rockstar a real (and completely pointless) programming language, then recruiters and hiring managers won't be able to talk about 'rockstar developers' any more.

There are two styles of Rockstar programs. Idiomatic Rockstar programs are written as plausible song lyrics (such as the fizz-buzz program below) and minimalist programs omit more expressive forms in favor of clarity.

Midnight takes your heart and your soul
While your heart is as high as your soul
Put your heart without your soul into your heart

Give back your heart

Desire is a lovestruck ladykiller
My world is nothing
Fire is ice
Hate is water
Until my world is Desire,
Build my world up
If Midnight taking my world, Fire is nothing and Midnight taking my world, Hate is nothing
Shout "FizzBuzz!"
Take it to the top

If Midnight taking my world, Fire is nothing
Shout "Fizz!"
Take it to the top

If Midnight taking my world, Hate is nothing
Say "Buzz!"
Take it to the top

Whisper my world

On its Hacker News posting, users speculate names for similar languages:

Maybe we should create more languages called Agile, Senior, Expert, Lead, Ninja, 1 Mio Dollar, Ivy League, Full Stack, etc

Full Stack, a language where the only data type is a stack.


Single instruction compiler



Pragmas in Go



OpenEmu multiple emulator engine for macOS


Poor performance with macOS out of the box:

Unable to compile for macOS:


Macros in Racket


Actor model compared to CSP




Set macOS default text editor


CS student falsehoods



Prolog guide

David Matuszek gives an overview of Prolog:

Prolog variables are similar to "unknowns" in algebra: Prolog tries to find values for the variables such that the entire clause can succeed. Once a value has been chosen for a variable, it cannot be altered by subsequent code; however, if the remainder of the clause cannot be satisfied, Prolog may backtrack and try another value for that variable. ...

Unification can be performed on lists:

[a, b, c] = [Head | Tail].  /* a = Head, [b, c] = Tail. */
[a, b] = [A, B | T].        /* a = A, b = B, [] = Tail. */
[a, B | C] = [X | Y].       /* a = X, [B | C] = Y. */

In most (but not all) Prolog systems, the list notation is syntactic sugar for the '.' functor, with the equivalence: '.'(Head, Tail) = [Head | Tail].

To solve arithmetic would introduce significant complexity to the language implementation, but would be very valuable. It would be interesting to see if research has been done in this area. It may be feasible to integrate SMT solving logic into Prolog evaluation.

Arithmetic is performed only upon request. For example, 2+2=4 will fail, because 4 is a number but 2+2 is a structure with functor '+'. Prolog cannot work arithmetic backwards; the following definition of square root ought to work when called with sqrt(25, R), but it doesn't.

sqrt(X, Y) :- X is Y * Y.  /* Requires that Y be instantiated. */

Arithmetic is procedural because Prolog isn't smart enough to solve equations, even simple ones. This is a research area.


Solving Sudoku with Prolog

Markus Triska demonstrates solving Sudoku using constraint logic programming in Prolog. In this way, the solution is much more elegant than in a general purpose imperative language.

sudoku(Rows) :-
        length(Rows, 9),
        maplist(same_length(Rows), Rows),
        append(Rows, Vs), Vs ins 1..9,
        maplist(all_distinct, Rows),
        transpose(Rows, Columns),
        maplist(all_distinct, Columns),
        Rows = [As,Bs,Cs,Ds,Es,Fs,Gs,Hs,Is],
        blocks(As, Bs, Cs),
        blocks(Ds, Es, Fs),
        blocks(Gs, Hs, Is).

blocks([], [], []).
blocks([N1,N2,N3|Ns1], [N4,N5,N6|Ns2], [N7,N8,N9|Ns3]) :-
        blocks(Ns1, Ns2, Ns3).

Prolog is extremely well-suited for solving combinatorial tasks like Sudoku puzzles, and also for tough practical challenges such such as timetabling, scheduling and allocation tasks on an industrial scale.

The key feature that makes Prolog so efficient and frequently used for such tasks is constraint propagation, provided via libraries or as a built-in feature in many Prolog system. Fast and efficient constraint propagation is often an important reason for buying a commercial Prolog system.

In this example, I am using CLP(FD/ℤ), constraint logic programming over finite domains/integers, the amalgamation of constraint programming (CP) and logic programming (LP), which blend especially seamlessly.

{Prolog}{Sudoku}{constraint logic programming}

Whitespace with Befunge syntax


Project idea: develop Befunge-like language that can compile to Whitespace.

Project idea: develop n-dimensional Befunge-like language.


Programming paradigms




Futamura projections:


Minesweeper Turing-complete

Richard Kaye demonstrates that Minesweeper, with an infinite grid, can simulate computations in his paper "Infinite versions of minesweeper are Turing complete". Knowledge of a square gives partial information about its neighbors and determines possible continuations over the plane.

Detecting semaphore deadlock


I hypothesize that semaphore deadlock can be statically detected by creating a graph with edges for every semaphore wait (and post?). Any cycles indicate potential deadlock.



FALSE esolang



Befunge and Whitespace interoperability

I designed my Nebula compiler for Whitespace, a stack-based language with a secondary heap. Befunge is similarly stack-based, so Nebula's stack analysis and basic blocks could be reused for Befunge.

If Befunge's self-modifying p instruction could be expanded into its possible combinations as I (falsely) posited below in the "Compiling Befunge" entry, then the Nebula IR could be shared between the two languages.

Sharing the same IR would open the way for foreign function interface calls between the languages. Any label in a Whitespace program can be called (or jumped to) as a procedure and a ret instruction transfers control back to the caller while end terminates the program. Befunge, on the other hand, has no formal concept of procedures. Befunge control flow maintains a cell position and direction, so a procedure for the purposes of FFI could be uniquely defined as a position-direction tuple. Whitespace could then transfer control to a position and direction in Befunge and the program would terminate with Befunge's @ end instruction (Befunge has no return construct).

To link mixed programs, additional metadata would need to be stored to connect a Whitespace label to a Befunge position and direction. A simple file format could be defined that defines the mappings. This would enable a Befunge cell to be defined as a call to a Whitespace procedure without needing to expand Befunge syntax.

The Befunge ? random path instruction poses problems when converting programs from Befunge into Whitespace. The only non-deterministic behavior in Whitespace is user input, so any random number generation would need to be seeded from initial user input. Thus pure Whitespace cannot equivalently represent any Befunge program containing the ? instruction. However, for hybrid programs, Whitespace can leverage Befunge's ? instruction to implement pseudorandom number generators that require no seed from the user.


Compiling Befunge

Befunge is a stack-based esolang with a two-dimensional instruction pointer and self-modifying instructions. It was designed by Chris Pressey with the goal of being difficult to compile.

Computationally, any Befunge-93 program can be represented as a push-down automaton, but due to the 80x25 grid size restriction, not all push-down automata can be encoded in Befunge-93 making it not Turing-complete. Befunge-98 removes this size restriction.

Although difficult to compile, a handful of JIT compilers exist:

The Betty compiler, for example, treats every possible straight line of instructions as a subprogram, and if a p instruction alters that subprogram, that subprogram is recompiled. This is an interesting variation on just-in-time compilation, and it results in a much better advantage over an interpreter, since many instructions can be executed in native code without making intervening decisions on the 'direction' register.

The Befunjit and Bejit compilers, similarly to the Betty compiler, split the original code into subprograms which are lazily compiled and executed. They, however, divide the original playfield into "static paths" - code paths which do not contain instructions that conditionally change direction (i.e. |, _ or ?). The "static paths" may span on more cells than the "straight line paths" of Betty, which results in fewer and longer subprograms. Thus, there are fewer context jumps between the compiler and the compiled code and allows more optimisations.

Project idea: design a Befunge compiler that splits the program into static paths (basic blocks). As the program size and number of instructions are fixed, it should be possible to statically enumerate the possible combinations of instruction mutations and construct paths for each. Jump tables can be used to select the desired path at run time. By statically expressing every possible path, more aggressive optimizations and analyses would be enabled and there would be no need for the context switching used by JIT implementations. The compiler's name could be "Befudge".

Edit: I had overlooked that the p instruction may modify cells at non-constant coordinates, thus it is not feasible to statically compute all possible paths in the general case. It only works if all p instructions in a given program use constant coordinates.


Smalltalk and Simula languages



Paul Graham


Blub programming language

Beating the Averages


You can see that machine language is very low level. But, at least as a kind of social convention, high-level languages are often all treated as equivalent. They're not. Technically the term "high-level language" doesn't mean anything very definite. There's no dividing line with machine languages on one side and all the high-level languages on the other. Languages fall along a continuum [4] of abstractness, from the most powerful all the way down to machine languages, which themselves vary in power.


Ordinarily technology changes fast. But programming languages are different: programming languages are not just technology, but what programmers think in. They're half technology and half religion.[6] And so the median language, meaning whatever language the median programmer uses, moves as slow as an iceberg. Garbage collection, introduced by Lisp in about 1960, is now widely considered to be a good thing. Runtime typing, ditto, is growing in popularity. Lexical closures, introduced by Lisp in the early 1970s, are now, just barely, on the radar screen. Macros, introduced by Lisp in the mid 1960s, are still terra incognita.


[4] Note to nerds: or possibly a lattice, narrowing toward the top; it's not the shape that matters here but the idea that there is at least a partial order.



Customizing macOS menu clock format

The clock in the macOS menu bar does not follow the date format set in preferences. The date format can be manually overridden in the terminal, but the menu bar must be restarted for the changes to take effect. However, on my system running macOS Mojave, I couldn't get this to work.

$ defaults read DateFormat
EEE MMM d  H:mm:ss
$ defaults write DateFormat -string 'EEE dd MMM  HH:mm:ss'
$ killall -KILL SystemUIServer


05AB1E esolang



Full employment theorem


The Little Book of Semaphores

Kimball German recommended to me "The Little Book of Semaphores" by Allen B. Downey to expand on material covered in Computer Systems. It is a free textbook on synchronization and concurrency that has many puzzles for the reader to solve.

In the introduction, it suggests that information on atomic operations for specific platforms could be gathered, but dismisses it in favor of the more general approach of synchronization. However, compilers could implement install-time optimization to eliminate the need in some cases for more expensive synchronization techniques.

So how can we write concurrent programs if we don't know which operations are atomic? One possibility is to collect specific information about each operation on each hardware platform. The drawbacks of this approach are obvious.

The most common alternative is to make the conservative assumption that all updates and all writes are not atomic, and to use synchronization constraints to control concurrent access to shared variables.


Quala type qualifiers for LLVM and Clang

Quala is an extension to LLVM and Clang by Adrian Sampson to make type systems pluggable in C and C++. Provided type systems are a taint tracker and a null value dereference checker. Type metadata is recorded in the resultant LLVM IR.

User-customizable type systems make it possible to add optional checks to a language without hacking the compiler. The world is full of great ideas for one-off type systems that help identify specific problems—SQL injection, say—but it's infeasible to expect all of these to be integrated into a language spec or a compiler. Who would want to deal with hundreds of type system extensions they're not really using?


John Regehr

John Regehr is a professor at the University of Utah that researches in "embedded systems, sensor networks, static analysis, real-time systems, [and] operating systems". He teaches courses including advanced compilers, operating systems, and embedded systems. Several of his recent publications involve LLVM and optimizing compilers.

{academia}{University of Utah}

LLVM for grad students

Adrian Sampson provides an intro targeted towards grad students for writing a transformation pass using LLVM.

Alex Bradbury tracks the LLVM development and announcements in his LLVM Weekly newsletter. This makes following changes in LLVM internals easier.


Graphviz Go utilities

Project grvutils by Than McIntosh provides tools for working with Graphviz graph files. Features include lexing, parsing, manipulating, and pruning.


Go LLVM compilers

GoLLVM is a Go compiler written in C++ using the LLVM backend. It is under active development by Google.

llgo is an LLVM-based compiler for Go, written in Go. It leverages go/parser and go/types from the standard library. It has now been incorporated into the LLVM project along with its LLVM bindings for Go.

Ian Lance Taylor lists some of the benefits of these LLVM compilers:

It's always a good idea to have multiple compilers, as it helps ensure that the language is defined by a spec rather than an implementation.

And, yes, the LLVM compiler generates code that is clearly better for some cases, though the compilation process itself is longer.!topic/golang-nuts/Tf0BOTtEpOs


U Combinator

U Combinator is research group at the University of Utah that formerly included Matt Might, Peter Aldous, and Kimball Germane. The lab researches and develops "advanced languages, compilers and tools to improve the performance, parallelism, security and correctness of software".

The name of the group comes from the U combinator function:

In the theory of programming languages, the U combinator, U, is the mathematical function that applies its argument to its argument; that is U(f) = f(f), or equivalently, U = λf.f(f).

Self-application permits the simulation of recursion in the λ-calculus, which means that the U combinator enables universal computation. (The U combinator is actually more primitive than the more well-known fixed-point Y combinator.)

The expression U(U), read U of U, is the smallest non-terminating program, and U of U is also the local short-hand for the University of Utah.

{academia}{University of Utah}{PL}


Optimizing Conway's Game of Life

Michael Abrash describes several approaches to optimizing Conway's Game of Life implementations in chapters 17 and 18 of his book "Graphics Programmer's Black Book". One of the simpler optimizations is to store the state of the eight adjacent cells as a bit pattern along with each cell's own state to enable use of a lookup table for the next generation. Since the majority of cells are dead and remain dead, using a change list eliminates scanning over cells that will never change.

Tetris in Conway's Game of Life


QFTASM is RISC with 11 of its 16 opcodes assigned. Notably, MNZ "move if not zero" and ANT "and-not" are used instead of MEZ "move if zero" and NOT because creating a TRUE signal from an empty signal is difficult with cellular automata.

Code Name Description
0000 MNZ Move if not zero
0001 MLZ Move if less than zero
0010 ADD Addition
0011 SUB Subtraction
0100 AND Bitwise and
0101 OR Bitwise or
0110 XOR Bitwise exclusive or
0111 ANT Bitwise and-not
1000 SL Shift left
1001 SRL Shift right logical
1010 SRA Shift right arithmetic


Java shortcomings



Enforcing code feature requirements


Myopia µ-recursive language

Myopia is a programming language based on µ-recursive functions. Myopia is nearly identical to the μ6 language excepting its more readable syntax and lack of integer shorthands. Programs that do not use the M operator are primitive recursive and guaranteed to halt.

Myopia deals with functions from tuples of natural numbers to natural numbers (N^n -> N). The functions are constructed by composing the following primitives:

  • Z, the zero function.
  • S, the successor function.
  • I[i,k], the family of identity functions.
  • C, the composition operator.
  • P, the primitive recursive operator.
  • M, the minimisation operator.
-- plus(x,y) = x + y
plus : N x N -> N
plus = P(id, C(S, I[2,3]))

mult = P(Z, C(plus, I[2,3], I[3,3]))

Tink easy-to-use cryptographic APIs

Tink is a multi-language set of cryptographic APIs developed at Google that is designed to be easy to use for developers without a cryptography background. Currently, Java and Android, C++, Objective-C, and Go are production ready and Python and JavaScript are in active development.

Building a WebAssembly Compiler


Colin Eberhardt describes the process of developing a compiler that targets WebAssembly, for a language that he designed called chasm.

 (func $add (param f32) (param f32) (result f32)
   get_local 0
   get_local 1
 (export "add" (func 0))

If you just want to experiment with WAT you can use the wat2wasm tool from the WebAssembly Binary Toolkit to compile WAT files into wasm modules.

The above code reveals some interesting details around WebAssembly -

  • WebAssembly is a low-level language, with a small (approx 60) instruction set, where many of the instructions map quite closely to CPU instructions. This makes it easy to compile wasm modules to CPU-specific machine code.

  • It has no built in I/O. There are no instructions for writing to the terminal, screen or network. In order to wasm modules to interact with the outside world they need to do so via their host environment, which in the case of the browser is JavaScript.

  • WebAssembly is a stack machine, in the above example get_local 0 gets the local variable (in this case the function param) at the zeroth index and pushes it onto the stack, as does the subsequent instruction. The f3.add instruction pops two values form the stack, adds them together than pushes the value back on the stack.

  • WebAssembly has just four numeric types, two integer, two floats.

TODO: finish reading


Pi spigot computation



Macros with Elixir

Ashton Wiersdorf gives an overview on macros in Elixir. Macros in Lisp and its successors define transformations on the AST whereas C macros operate at the lexical level.

Ashton recommends Metaprogramming Elixir by Chris McCord and On Lisp by Paul Graham. Graham shows how to make your own DSL and writes Prolog in Lisp using macros to heavy advantage of compile-time optimizations.


GHC runtime optimizations

Andreas Klebinger describes his experience optimizing GHC on the Well-Typed blog.

He introduced loop recognition by SCC and dominator analysis.

Loops are important for two reasons:

  • They are good predictors of runtime behavior.
  • Most execution time is spent in loops.

Combined, this means identifying loops allows for some big wins. Not only can we do a better job at optimizing the code involving them. The code in question will also be responsible for most of the instructions executed making this even better.

Last year I made the native backend "loop aware". In practice this meant GHC would perform strongly connected components (SCC) analysis on the control flow graph.

  • This allowed us to identify blocks and control flow edges which are part of a loop.
  • In turn this means we can optimize loops at the cost of non-looping code for a net performance benefit.
  • However SCC can not determine loop headers, back edges or the nesting level of nested loops which means we miss out on some potential optimizations.

This meant we sometimes ended up placing the loop header in the middle of the loop code. As in code blocks would be laid out in order 2->3->1. This isn't as horrible as it sounds. Loops tend to be repeated many times and it only adds two jumps overhead at worst. But sometimes we bail out of loops early and then the constant overhead matters. We also couldn't optimize for inner loops as SCC is not enough to determine nested loops.

Nevertheless, being aware of loops at all far outweighed the drawbacks of this approach. As a result, this optimization made it into GHC 8.8.

This year I fixed these remaining issues. Based on dominator analysis we can now not only determine if a block is part of a loop. We can also answer what loop it is, how deeply nested that loop is and determine the loop header.

As a consequence we can prioritize the inner most loops for optimizations, and can also estimate the frequency with which all control flow edges in a given function are taken with reasonable accuracy.

Andreas also reduced storage size of integers in interface files by using a variable length encoding that uses the eighth bit of every byte in an integer to denote whether the number continues.

Simple SMT solver for optimizing compiler

Edsko de Vries demonstrates the creation of a simple SMT solver in Haskell for use in an optimizing compiler.

It is able to transform the following

if a == 0 then
  if !(a == 0) && b == 1 then
    write 1
    write 2
  write 3


if a == 0 then
  write 2
  write 3



(stub) To read:

C# functional constructs


Cepheid variable esolang

Project idea: develop an esoteric programming language that would have "Cepheid variables" whose values oscillate predictably as a reference to Cepheid variable stars that pulsate in diameter and temperature with a stable period and amplitude. The mechanics could be like Malbolge, although less sadistic.


OpenRocket model rocket simulator

OpenRocket is a model rocket simulator to test and design rockets before launching.

C++ xvalues, glvalues, and prvalues



Building a compiler with ANTLR and Kotlin


Three-valued logic

In three-valued logic systems, there are three truth values indicating true, false, and indeterminate.

a b a ∧ b a ∨ b a ⊕ b a ⇔ b a ⇒ b
F ? F ? ? ? T
? F F ? ? ? ?
? ? ? ? ? ? ?
? T ? T ? ? T
T ? ? T ? ? ?
a ¬a
? ?

Package tribool implements three-valued logic in Go:






VSCodium is a collection of scripts to automatically build Visual Studio Code into freely-licensed binaries without telemetry or Microsoft-specific functionality or branding.

RealWorld example apps

The RealWorld project is a specification of a clone and implementations in many languages. The front-end and back-end languages can be swapped because each implementation follows the same API.

The current most popular front-ends are React / Redux, Angular, and Elm while the most popular back-ends are Node / Express, Go / Gin, and ASP.NET Core.

Elm front-end:

Project idea: there is not yet an implementation for a front-end written in Go and compiled to WebAssembly. I've also been interested in learning Elm, so this may be a good chance to do so.


Spectacle window organizer

Spectacle is a window organizer for macOS that adds keyboard shortcuts for moving and resizing windows.

Go string to []byte conversion without allocation


Clarification on unsafe conversion between string <-> []byte

History Trends Unlimited

Google Chrome history is limited the past three months, but using the History Trends Unlimited extension, history can be archived and searched indefinitely. The data is stored in a local database and can easily be imported or exported.

To import manually history from Chrome's History SQLite database, the data must first be fit into a tab separated format.

sqlite> .open History
sqlite> .mode tabs
sqlite> .output /path/to/archived_history.tsv
sqlite> SELECT u.url, v.visit_time, v.transition
          FROM urls u INNER JOIN visits v ON
          WHERE u.hidden=0 ORDER BY u.url;

FiraCode version 2

Version 2 of FiraCode adds the long awaited stylistic sets along with 136 new glyphs and 55 updated glyphs. Included in the update is a fix for an issue that I reported with the division slash character, commonly used in Go assembly.

Nikita Prokopov describes how to enable stylistic sets in various editors. In VS Code, this can be done by injecting custom CSS into the editor with an extension.

Detect webcam and microphone usage

Oversight detects usage of the webcam and microphone on macOS. When a process accesses the built-in webcam or microphone, Oversight alerts the user and provides options to allow or block the access.

The camera and microphone can be disabled entirely in system files, but that prevents legitimate usage.

Hindley–Milner type system

(stub) Used by ML and Haskell.

Types are a partial order, so with polymorphism, the type of an expression can be determined by finding the smallest value in the set of all possible types for that expression.–Milner_type_system


Building Whitespace interpreter source

The official Whitespace interpreter written in Haskell is rather dated and needs some fixes to compile with modern GHC. Using the Haskell API search engine Hoogle, the updated module names can be found.

import IO
import System(getArgs)

{- becomes: -}
import System.IO
import System.Environment(getArgs)

Additionally, the compiler option -fvia-C is no longer supported and can be removed from the Makefile.

JetBrains MPS visual DSL IDE

JetBrains MPS (Meta Programming System) is an IDE the enables the creation and use of visual DSLs. Code is maintained as an AST rather than a textual format so that graphical elements like tables, diagrams, matrices, or equations can be embedded in the code.

Similar to MPS, DrRacket supports graphical elements such as images in program source. Languages such as Scratch or GameMaker that are composed entirely of graphical blocks could potentially be represented with MPS.


Gradual memory management

(stub) "A Framework for Gradual Memory Management"

Gradual programming

Will Chrichton writes that programming is a gradual process. The program evolves as it develops, but decisions made early like whether to use static or dynamic typing are difficult to change late. He believes that the largest problem in the field of programming languages research is viewing programming languages through the lens of human computer interaction.

I hold this fundamental belief: programming languages should be designed to match the human programming process. We should seek to understand how people think about programs and determine what programming processes come intuitively to the human mind. There are all sorts of fascinating questions here, like:

  • Is imperative programming more intuitive to people than functional programming? If so, is it because it matches the way our brains are configured, or because it's simply the most common form of programming?
  • How far should we go to match people's natural processes versus trying to change the way people think about programming?
  • How impactful are comments in understanding a program? Variable names? Types? Control flow?

Chrichton lists six axes of evolution:

  • Concrete / abstract
  • Anonymous / named
  • Imperative / declarative
  • Dynamically typed / statically typed
  • Dynamically deallocated / statically deallocated
  • General-purpose / domain-specific

Imperative and declarative:

For a multitude of reasons, straight-line, sequential imperative code appears to come more naturally to programmers than functional/declarative code in their conceptual program model. For example, a simple list transformation will likely use for loops:

in_l = [1, 2, 3, 4]
out_l = []
for x in in_l:
  if x % 2 == 0:
    out_l.append(x * 2)

Whereas a more declarative version will abstract away the control flow into domain-specific primitives:

in_l = [1, 2, 3, 4]
out_l = map(lambda x: x * 2, filter(lambda x: x % 2 == 0, in_l))

The distinction between the two is not just stylistic - declarative code is usually much more easily analyzed for structure, e.g. a map is trivially parallelizable whereas a general for loop less so. This transformation occurs most often in languages which support mixed imperative/functional code (at the very least closures).

Memory safety:

In 2018, all programming languages should be memory safe, with the only question being whether memory deallocation is determined at compile time (i.e. like Rust, with a borrow checker) or at run time (i.e. like every other language, with a garbage collector). Garbage collection is unquestionably a productivity boost for programmers, as it's natural that our initial program model shouldn't have to consider exactly how long each value should live before deallocation.

Gradual programming vision:

In that light, advancing gradual programming entails the following research process:

  • Identify parts of the programming process that change gradually over time, but currently require undue overhead or switching languages to adapt.
  • Develop language mechanisms that enable programmers to gradually move along a particular axis within a homogeneous programming environment.
  • Empirically validate against real programmers whether the proposed mechanisms match the hypothesized programming process in practice.

JavaScript source map structure

JavaScript source maps provide a mapping from the compressed JavaScript served to the client to the source symbols and names. Languages that compile to JavaScript can also emit source maps.

The mappings are specified compactly using Base64 VLQ (Variable Length Quantity). Each segment consists of 1, 4, or 5 fields:

  • Generated column
  • Original file
  • Original line number
  • Original column
  • Original name, if available

To store many large numbers in small space, a continuation bit builds on a segment value, a space saving technique with its roots in the MIDI format.

Rust as a compilation target


Automatic reference counting



Command-line weather

Querying displays the weather for your current location in colored ASCII art.


Nim programming language

Nim (formerly named Nimrod) is an imperative, general-purpose, multi-paradigm, statically typed, systems, compiled programming language designed and developed by Andreas Rumpf. It is designed to be "efficient, expressive, and elegant", supporting metaprogramming, functional, message passing, procedural, and object-oriented programming styles by providing several features such as compile time code generation, algebraic data types, a foreign function interface (FFI) with C and C++, and compiling to C, C++, Objective-C, and JavaScript.

Features listed on its website include:

  • Fast deferred reference counting memory management.
  • Iterators are are compiled to inline loops.
  • Compile-time evaluation of user-defined functions.
  • Preference of value-based data types allocated on the stack.
  • Macro system allows direct manipulation of the AST.
  • Supports backends including compilation to C, C++, and JavaScript.

Choose from a deferred RC'ing [reference counting] garbage collector that is fast, incremental and pauseless; or a soft real-time garbage collector that is deterministic allowing you to specify its max pause time; and many others.

A chart shows Nim as having the best garbage collector pause times and memory usage compared to Go, Java, and Haskell.

{Nim}{PL}{garbage collection}

Unicode capacity

(stub) 21-bit width:

Binary Hex Comments
0xxxxxxx 0x00..0x7F Only byte of a 1-byte character encoding
10xxxxxx 0x80..0xBF Continuation byte: one of 1-3 bytes following the first
110xxxxx 0xC0..0xDF First byte of a 2-byte character encoding
1110xxxx 0xE0..0xEF First byte of a 3-byte character encoding
11110xxx 0xF0..0xF7 First byte of a 4-byte character encoding

Four column ASCII


The "space" character had to come before graphics to make sorting easier, so it became position 20hex; for the same reason, many special signs commonly used as separators were placed before digits. (Wikipedia)

00 01 10 11
NUL @ ` 00000
SOH ! A a 00001
STX " B b 00010
ETX # C c 00011
EOT $ D d 00100
ENQ % E e 00101
ACK & F f 00110
BEL ' G g 00111
BS ( H h 01000
TAB ) I i 01001
LF * J j 01010
VT + K k 01011
FF , L l 01100
CR - M m 01101
SO . N n 01110
SI / O o 01111
DLE 0 P p 10000
DC1 1 Q q 10001
DC2 2 R r 10010
DC3 3 S s 10011
DC4 4 T t 10100
NAK 5 U u 10101
SYN 6 V v 10110
ETB 7 W w 10111
CAN 8 X x 11000
EM 9 Y y 11001
SUB : Z z 11010
ESC ; [ { 11011
FS < \ | 11100
GS = ] } 11101
RS > ^ ~ 11110
US ? _ DEL 11111


Ceiling division

Python has floor division using the // operator. The naïve approach for ceiling division is to use math.ceil which converts the operands to floating point and would lose precision greater than 53 bits. Instead, flip the sign to do upside-down floor division.

def ceil_div(a, b):
    return -(-a // b)

Reverse Engineering for Beginners

"Reverse Engineering for Beginners", a free ebook written by Dennis Yurichev, alternatively known as "Understanding Assembly Language", covers reverse engineering techniques with many real world examples included.

Yurichev also has published several other ebooks including "Math for Programmers":

SMT Solvers


Go bindings for Z3:

Abstract interpretation


Curry language

(stub) Created by the same guy as ALF.

Algebraic Logic Functional programming language

ALF is a language designed by Michael Hanus that combines functional and logic programming paradigms.

ALF is a language which combines functional and logic programming techniques. The foundation of ALF is Horn clause logic with equality which consists of predicates and Horn clauses for logic programming, and functions and equations for functional programming. Since ALF is a genuine integration of both programming paradigms, any functional expression can be used in a goal literal and arbitrary predicates can occur in conditions of equations. The operational semantics of ALF is based on the resolution rule to solve literals and narrowing to evaluate functional expressions. In order to reduce the number of possible narrowing steps, a leftmost-innermost basic narrowing strategy is used which can be efficiently implemented. Furthermore, terms are simplified by rewriting before a narrowing step is applied and also equations are rejected if the two sides have different constructors at the top. Rewriting and rejection can result in a large reduction of the search tree. Therefore this operational semantics is more efficient than Prolog's resolution strategy.

Semantic resolution trees


The Wikipedia page on semantic resolution trees is an empty stub.

LLVM zero division



Turing machines or lambda calculus

Kimball Germane remarked that I seem like a Turing machine kind of guy as opposed to lambda calculus. I concern myself with the low-level details of a program and compiler optimizations in a similar manner to the mechanical-like Turing machines. He says that Turing machines are for machines and lambda calculus is for humans.


Parse tree and AST distinction

The parse tree (also known as "concrete syntax tree") is a concrete representation of the input, retaining all of the information of the input. The AST is an abstract representation of the input, so derivable associations like parentheses or whitespace need not be present.




I have read section 1 and half of section 2:

Specifically, GCC suffers from layering problems and leaky abstractions: the back end walks front-end ASTs to generate debug info, the front ends generate back-end data structures, and the entire compiler depends on global data structures set up by the command line interface.


Install-time optimization is the idea of delaying code generation even later than link time, all the way to install time, as shown in Figure 11.7. Install time is a very interesting time (in cases when software is shipped in a box, downloaded, uploaded to a mobile device, etc.), because this is when you find out the specifics of the device you're targeting. In the x86 family for example, there are broad variety of chips and characteristics. By delaying instruction choice, scheduling, and other aspects of code generation, you can pick the best answers for the specific hardware an application ends up running on.


Creating the Go programming language

Rob Pike and Robert Griesemer talk on the Go Time podcast's 100th episode about the history, influence, and future of Go.

Go is the first language to enforce formatting through an external tool like Gofmt. Gofmt enables refactoring and language change updates using Gofix.

Announced in golang-nuts mailing list:!topic/golang-nuts/UWtBKoQp8wk

Immutability in JavaScript

Object.freeze freezes an object, preventing addition and removal of properties and prevents changes to existing properties or the prototype.

Object.seal seals an object, preventing addition of new properties and marking all existing properties as non-configurable. Values of present properties can still be changed as long as they are writable.


macOS dynamic wallpapers

Marcin Czachurski, in a three-part article, describes the dynamic wallpaper feature added in macOS Mojave. The default dynamic wallpaper is a shot of Mojave that changes based on the time of day. Each frame stores the altitude and azimuth to accurately update the image according to the local position of the sun.



Dependently typed

Darcs version control system

Darcs is a distributed version control system developed by physicist David Roundy. It is based on a patch algebra with patches in a repository as a partially ordered set.

The patches of a repository are linearly ordered. Darcs automatically calculates whether patches can be reordered (an operation called commutation), and how to do it.


The notion of dependency between patches is defined syntactically. Intuitively, a patch B depends on another patch A if A provides the content that B modifies. This means that patches that modify different parts of the code are considered, by default, independent. To address cases when this is not desirable, Darcs enables the user to specify explicit dependencies between patches.

Darcs merging is similar to rebasing in Git, though unlike Git, reordering patches does not change patch identities. Also, Darcs repositories can operate in lazy mode where history is retrieved on demand.

Bitwise set operations

Bit sets can be manipulated using bitwise operators in a manner similar to sets.

Bitwise Set Operation
`a b` a ∪ b
a & b a ∩ b Intersection
a ^ b a ⊕ b Exclusive or
a &^ b a - b Difference
^a U - a Complement
popcnt(a) |a| Cardinality


Bitwidth analyzing compiler


Mark Stephenson, Jonathan Babb, and Saman Amarasinghe. Bitwidth analysis with application to silicon compilation. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, pages 108-120, 2000.

We ease loop identification in SSA form by converting all φ-functions that occur in loop headers to μ-functions.


Projects using OCaml


The reference WebAssembly interpreter is written in OCaml.

The static JavaScript typehecker Flow is written in OCaml and compiled to JavaScript.

XQuery XML query language

XQuery is a functional query language that queries and transforms structured and unstructured data, usually in the form of XML.

WebSQL and WebOQL web query languages

WebSQL and WebOQL are declarative programming languages designed to support web querying and restructuring.

[WebSQL] aimed to combine the content-based queries of search engines with structure-based queries similar to what one would find in a database system. The language combines conditions on text patterns appearing within documents with graph patterns describing link structure.

WebSQL proposes to model the web as a relational database composed of two (virtual) relations: Document and Anchor. The Document relation has one tuple for each document in the web and the Anchor relation has one tuple for each anchor in each document in the web. This relational abstraction of the web allows us to use a query language similar to SQL to pose the queries.

If Document and Anchor were actual relations, we could simply use SQL to write queries on them. But since the Document and Anchor relations are completely virtual and there is no way to enumerate them, we cannot operate on them directly. The WebSQL semantics depends instead on materializing portions of them by specifying the documents of interest in the FROM clause of a query. The basic way of materializing a portion of the web is by navigating from known URL's. Path regular expressions are used to describe this navigation. An atom of such a regular expression can be of the form d1 => d2, meaning document d1 points to d2 and d2 is stored on a different server from d1; or d1 -> d2, meaning d1 points to d2 and d2 is stored on the same server as d1.

SELECT d.url,e.url,a.label
          "www.mysite.start" ->* d,
      Document e SUCH THAT d => e,
      Anchor   a SUCH THAT a.base = d.url
WHERE a.href = e.url AND a.label = "label";

[WebSQL] treats web pages as atomic objects with two properties: they contain or do not contain certain text patterns, and they point to other objects. ... [WebOQL] goes beyond WebSQL in two significant ways. First, it provides access to the internal structure of the web objects it manipulates. Second, it provides the ability to create new complex structures as a result of a query. Since the data on the web is commonly semistructured, the language emphasizes the ability to support semistructured features.

The main data structure provided by WebOQL is the hypertree. Hypertrees are ordered arc-labeled trees with two types of arcs, internal and external. Internal arcs are used to represent structured objects and external arcs are used to represent references (typically hyperlinks) among objects. Arcs are labeled with records.


Sets of related hypertrees are collected into webs. Both hypertrees and webs can be manipulated using WebOQL and created as the result of a query.

WebOQL is a functional language, but queries are couched in the familiar select-from-where form.

[Tag:"result" /
  select y
  from x in browse("file:pubs.xml") via ^*[tag = "publications"],
       y in x',
       z in y'
  where z.tag = "author" and z.value ~ "Smith"


Uses for spare computers

  • Media server
  • Central backup server with redundancy
  • File server
  • Penetration testing target
  • Virus-infected sandbox
  • Collaborative remote desktop

Idea lists:

CollabVM allows you to control remote machines through a website and perform actions by voting:


Fossil source code manager


Used by and designed for SQLite; relational.


CRAPL academic-strength open-source licence


Red-Black tree deletion

Marc Nieper-Wißkirchen talks (in German) at Curry Club Ausburg about Kimball Germane and Matthew Might's paper on deletions in red-black trees and implements the algorithm in Scheme.

As a joke, when he launches vi, it errors and running vim launches emacs instead.

/bin/bash: computer has not enough memory to run process: vi

Coq proof language

Kimball Germane provides an introduction to the Coq theorem prover.

Coq implements a dependently-typed strongly-normalizing programming language that allows users to express formal specifications of programs. Coq assists the user in finding artifacts that meet a specification and from which it can extract a certified implementation in Haskell, Racket, or OCaml automatically. This talk will iterate through a series of increasingly-precise specifications of a commonly-used function and the experience of a Coq user meeting these specifications.

Defining natural numbers in terms of successor:

Inductive nat : Set :=
  | 0 : nat
  | S : nat -> nat.

Eval compute in 0.
Eval compute in (S 0).
Eval compute in (S (S (S 0))). (* 3 *)

Proving associativity of addition using induction:

Fixpoint add (a b : nat) : nat :=
  match a with
  | 0 => b
  | S n => S (add n b)

Theorem add_assoc : forall (a b c : nat),
  (add a (add b c)) = (add (add a b) c).
intros a b c.
induction a. simpl. reflexivity.
simpl. rewrite -> IHa. reflexivity.

Projects using Coq and resources:

(* notable projects
   Verified Software Toolchain
   ceL4 and CertiKOS - verified kernels

(* Resources
   Coq Reference Manual
   Coq'Art - Interactive Theorem Proving and Program Development
   Certified Programming with Dependent Types - Alan Chlipala
   Software Foundation

I wanted to talk a little about some cool projects that are being done with Coq. Vellvm is verified LLVM, so that's what it sounds like; Verified Software Toolchain and Ynot are frameworks to reason about C programs and imperative programs; Bedrock reasoning about assembly language; there are some verified kernels that are fully verified; and CompCert is a gigantic project which aimed to have a formalized C compiler and they succeeded.

One of the researchers at the University of Utah, John Regehr - they do hardening for compilers and fuzzying for C compilers and CompCert had the fewest number of bugs and the only bugs it had were in the unverified parts, which was the front-end parser, and since then, that project has verified their front-end parser. So there's good reason to think that there are not mistakes in CompCert.

They [CompCert] define the semantics for C and the semantics for assembly and their high-level proof is that the semantics are preserved across compilation. The chain includes register allocation, so it has stuff about graph coloring - on their website there's a diagram that shows all of the phases that they've proven.

He mentions the Idris programming language as being dependently typed similar to Coq.

Idris is a dependently typed programming language that is getting more popular and I wonder if they would extract to Idris differently where you had to provide a proof.

Gemini Guidance Computer


Saturn Launch Vehicle Digital Computer


Memory was in the form of 13-bit syllables, each with a 14th parity bit. Instructions were one syllable in size, while data words were two syllables (26 bits).

The LVDC was highly redundant and reliable:

For reliability, the LVDC used triple-redundant logic and a voting system. The computer included three identical logic systems. Each logic system was split into a seven-stage pipeline. At each stage in the pipeline, a voting system would take a majority vote on the results, with the most popular result being passed on to the next stage in all pipelines. This meant that, for each of the seven stages, one module in any one of the three pipelines could fail, and the LVDC would still produce the correct results.

There are 18 simple instructions.

Apparently, the LVDC was hand-compiled:

Young (American) programmers just out of college were then employed to manually compile the FORTRAN program into the assembly language of the embedded LVDC CPU, and presumably to make whatever other adjustments are needed when you pass from the computing environment of a large computer to a smaller embedded system.


Backtracking regexp in Go

Doug Clark wrote a regular expression library for Go that allows backtracking and does not have the constant-time guarantees of the built-in regexp package.

GMP pi computation

GMP can be used to compute pi and is the fastest implementation of those surveyed by Nick Craig-Wood.

curl --output gmp-chudnovsky.c
gcc -s -Wall -o gmp-chudnovsky gmp-chudnovsky.c -lgmp -lm

Ubuntu on Raspberry Pi 4

The Raspberry Pi 4 changed to 64-bit, so most operating systems other than the default Raspbian distribution are not currently compatible. CloudKernels walks through their process of building a 64-bit bootable Ubuntu image for the Pi 4.


C char array and pointer


Go memory model


Profiling Go programs


Go as a compiler construction language

If self compilation was an early goal of Go, it would have been more compiler oriented - a design I would greatly appreciate.

Go turned out to be a fine language in which to implement a Go compiler, although that was not its original goal. Not being self-hosting from the beginning allowed Go's design to concentrate on its original use case, which was networked servers. Had we decided Go should compile itself early on, we might have ended up with a language targeted more for compiler construction, which is a worthy goal but not the one we had initially.

Early Go development

The debugger was originally named ogle. Old versions of the FAQ mention that "'Ogle' would be a good name for a Go debugger."

A vector container package used to exist for sequential storage with specialized versions for int and string.

The first commit after the hello world programs contains an early annotated Go spec.

In Rob Pike's Gophercon 2014 talk "Hello, Gophers!", he discusses the language inspiration and development.

TODO: read original spec

Go compiler naming scheme

The Go compiler borrows from the Plan 9 naming scheme:

The 6g (and 8g and 5g) compiler is named in the tradition of the Plan 9 C compilers, described in (see the table in section 2). 6 is the architecture letter for amd64 (or x86-64, if you prefer), while g stands for Go.

Plan 9 compilers:

  • 0a, 1a, 2a, 5a, 7a, 8a, ka, qa, va - assemblers
  • 0c, 1c, 2c, 5c, 7c, 8c, kc, qc, vc - C compilers
  • 0l, 1l, 2l, 5l, 7l, 8l, kl, ql, vl - loaders


The Go gopher formerly named Gordon

The well known mascot of Go is called simply "the Go gopher".

However, in its early days, it was known as "Gordon the Go Gopher". This can be seen on the homepage of Glenda, the Plan 9 Bunny, in the Internet Archive, from about 2009-12-06 to 2013-04-01.

SSA form bibliography


Annotated bibliography of 106 papers relating to SSA form:

Go SSA form tools


Plan 9 applications


Plan 9 even has a file system filter rot13fs.c to transform traffic with ROT13.

Functional higher-order functions in Go

Using higher-order functions borrowed from functional languages in Go such as apply, filter, and reduce is considered an anti-pattern and for loops are instead preferable.

Redis persistance using fork

RDB maximizes Redis performances since the only work the Redis parent process needs to do in order to persist is forking a child that will do all the rest. The parent instance will never perform disk I/O or alike.


RDB needs to fork() often in order to persist on disk using a child process. Fork() can be time consuming if the dataset is big, and may result in Redis to stop serving clients for some millisecond or even for one second if the dataset is very big and the CPU performance not great. AOF also needs to fork() but you can tune how often you want to rewrite your logs without any trade-off on durability.


Whenever Redis needs to dump the dataset to disk, this is what happens:

  • Redis forks. We now have a child and a parent process.
  • The child starts to write the dataset to a temporary RDB file.
  • When the child is done writing the new RDB file, it replaces the old one.

This method allows Redis to benefit from copy-on-write semantics.


Log rewriting uses the same copy-on-write trick already in use for snapshotting. This is how it works:

  • Redis forks, so now we have a child and a parent process.
  • The child starts writing the new AOF in a temporary file.
  • The parent accumulates all the new changes in an in-memory buffer (but at the same time it writes the new changes in the old append-only file, so if the rewriting fails, we are safe).
  • When the child is done rewriting the file, the parent gets a signal, and appends the in-memory buffer at the end of the file generated by the child.
  • Profit! Now Redis atomically renames the old file into the new one, and starts appending new data into the new file.

Forking in threaded applications



Binary combinatory logic esoteric lang

Binary combinatory logic is a formulation of combinatory logic using only the symbols 0 and 1.

<term> ::= 00 | 01 | 1 <term> <term>
  • 00 represents the K operator.
  • 01 represents the S operator.
  • 1 <term> <term> represents the application operator (<term1> <term2>).

{esolang}{Binary Combinatory Logic}

GrammaTech CodeSonar and CodeSurfer



Local development certificate generation


GitHub public keys vulnerable

GitHub public keys are available for anyone to access at the URL Unless configured otherwise, ssh sends all public keys until one works. By storing all GitHub keys, a server can identify the client by their key.


macOS Catalina default shell is zsh

Starting in macOS Catalina, the default shell will be zsh. The version of Bash used by macOS is stuck on 3.2 because newer versions newer versions are licensed under GPL v3.

Scripting OS X gives an in depth walkthrough on migrating to zsh:


Google text adventure walkthrough

On the StrategyWiki, a simple walkthrough is presented for the simple text adventure game hidden in Google Search.

Stack-based language compilation

"Compilation of Stack-Based Languages (Abschlußbericht)" by M. Anton Ertl and Christian Pirker (1998) describes techniques for compiling stack-based languages.

RAFTS is a framework for applying state of the art compiler technology to the compilation of stack-based languages like Forth and Postscript. The special needs of stack-based languages are an efficient stack representation, fast procedure calls, and fast compilation. RAFTS addresses the stack representation problem by allocating stack items to registers such that most stack accesses in the source program are register accesses in the machine language program, and by eliminating most stack pointer updates. To achieve fast calls, RAFTS performs these optimizations interprocedurally and also performs procedure inlining and tail call optimization. Fast compilation is achieved by selecting fast algorithms and implementing them efficiently.


The basic block code generation reduces the number of stack pointer updates to at most one per stack and basic block. It is possible to reduce the number much more. E.g., in procedures where all stack items are allocated to registers, no stack pointer update is needed at all. Like register allocation, stack pointer update minimization has to be performed interprocedurally to achieve a significant effect.


Chrome offline dinosaur source

The source for the offline dinosaur game in Chrome is available in Chromium and written in JavaScript.

Project idea: extract the dinosaur game into a standalone page.

The T-Rex appears to be named "Stan the Offline Dino" as referenced in several test files:

<head><title>Welcome to Stan the Offline Dino's Homepage</title></head>

Wikipedia documents the evolution of the dinosaur game:

If the user tries to browse when offline, a message is shown that they are not connected to the Internet. An illustration of the "Lonely T-Rex" dinosaur is shown at the top, designed by Sebastien Gabriel. From September 2014, tapping the dinosaur (in Android or iOS) or pressing space or ↑ (on desktop) launches a browser game called "T-Rex Runner" in which the player controls a running dinosaur by tapping the screen (in Android or iOS) or pressing space, ↑ or ↓ (on desktop) to avoid obstacles, including cacti and, from June 2015, pterodactyls. In 2016, another feature was added to the game. When the player reaches 700 points the game begins to switch between day (white background, black lines and shapes) and night (black background, white lines and shapes). During September 2018, for Google Chrome's 10th birthday, a birthday cake causing the dinosaur to wear a birthday hat when collected was added. Reaching a score of 900 will switch the colour scheme back to day, and the switch back and forth will occur at further subsequent milestones. The game is also available at the chrome://network-error/-106 and chrome://dino pages. The game's code is available on the Chromium site.

Conway's Game of Life


Project idea: implement a Conway's Game of Life simulation in Go using bit packing.

Query language research

BYU computer science professor Kimball Germane, specializes in programming languages and his current research is in utilizing the SQLite VM bytecode to construct DSL query languages that are more expressive than allowed through SQL.

SQLite bytecode engine:


Google Easter eggs

Wikipedia lists many Easter eggs hidden in Google search. Included is a selection of those queries:

  • "<blink>", "blink tag", or "blink html" includes samples of the blink element in the results.
  • "conway's game of life" on a desktop browser generates a running configuration of the game to the right of the search results. The process can also be stopped and altered by the user.
  • "google in 1998" on a desktop browser will generate a layout similar to the one Google used for its search engine in 1998.
  • "is google down" returns with "No".
  • "kerning" will add spaces between the letters of the word "kerning" in the search results.
  • "keming" will remove spaces between the letters of the word "keming".
  • "<marquee>", "marquee tag", or "marquee html" will apply the marquee element to the results count at the top of the results.
  • "minesweeper" will have a playable game of minesweeper. Users can select between three modes: easy, medium and hard.
  • "pac-man", "google pacman" or "play pacman" will show the Pac-Man related interactive Google Doodle from 2010. Clicking Insert Coin twice will enable a second player, Ms. Pac-Man.
  • "pluto" describes Pluto as "Our favorite dwarf planet since 2006" in the Knowledge Graph.
  • "recursion" includes a "Did you mean: recursion" link back to the same page.
  • "text adventure" or "google easter eggs" using most popular modern browsers (except Safari) and opening the browser's developer console will trigger a text-based adventure game playable within the console.
  • "tic tac toe" will show a playable game of tic-tac-toe. Users can select to play against the browser at different levels - "easy", "medium" or "hard" (called "impossible") - or against a friend. An alternative way to find the game is to search "shall we play a game".


C main signatures

A common extension to C supported by Unix systems adds a third parameter for environment information.

int main(void)
int main(int argc, char *argv[])
int main(int argc, char *argv[], char *envp[])

Alternatively, the environment is available in <unistd.h> with extern char **environ;.

In C, a function without parameters accepts any number of arguments, so int main() accepts any arguments, whereas int main(void) accepts none. C++ treats those two forms identically.


autoplay: a learning environment for interactive fiction

Daniel Ricks developed autoplay, an environment employing machine learning to play text-based games.


Git Bash completion with aliases

Using the built-in Git function __git_complete, Bash command completion can be enabled for aliases.

alias branch='git branch'
alias checkout='git checkout'
alias check='git checkout'
alias commit='git commit'
alias diff='git diff'
alias fetch='git fetch'
alias pull='git pull'
alias push='git push'

source "$HOME/git-completion.bash"
__git_complete branch _git_branch
__git_complete checkout _git_checkout
__git_complete check _git_checkout
__git_complete commit _git_commit
__git_complete diff _git_diff
__git_complete fetch _git_fetch
__git_complete pull _git_pull
__git_complete push _git_push

See entry from 2019-05-24 for installation of Git completion.


Todo.txt is a minimal task management format with a command-line interface and mobile apps.


Aheui - esoteric language in Hangul

Aheui is a grid-based programming language reflecting the graphical design of Hangul, the Korean writing system. It is the first esolang designed for Hangul.

The language specification provides an introduction into Korean orthography and lists the function of each vowel and consonant.

Interpreters are implemented in a dozen languages, including Go and self interpreted in Aheui:

AVIS is a cell based editor for Aheui:

Aheui in a polyglot:



μ6 esoteric language

μ6 is a low-level esolang based on μ-recursive functions with minimal syntax like Brainfuck. Commands are encoded as nibbles and base-6 integers are used.

Nibble ASCII Function
0000 0 Digit 0
0001 1 Digit 1
0010 2 Digit 2
0011 3 Digit 3
0100 4 Digit 4
0101 5 Digit 5
0110 [ Begin of function composition
0111 ] End of function composition
1000 / Projection
1001 . Constant zero-function
1010 + Successor-function
1011 , Pairing-function
1100 < Left of pair or const
1101 > Right of pair or const
1110 # Primitive recursion
1111 @ μ-operator (minimization)


Google Chrome history database

Google Chrome history is stored locally as a SQLite3 database and can be easily exported.

cd ~/Library/Application\ Support/Google/Chrome/Default/
sqlite3 History

sqlite> .headers on
sqlite> .mode csv
sqlite> .output my-chrome-output.csv

sqlite> SELECT DATETIME(last_visit_time/1000000-11644473600, 'unixepoch', 'localtime'), url
        FROM urls
        ORDER BY last_visit_time DESC;


XQuartz - X11 for macOS

The XQuartz open source project is a version of the X11 windowing system for macOS.

APOD Automator script

Automator in macOS can be used to automatically download the current NASA Astronomy Picture of the Day and set it as the desktop background.

Karabiner - macOS keyboard customizer

Karabiner is a low level keyboard customization utility for macOS.

I use it for a few simple bindings:

  • Fn -> Left Ctrl
  • Left Ctrl -> Fn
  • Caps Lock -> Esc

Nikita Voloboev demos his extensive Karabiner customizations:


Go sync.Map

Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant-time.

The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content.

The Map type is optimized for two common use cases:

  1. when the entry for a given key is only ever written once but read many times, as in caches that only grow, or
  2. when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.

Go sync.Once

Once is an object that will perform exactly one action.

Do calls the function f if and only if Do is being called for the first time for this instance of Once. In other words, given var once Once, if once.Do(f) is called multiple times, only the first call will invoke f, even if f has a different value in each invocation. A new instance of Once is required for each function to execute.

Used by crypto/elliptic:

Go types package

Package go/types declares the data types and implements the algorithms for type-checking of Go packages.

Alan Donovan provides a detailed tutorial on the use of the package:

Info has maps to store the relationship between identifiers and objects. Only non-nil maps in Info are populated, letting API clients control the information needed from the type checker. The field Defs records declaring identifiers and Uses records referring identifiers.

type Info struct {
  Defs       map[*ast.Ident]Object
  Uses       map[*ast.Ident]Object
  Implicits  map[ast.Node]Object
  Selections map[*ast.SelectorExpr]*Selection
  Scopes     map[ast.Node]*Scope

TODO: Continue reading at

Referring page:

gosec - Go security checker

Gosec inspects source code for security problems by scanning the Go AST.

Floating point math associativity

GCC optimizes pow(a, 2) into a*a, but does not optimize pow(a, 6) or a*a*a*a*a*a into (a*a*a)*(a*a*a) because floating point math is not associative, though associativity and other optimizations can be enabled with compiler flags.

C++ compiler division by zero optimization

The C++ compiler does not throw a division by zero exception when d == 0.

int d = 0;
d /= d;

C++ does not have a "Division by Zero" Exception to catch. The behavior you're observing is the result of Compiler optimizations:

  1. The compiler assumes Undefined Behavior doesn't happen
  2. Division by Zero in C++ is undefined behavior
  3. Therefore, code which can cause a Division by Zero is presumed to not do so.
    • And, code which must cause a Division by Zero is presumed to never happen
  4. Therefore, the compiler deduces that because Undefined Behavior doesn't happen, then the conditions for Undefined Behavior in this code (d == 0) must not happen
  5. Therefore, d / d must always equal 1.


Go language proverbs

Rob Pike philosophizes at Gopherfest SV 2015 and provides the following proverbs for teaching or understanding Go:

  • Don't communicate by sharing memory, share memory by communicating.
  • Concurrency is not parallelism.
  • Channels orchestrate; mutexes serialize.
  • The bigger the interface, the weaker the abstraction.
  • Make the zero value useful.
  • interface{} says nothing.
  • Gofmt's style is no one's favorite, yet gofmt is everyone's favorite.
  • A little copying is better than a little dependency.
  • Syscall must always be guarded with build tags.
  • Cgo must always be guarded with build tags.
  • Cgo is not Go.
  • With the unsafe package there are no guarantees.
  • Clear is better than clever.
  • Reflection is never clear.
  • Errors are values.
  • Don't just check errors, handle them gracefully.
  • Design the architecture, name the components, document the details.
  • Documentation is for users.
  • Don't panic.

XKCD surveys



Yorick experimentation

Matrices in Yorick are column-major, so to transpose a column vector to a row vector, we increase the dimensionality of the matrix using a "-" pseudo-index.

> u=[1,2,3,4]
> u
> u(-,)

Outer product of column vectors uv = uvᵀ:

> u=[1,2,3,4]
> v=[1,10,100]
> v(-,)
> u*v(-,)
> transpose(u*v(-,))

Inner product (dot product), defined as ⟨u, v⟩ = uv, is represented in Yorick using the "+" sign. The plus sign selects "the dimension to be iterated over in the summation of the inner product."

> u=[1,2,3]
> v=[4,5,6]
> u(+)*v(+)

Matrix multiplication is composed of dot products at each position and is thus represented using the plus sign. The transpose matches "normal" matrix multiplication since Yorick is column-major.

> a=[[1,2],[3,4]]
> b=[[5,6],[7,8]]
> a(+,)*b(,+)
> transpose(a(+,)*b(,+))

The original NumPy authors were familiar with Yorick and borrowed the concept of broadcasting from Yorick.

JEH-Tech explains Yorick with fantastic diagrams.


Go terminal package

Package crypto/ssh/terminal provides support functions for dealing with terminals, as commonly found on UNIX systems.

Yorick syntax and building

Yorick has optional semicolons to enable easier to type statements into the terminal. When omitted, the lexer must insert semicolons so that the parser works correctly which complicates the grammar significantly. This context sensitivity is called a "lexical tie-in" and is discouraged.

Yorick can be built from source from its repository.

make config
make check
make install

Simple hello world:

> print, "Hello, World!"
"Hello, World!"

Go astutil package

Package astutil contains common utilities for working with the Go AST.


Split a subdirectory into a separate repo

A subdirectory can be split into a separate repo, the inverse of a repo merge. Any history existing outside of that subdirectory will not appear in the split repo. This causes problems if that folder has been moved.

git filter-branch --prune-empty --subdirectory-filter FOLDER-NAME BRANCH-NAME


VSCode TextMate grammar performance



Go Native Client support

Russ Cox describes in detail the process of implementing support for Native Client (NaCl) in Go 1.3 and the architecture restrictions that added complexity.

Go 1.13 is the last release that will run on NaCl.

Comparing binaries and source code


Control flow graph function matching

Joxean Koret proposes a heuristic based on the idea that "different basic blocks and edges are different interesting pieces of information". The Koret-Karamitas algorithm "КОКА" gets features at function, basic block, edge, and instruction level, assigns a different prime value to each different feature, and then generates a hash of the product.

Huku classifies basic blocks in 7 categories: normal, entry points, exit points, traps, self-loops, loop heads and loop tails. In the same way, he classifies 4 different kinds of edges: basis, forward, back edges and cross-links.

Huku uses instruction histograms to classify instructions in 4 categories based on their functionality: arithmetic, logic, data transfer, and redirection.

JCry - a ransomware written in Go

JCry is downloaded as a fake update to Adobe Flash Player through a compromised website. It drops encryption and decryption programs into Startup, then encrypts the 1MB of all files with significant extensions. It then demands payment for a decryption key through an onion link in a Tor browser.

Raspberry Pi arcade

Recently while in San Francisco, I stumbled upon a Raiden II arcade machine in Musée Mécanique. As a child, one of my favorite games was Raiden X, a Flash spinoff of the Raiden series, so it was fun to play the original game.

Project idea: create a dedicated arcade machine using a Raspberry Pi, monitor, joystick, and buttons to play arcade games like the Raiden series, Pac Man, or Dig Dug or Flash games like Raiden X. Software like MAME (Multiple Arcade Machine Emulator) exists for arcade games, so those would be simple, but the Flash format poses issues because support has been largely dropped because the runtime has security issues and the PC controls would need to be mapped to a joystick and buttons.

Origins of the Raspberry Pi

The Raspberry Pi was developed to introduce more people to programming and at a low cost. The cost of $35 was a goal early on and drove many design decisions. Later once produced in bulk, upgrades could be made while staying within the price range.

Transposing an 8x8 bit matrix

"Hacker's Delight", Chapter 7-3

This procedure treats the 8×8-bit matrix as 16 2×2-bit matrices and transposes each of the 16 2×2-bit matrices. The matrix is then treated as four 2×2 sub-matrices whose elements are 2×2-bit matrices and each of the four 2×2 sub-matrices are transposed. Finally, the matrix is treated as a 2×2 matrix whose elements are 4×4-bit matrices and the 2×2 matrix is transposed.

unsigned long long x;
x = x & 0xAA55AA55AA55AA55LL |
  (x & 0x00AA00AA00AA00AALL) << 7 |
  (x >> 7) & 0x00AA00AA00AA00AALL;
x = x & 0xCCCC3333CCCC3333LL |
  (x & 0x0000CCCC0000CCCCLL) << 14 |
  (x >> 14) & 0x0000CCCC0000CCCCLL;
x = x & 0xF0F0F0F00F0F0F0FLL |
  (x & 0x00000000F0F0F0F0LL) << 28 |
  (x >> 28) & 0x00000000F0F0F0F0LL;


Geohash in Go assembly


Find nth set bit



Efficient integer square root algorithm



Constant-time bits

Go version 1.13 guarantees execution time of Add, Sub, Mul, RotateLeft, and ReverseBytes in package math/bits to be independent of the inputs.

CL 170758:

// Variable time
func Add64(x, y, carry uint64) (sum, carryOut uint64) {
  yc := y + carry
  sum = x + yc
  if sum < x || yc < y {
    carryOut = 1
// Constant time
func Add64(x, y, carry uint64) (sum, carryOut uint64) {
  sum = x + y + carry
  carryOut = ((x & y) | ((x | y) &^ sum)) >> 63

Go crypto/subtle

Package subtle implements functions that are often useful in cryptographic code but require careful thought to use correctly such as constant-time comparisons or copies.

func ConstantTimeByteEq(x, y uint8) int
func ConstantTimeCompare(x, y []byte) int
func ConstantTimeCopy(v int, x, y []byte)
func ConstantTimeEq(x, y int32) int
func ConstantTimeLessOrEq(x, y int) int
func ConstantTimeSelect(v, x, y int) int


Go regression testing

Nearly every fixed bug or issue has an associated test created in the test/fixedbugs directory to prevent regressions. Each test is tagged with a comment on the first line indicating the mode of testing: run, compile, errorcheck, or build. If a test has an associated directory, it becomes rundir, compiledir, etc. The level of automation and thorough nature of these tests is impressive.

Go objdump

objdump disassembles executable files in Go's Plan 9 assembly syntax.


Go context concurrency pattern


Communicating sequential processes


Less is exponentially more


Retrospective on early Go development


Go interface implementation


Go design philosophy


"Go at Google: Language Design in the Service of Software Engineering"

Go grammar is mostly regular:

Compared to other languages in the C family, its grammar is modest in size, with only 25 keywords (C99 has 37; C++11 has 84; the numbers continue to grow). More important, the grammar is regular and therefore easy to parse (mostly; there are a couple of quirks we might have fixed but didn't discover early enough). Unlike C and Java and especially C++, Go can be parsed without type information or a symbol table; there is no type-specific context.

TODO: Expand on CSP and arena allocator.


Go 1.13

Selected features to be added in Go 1.13:

  • More number literal prefixes are supported.
  • The restriction that a shift count must be unsigned is removed.
  • math/bits: The execution time of Add, Sub, Mul, RotateLeft, and ReverseBytes is now guaranteed to be independent of the inputs.


Semantics of unary plus and minus


Go proposal for 128 bit integers


32-bit implementation of 64-bit division:

Go runtime errors

Runtime errors are distinguished from ordinary errors by the no-op function RuntimeError() in the runtime.Error interface.

type Error interface {

Google monorepos


Working on the Google Go team



WebAssembly compiled languages list


Go experimental subpackages

utf8string provides an efficient way to index strings by rune rather than by byte.

apidiff determines whether two versions of the same package are compatible.

sumdb/gosumcheck checks a go.sum file against a go.sum database server.

shiny/materialdesign provides named colors and icons specified by Material Design.

shiny/screen and shiny/driver provide interfaces and drivers for accessing a screen.

Red-black trees in functional languages

Chris Okasaki demonstrated that red-black trees can be efficiently and elegantly implemented in functional languages. He simplifies insert to have four unbalanced cases and one balanced case.

Red-black trees have become one of the most common persistent data structures in functional languages.–black_tree

Go reusable containers

The container package in the Go standard library provides a few reusable containers including a circular linked list, a doubly linked list, and a heap interface and functions to operate on the heap.

2019-08 (undated)

Git repository merging



Go hex dump

Dumper in encoding/hex writes a hex dump in the format of hexdump -C.

However, a potential bug is that it does not output the offset of the final byte like hexdump -C, so it does not match precisely.


Go test helpers

t.Helper() can be called to mark the caller as a test helper function and skips printing file and line information for that function.

encoding/asci85 has a clever Errorf and comparison wrapper:

testEqual(t, "Encode(%q) = %q, want %q", p.decoded, strip85(string(buf)), strip85(p.encoded))

func testEqual(t *testing.T, msg string, args ...interface{}) bool {
  if args[len(args)-2] != args[len(args)-1] {
    t.Errorf(msg, args...)
    return false
  return true

Go binary.Varint

Unsigned integers are serialized 7 bytes at a time, starting with the least significant bits. The most significant bit indicates if there is a continuation byte.

Go bounds check elimination

Issue #14808 provides a list of bound check eliminations used or not used by Go compiler including the following:

var a[]intuse a[0], a[1], a[2] // three bound checks
// can be improved as
_ = a[3]             // early bounds check
use a[0], a[1], a[3] // no bound checks
// or
a = a[:3:len(a)]     // early bound check
use a[0], a[1], a[3] // no bound checks
// or
use a[3], a[2], a[1] // one bound check

Bounds check hints in the wild in binary.LittleEndian and binary.BigEndian:

func (littleEndian) Uint16(b []byte) uint16 {
  _ = b[1] // bounds check hint to compiler; see
  return uint16(b[0]) | uint16(b[1])<<8
func (littleEndian) PutUint16(b []byte, v uint16) {
  _ = b[1] // early bounds check to guarantee safety of writes below
  b[0] = byte(v)
  b[1] = byte(v >> 8)

Grid processing algorithms

Matrices using image processing algorithms to group coordinates in grid:

A max-heap can be used to order rectangles in grid by size:


Go json.RawMessage

RawMessage is a raw encoded JSON value implementing Marshaler and Unmarshaler used to delay decoding or precompute an encoding.

In the wild:

type clientResponse struct {
  Id     uint64           `json:"id"`
  Result *json.RawMessage `json:"result"`
  Error  interface{}      `json:"error"`

Example from json docs:

// use a precomputed JSON during marshal
h := json.RawMessage(`{"precomputed": true}`)
c := struct {
  Header *json.RawMessage `json:"header"`
  Body   string           `json:"body"`
}{Header: &h, Body: "Hello Gophers!"}
// delay parsing part of a JSON message
type Color struct {
  Space string
  Point json.RawMessage // delay parsing until we know the color space
type RGB struct {
  R, G, B uint8
var c Color
err := json.Unmarshal(`{"Space": "RGB", "Point": {"R": 98, "G": 218, "B": 255}}`, &c)
var dst interface{}
switch c.Space {
  case "RGB":
  dst = new(RGB)
err = json.Unmarshal(c.Point, dst)


Aesthetic new tab

Project idea: develop a web browser extension to replace the new tab page with a more aesthetically pleasing page including a QLOCKTWO-style clock and artistic backgrounds. The search bar is redundant with the address bar and need not be included. Depending on the level of minimalism desired, feeds of the user's favorite websites could be displayed.


Apollo mission streams



Radix 2^51 trick for fast addition and subtraction

Adding or subtracting 256-bit numbers requires carries for each 64-bit limb and thus is slow because the instructions cannot be parallelized. If 256-bit numbers are instead split into 51-bit limbs stored in 64-bit registers, then each limb can be added in parallel with the carries propagated later. This allows 2^13 limbs to be added before the high 13 bits overflow.

For subtraction, the carries are negative, so the limbs can be treated as signed instead of unsigned. This reserves the most significant bit for the sign which reduces the number of operations that can be performed between normalizations to 2^12.

|            [--------------------- 52 bits --------------------]|
|             [-------------------- 51 bits --------------------]|
|             [-------------------- 51 bits --------------------]|
|             [-------------------- 51 bits --------------------]|
|             [-------------------- 51 bits --------------------]|

Constant-time cryptography

When writing constant-time code, timing should not depend on secret information.

Secret information may only be used in an input to an instruction if that input has no impact on what resources will be used and for how long.


Today's languages and compilers weren't really built for this, so it's a challenge.


The compiler might decide that your code would be faster if it used variable-time instructions. There are even cases where an optimizing compiler will see that you are trying to, say, avoid using an if statement, and the compiler puts the if statement back in because it knows it will be faster.

Code can be verified to be constant-time using a patch to Valgrind made by Adam Langley:

BigInt in ES2020

  • BigInt is a numeric primitive for arbitrary precision integers introduced in ES2020:
  • BigInt has its own type and can be defined with a n suffix (typeof 42n === 'bigint').
  • A BigInt is not strictly equal to a Number (===), but is abstractly equal (==).
  • When coerced to a boolean, BigInt follows the same logic as Number.
  • Binary +, -, *, and ** all work. / and % work, rounding towards zero.
  • Bitwise operations |, &, <<, >>, and ^ assume a two's complement representation for negative values.
  • Unary - negates, though unary + is not supported because asm.js expects +x to produce either a Number or an exception.
  • Unsigned right shift >>> is unsupported because BigInt is always signed.
  • BigInt64Array and BigUint64Array, make it easier to efficiently represent lists of 64-bit signed and unsigned integers.

Detecting signals in Go

src-d/go-git intercepts signals to exit cleanly from Git calls using os/signal and context.

c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)



go-intervals is a library for performing set operations on 1-dimensional intervals, such as time ranges:

pprof profiler

pprof is a code profiler, primarily for C and C++.

Go integration with pprof is enabled in the runtime/pprof package. Russ Cox details the process of building support in Go for pprof.

Stellarium source

The digital planetarium software Stellarium is open source:

Tabletop Whale

Tabletop Whale is a blog of open source charts and graphics visualizing large science datasets.

"The Western Constellations", a map of every star seen from Earth:

"An Orbit Map of the Solar system", including every object over 10km diameter:


Uber Go libraries



LLIR is an unofficial library for interacting with LLVM IR in pure Go.

Several projects use LLIR including a research project to decompile assembly into Go using LLVM IR and a transpiler to Bash:


GNU Multiple Precision Arithmetic Library


Bit twiddling reference



Interstellar film script differences

IMSDb has a draft script from March 12, 2008 for Interstellar that drastically different from the final film version. In it, Murph is a boy and the Chinese passed through the wormhole long before NASA and figured out how to manipulate gravity.

Project idea: make a tool to format film scripts to be more pleasant to read. Scripts on IMSDb are consistently formatted, albeit with some inaccuracies, so could be mapped to another format.


Send channels and receive channels in Go

There are three types of channels: bidirectional chan, receive-only <-chan, and send-only chan<-. A bidirectional channel can be casted to either receive-only or send-only, but cannot be converted back.

First seen in the Go syntax definitions in GitHub's Semantic project:


Elm functional language

Elm is a pure functional UI design DSL with strong static type checking and "no runtime exceptions in practice".

XMLisp - Lisp with XML syntax

Project idea: implement a Lisp-like language with implicit returns, higher order functions, and expressions as values. This would be a more capable successor to XMLang that would introduce type safety and would parse with encoding/xml rather than JSX.

<func name="fib" params="n int">
    <eq>n 0</eq>
      <eq>n 1</eq>
        <fib><sub>n 1</sub></fib>
        <fib><sub>n 2</sub></fib>

Go crypto/rand

crypto/rand operates on *big.Int, unlike math/rand.

rand.Reader is a global, shared instance of a cryptographically secure random number generator that reads from OS-specific APIs.


Ahead-of-time and just-in-time compilation


μ-recursive function


Whitespace is supposedly based on μ-recursive functions:

HaPyLi programming language

HaPyLi is a programming language designed to compile to Whitespace, with syntax derived from Haskell, Python, and Lisp. HaPyLi uses the Whitespace heap to store strings and globals. It supports inline Whitespace, but requires that all arguments and local variables be popped and exactly one value be pushed. The standard library includes alloc, similar to malloc in C, but there is no corresponding free implementation.

import "stdlib/base.hpl"

def power(x y) =
    (if (== y 1)
        (* x (power x (- y 1))))

def main() = (print-number (power 2 10))

Unfortunately, as the homepage is defunct, the compiler source is no longer available.

Marinus Oosters created a 99 bottles of beer program written in HaPyLi:

While developing HaPyLi, the author posted a question on Haskell monads during code generation:


Thue esolang



Astro programming language



Go math/bits proposal

All bit twiddling functions, except popcnt, are already implemented by runtime/internal/sys and receive special support from the compiler in order to "to help get the very best performance". However, the compiler support is limited to the runtime package and other Golang users have to reimplement the slower variant of these functions.

Go 1.13 signed shift counts

In Go 1.13, shift counts (<< and >>) are no longer required to be unsigned and when negative, a panic occurs.

This requires an estimated minimum of two extra instructions per non-constant shift: a test and a branch to be check at run-time, as done for make. The compiler can omit the check for unsigned and constant values and when it is able to prove that the operand is non-negative.

As a last resort, an explicit uint conversion or mask in the source code will allow programmers to force the removal of the check, just as an explicit mask of the shift count today avoids the oversize shiftcheck.


Assembly performing slower than high level programs

Go blank identifier uses

Disable unused declaration error:

_ = unused

To import a package solely for its side-effects (initialization), use the blank identifier as explicit package name:

import _ "lib/math"

Static type assertion:

type T struct{}
var _ I = T{}       // Verify that T implements I.
var _ I = (*T)(nil) // Verify that *T implements I.

Interspersing delimiters without branching

The Go Programming Language:

var s, sep string
for i := 1; i < len(os.Args); i++ {
  s += sep + os.Args[i]
  sep = " "

Semantic source code library by GitHub

Yorick programming language

"Yorick is an interpreted programming language for scientific simulations or calculations, postprocessing or steering large simulation codes, interactive scientific graphics, and reading, writing, or translating large files of numbers."

"Arrays are first-class objects that can be operated on with a single operation. Since the virtual machine understands arrays, it can apply optimized compiled subroutines to array operations, eliminating the speed penalty of the interpreter."

"Yorick is good at manipulating elements in N-dimensional arrays conveniently with its powerful syntax."

I was referred to Yorick by Matt Borthwick as it is his favorite programming language for physics. A trick to compute the product of the elements of an array and avoid overflows is to take the exponentiation of the sum of the natural logs of the elements: exp(sum(ln(arr))).


Go unnamed method receiver

A method can have a receiver without a name.

func (CmdReceivePack) Usage() string


Go embedded struct fields

Embedded struct fields have no name and promote fields and methods to another struct.

Discovered in encoding/json/encode.go:

type jsonError struct{ error }
type A struct {
  foo int
type B struct {
  bar int
b := B{A{10}, 3}
//,, b.A are accessible


Git branch cleanup tool

Project idea: after a PR has been merged and the branch deleted on the remote, any local clones of this branch remain and should be deleted.


  • Scan repo(s) for all branches
  • Exclude dev, master, currently checked out branch, and branches with an open PR
  • Delete all merged branches


Go sync.Pool


Password crossword

The 2013 Adobe breach has the credentials of millions of users and the passwords are encrypted insecurely using Triple DES.

In an XKCD comic, Randall Munroe creates a crossword puzzle of solving the password blocks using the given password hints.

Using statistics of the most common passwords, one could provide a word bank common passwords that could be used while solving such a crossword puzzle.

Go reflection to view and set unexported fields

Reflection is designed to allow any field to be accessed, but outside of the definition package, only exported fields can be modified (The Go Programming Language, Donovan and Kernighan).

However, using unsafe.Pointer and (*reflect.Value).UnsafeAddr, unexported values can be assigned to, though doing so potentially interferes with garbage collection.

Deleting value in Go map

An element can be deleted from a map using delete(m, key) similar to delete m[key] in Javascript.

Go modules synchronizer

When working in a project using modules, each package and sub-package requires specific dependency versions which protects a package from breaking changes in its dependencies, but it makes changing code in multiple packages simultaneously difficult.

Project idea: a tool that watches locally changed packages and updates interdependencies would greatly simplify this process.


Go runtime package

Operations to interact with runtime system and low-level reflect type information.

Discovered from attempt to print line numbers in error messages.


Text normalization

  • Text normalization in Go:
  • Detailing on NFC, NFD, NFKC, and NFKD methods of transforming text:
  • Normalizing to NFC compacts text, giving substantial savings to languages like Korean
    • Project idea: Make a keyboard with Hangul input that converts to NFC as you type, but allows for deletion by character rather than by block
  • Allows for normalization of look-alikes


Goto in Python

In Python, an April Fools joke added goto, label, and comefrom.

Discovered from a comparison of throw to comefrom in favor of Go's error handling decisions.

Git annotations for ls

Project idea: annotate ls command with Git branch and status information.

There are answers here, but they don't appear to be efficient:

Note the branch annotations at the end:

drwxr-xr-x 1 0018121 Domain Users    0 Dec 14 14:33 MyProject/ (develop)
drwxr-xr-x 1 0018121 Domain Users    0 Dec 14 14:17 Data/
drwxr-xr-x 1 0018121 Domain Users    0 Dec 14 12:08 MyApp/ (master)
-rw-r--r-- 1 0018121 Domain Users 399K Aug  4 10:41 readme.txt

Git checkout shortcut

Project idea: scan for directories containing a Git repo and create Bash aliases for each branch.

See entry from 2019-05-24 for solution using Git shell completions.


Assembly in Go

Guide to the assembly used in Go:

The assembler's parser treats period and slash as punctuation, so the assembler allows the middle dot character U+00B7 and the division slash U+2215 in identifiers and rewrites them to plain period and slash.

For example, fmt.Printf and math/rand.Int are rewritten to fmt·Printf and math∕rand·Int.


RE2 regular expression engine



Code Search

Google Code Search performs regular expression matching with a trigram index.

Binary RegExp


Continuation-passing style

Functions in CPS take an extra argument, the continuation, a function of one argument. The result is returned by calling the continuation function with this value.

Procedure returns are calls to a continuation, intermediate values are all given names, argument evaluation order is made explicit, and tail calls call a procedure with the same continuation.

Functional and logic compilers often use CPS as an intermediate representation, whereas imperative or procedural compilers would use static single assignment form (SSA).

; Direct style
(define (pyth x y)
 (sqrt (+ (* x x) (* y y))))

; Continuation-passing style
(define (pyth& x y k)
 (*& x x (lambda (x2)
          (*& y y (lambda (y2)
                   (+& x2 y2 (lambda (x2py2)
                              (sqrt& x2py2 k))))))))

Go math/bits package

Arithmetic functions: add/sub with carry and mul/div with remainder. Bit manipulation: leading/trailing zeros count, bit count, one count, reverse, and rotate.


Git autocomplete

Git autocompletion can be installed by downloading the following file and sourcing in profile.

curl -o ~/.git-completion.bash



Approxidate is the date parser in Git and it supports many date formats and was originally written by Linus:

Approxidate has been converted to a C library and wrapped for Go:

Git integration with Go

go-git is a highly extensible Git implementation library written in pure Go.


Datalog Disassembly

Datalog Disassembly is a disassembler by GrammaTech using the Datalog language.

A fast disassembler which is accurate enough for the resulting assembly code to be reassembled. The disassembler implemented using the datalog (souffle) declarative logic programming language to compile disassembly rules and heuristics. The disassembler first parses ELF file information and decodes a superset of possible instructions to create an initial set of datalog facts. These facts are analyzed to identify code location, symbolization, and function boundaries. The results of this analysis, a refined set of datalog facts, are then translated to the GTIRB intermediate representation for binary analysis and reverse engineering. The GTIRB pretty printer may then be used to pretty print the GTIRB to reassemblable assembly code.

The analysis contains two parts:

  • The C++ files take care of reading an elf file and generating facts that represent all the information contained in the binary.

  • src/datalog/*.dl contains the specification of the analyses in datalog. It takes the basic facts and computes likely EAs, chunks of code, etc. The results are represented in GTIRB or can be printed to assembler code using the gtirb-pprinter.

Project idea: determine how my Datalog interpreter performs in comparison to industry implementations like Soufflé.


Static single assignment form

In compiler design, SSA is a property of an intermediate representation, which requires that each variable is assigned exactly once, and every variable is defined before it is used.

Variables in the are split into versions so that every definition gets its own version.

y := 1
y := 2
x := y

Rewritten in SSA:

y₁ := 1
y₂ := 2
x₁ := y₂

Go loop bound reevaluation

Loop bound expressions can be optimized in some cases to be evaluated once:

Go 1.7 switched to using SSA for the compiler which generates more compact, more efficient code and provides a better platform for optimizations such as bounds check elimination.


Unicode property trie lookup


2019-05 (undated)

W3C CSS Color Module Level 4 changes


EDI Parsing




Powerline formats the shell prompt and vim status line into great looking segments. It uses patched fonts like FiraCode to render custom Unicode glyphs. Powerline Gitstatus is a segment for showing the status of a Git working copy.


Signal release notes

The release notes for version 2.38.1 of the Signal encrypted messenger contain a humorous bug fix:

Users on iOS 9 will no longer where the input toolbar wanders off to for a few moments every time they end a captioned attachment. What will they do with the time that they save? Hopefully upgrade to iOS 10.


AsciiDots and Ook! esoteric languages

AsciiDots executes using dots travelling along ascii art paths taking inspiration from electrical engineering.

Ook! is a simple mapping of Brainfuck instructions to trinary combinations of Ook., Ook?, and Ook!.!




TrumpScript is a joke programming language developed for a hackathon. Its slogan is "Make Python great again".

Features include:

  • No floating point numbers, only integers. America never does anything halfway.
  • All numbers must be strictly greater than 1 million. The small stuff is inconsequential to us.
  • There are no import statements allowed. All code has to be home-grown and American made.
  • Instead of True and False, we have the keywords fact and lie.
  • Only the most popular English words, Trump's favorite words, and current politician names can be used as variable names.
  • Error messages are mostly quotes directly taken from Trump himself.
  • All programs must end with America is great.
  • Our language will automatically correct Forbes' $4.5B to $10B.
  • In its raw form, TrumpScript is not compatible with Windows, because Trump isn't the type of guy to believe in PC.
  • TrumpScript boycotts OS X and all Apple products until such time as Apple gives cellphone info to authorities regarding radical Islamic terrorist couple from Cal.
  • The language is completely case insensitive.
  • If the running computer is from China, TrumpScript will not compile. We don't want them stealing our American technological secrets.
  • By constructing a wall (providing the -Wall flag), TrumpScript will refuse to run on machines with Mexican locales
  • Warns you if you have any Communists masquerading as legitimate "SSL Certificates" from China on your system.
  • Won't run in root mode because America doesn't need your help being great. Trump is all we need.
  • Easy to type with small hands



Customizing Windows command prompt



GHIDRA - NSA reverse engineering and decompilation tool



StackBlitz online IDE and Turbo package manager

StackBlitz is an online IDE for web development powered using Visual Studio Code. It features live reloading, package management, deployment to Google Cloud, and GitHub integration.

StackBlitz uses a custom JavaScript package manager, Turbo, that runs entirely in the browser and retrieves only the files you need on demand. It installs packages about five times faster than Yarn or NPM, reduces the size of node_modules by up to two two orders of magnitude, and pulls from multiple redundant CDNs. Files not directly required by the main field are lazy loaded.

Peer dependencies can be easily installed through the IDE.



Flix is a functional programming language that includes first-class Datalog predicates. Flix synthesizes features from ML-style languages, logic languages, and Go-like concurrency.

  • algebraic data types
  • pattern matching
  • first-class functions
  • extensible records
  • parametric polymorphism
  • Hindley-Milner type inference
  • CSP-style concurrency
  • buffered & unbuffered channels
  • first-class datalog constraints
  • polymorphic datalog predicates
  • stratified negation
  • unboxed primitives
  • expressions holes
  • full tail call elimination
  • compilation to JVM bytecode
  • core standard library
  • human friendly errors
  • interactive mode




QLOCKTWO text-based clock

In Zürich, Switzerland, I saw a store selling text-based clocks made by QLOCK2. The clocks have a grid of letters that light up to spell out the time.

For example, in 5:28 would be rounded to 5:30 and displayed in German as "ES IST HALB FÜNF".


In English, it would be "IT IS HALF PAST FIVE".



Trie data structure


"Efficient and scalable trie-based algorithms for computing set containment relations"



Unlambda is a minimal functional language based on combinatory logic. It is the first functional Turing tarpit.



C compile-time assertions

The Linux kernel uses macros as compile-time assertions to break the build when conditions are true. The trick leverages bitfields with negative widths to produce compilation errors. This contrasts with assert which is a runtime test.

#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
#define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:-!!(e); }))






Peter Aldous' project.


Apollo transcendental function computation



Countering Trusting Trust

The following was originally written as email on 2019-09-25 to Dr. Peter Aldous following up with conversation from 2019-09-24:

I did some more research into Reflections on Trusting Trust by Thompson and came across this dissertation "Fully Countering Trusting Trust through Diverse Double-Compiling" by David A. Wheeler that counters such an attack.

Essentially, he proves that a questionable compiler can be verified by comparing a trusted compiler, compiled from source, then compiled by itself with the result of a questionable compiler executable compiling that first trusted compiler's source twice.

Below is a summary of his approach and a link to a more complete summary. Wheeler's site explains some details that the summary glosses over, but the dissertation itself if 199 pages long, so I haven't read that.

Suppose we have two completely independent compilers: A and T. More specifically, we have source code SA of compiler A, and executable code EA and ET. We want to determine if the binary of compiler A - EA - contains this trusting trust attack.

Here's Wheeler's trick:

  • Step 1: Compile SA with EA, yielding new executable X.
  • Step 2: Compile SA with ET, yielding new executable Y.

Since X and Y were generated by two different compilers, they should have different binary code but be functionally equivalent. So far, so good.


  • Step 3: Compile SA with X, yielding new executable V.
  • Step 4: Compile SA with Y, yielding new executable W.

Since X and Y are functionally equivalent, V and W should be bit-for-bit equivalent.

As linked from Wheeler's paper, University of Michigan researchers discovered and implemented a hardware backdoor that can be installed by a single employee at the processor's fabrication facility and triggered by a sequence of obscure commands that charge a capacitor, then eventually trigger and grant OS access. This is a scary prospect because of how monumentally difficult it would be to detect such a backdoor.

In the explain xkcd wiki, Wheeler's paper is mentioned:


Reflections on Trusting Trust

Ken Thompson demonstrates in "Reflections on Trusting Trust" that we can't fully trust any software we did not write ourselves, including compilers.

He describes a scenario in which the C compiler could install a backdoor into the Unix login command without being detected. First, the C compiler source would be patched with the vulnerability, then built and distributed as a binarys. If the bugged compiler compiles Unix, it will identify and install this backdoor. Additionally, It recognizes when it compiles itself such that it plants the backdoor in future compiler versions. As the exploit exists only in the executable, it is undetectable while looking at the source.

XKCD comic "Old Days" mentions "Reflections on Trusting Trust" in the title text.


Windows 95 on an Apple Watch

Nick Lee installed Windows 95 on an Apple Watch, but as it is emulated, it takes around an hour to boot.


Webkit color implementation



Fira font for Firefox OS

Fira is a typeface designed by Mozilla for Firefox OS and includes many weights and monospaced variants. It is the base for FiraCode.


FiraCode font with ligatures

FiraCode is a monospaced font with ligatures for common programming symbols. Ligatures include arrows, equalities, increment/decrement, hexadecimal positioning, and other operators.


JSX usage outside of React

JSX can be used outside of React as Robert Prehn outlines in his article.

You could use this to do anything that lends itself to functional composition. Some ideas:

  • You could create DSLs with an XML-like syntax within your JavaScript.
  • You could abuse the JSX transform to compile XML configuration or seed data into JavaScript objects (please don't).
  • You could even make a whole XML-syntax functional programming language that compiles to JS (just stop).

I developed a simple language named XMLang based on this idea:


Using C preprocessor language agnostically

The C preprocessor is independent of the C language and thus can be used with languages that do not have compile-time evaluation.

-E  Stop after the preprocessing stage; do not run the compiler
    proper. The output is in the form of preprocessed source code,
    which is sent to the standard output.

-P  Inhibit generation of linemarkers in the output from the
    preprocessor. This might be useful when running the preprocessor
    on something that is not C code, and will be sent to a program
    which might be confused by the linemarkers.

The source file must be named with a .c suffix and the output can be redirected to a file.

gcc -E -P hello.js.c > hello.js
#define HOWDY

function hello(name) {
  console.log('Hello, ' + name);
#ifdef HOWDY
  console.log('Howdy, ' + name + '!');


Apollo Guidance Computer source



Implementing arbitrary precision integers

Project idea: arbitrary precision integers could be stored using contiguous integers with carry and borrow used to implement addition and subtraction across boundaries. Multiplication and division would be more difficult.

This idea was sparked by something in a CS 224 Computer Systems lecture.


Social security numbers insecure



ArnoldC and LOLCODE quote-based esolangs



Whitespace programming language



Piet graphical esolang









youtube-dl video downloader

youtube-dl is a highly configurable command-line program written in Python to download videos from and other video sites.


BitShift esoteric language

BitShift is an esolang with only two valid characters: 0 and 1. Instructions are denoted by the count of alternating 0 and 1 characters and are delimited by a matching pair (00 or 11).

Alternations Description
1 Shift the value 1 bit to the left (0000 0001 > 0000 0010)
2 Shift the value 1 bit to the right (0000 0010 > 0000 0001)
3 XOR the value with 1 (0000 0000 > 0000 0001)
4 XOR the value with 128 (0000 0000 > 1000 0000)
5 Set the value to 0
6 Convert the value to a character and print it
7 Read a character from user input and set the value to it



Pi series

The sum of inverse squares is pi squared over six:

\frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2}

The arctangent power series of 1 is equal pi/4:

\arctan(x) = 1 - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + ...
\arctan(1) = \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + ...


Initial Go commits

The first four commits in the Go repository reference the evolution of C, a callback to when Rob Pike worked with Brian Kernighan in the 1980s at Bell Labs.

convert to Draft-Proposed ANSI C, Brian Kernighan committed on Apr 1, 1988

#include <stdio.h>

    printf("hello, world\n");

convert to C, Brian Kernighan committed on Jan 19, 1974

main() {
    printf("hello, world");

hello, world, Brian Kernighan committed on Jul 18, 1972

main( ) {
    extrn a, b, c;
    putchar(a); putchar(b); putchar(c); putchar('!*n');
a 'hell';
b 'o, w';
c 'orld';

Navigating history on GitHub to earliest commit:


Shamir's Secret Sharing

Shamir's Secret Sharing is an algorithm in cryptography created by Adi Shamir. It is a form of secret sharing, where a secret is divided into parts, giving each participant its own unique part.

To reconstruct the original secret, a minimum number of parts is required. In the threshold scheme this number is less than the total number of parts. Otherwise all participants are needed to reconstruct the original secret.


The essential idea of Adi Shamir's threshold scheme is that 2 points are sufficient to define a line, 3 points are sufficient to define a parabola, 4 points to define a cubic curve and so forth. That is, it takes k points to define a polynomial of degree k-1.


Logo graphical programming language


Douglas Clements and Dominic Gullo in "Effects of Computer Programming on Young Children's Cognition"

The study utilized the Logo programming language that was designed to be intuitive for children. Through a 12 week period, six year old children learned Logo programming during two sessions a week. Students used Logo to draw graphics by directing the movements of a turtle. As the lessons progressed, the difficulty of the tasks increased and, by the end, they could use the full Logo language. Those students were compared with a control group that only participated in computer assisted instruction and the Logo programming group significantly improved in fluency, originality, divergent thinking, direction giving, and had less errors.


Legacy HTML color parsing

Obsolete HTML color attributes parse colors differently and strings such as 'chucknorris' are rendered as '#c00000'. This behavior was inherited from Netscape and has persisted in modern browsers for compatibility. The HTML spec describes this algorithm.

The code Netscape Classic used for parsing color strings is open source:

Colors for arbitrary strings can be previewed using a tool by Tim Pietrusky:

I extended a fork of TinyColor to parse legacy colors:




GitHub calendar art




RoboKitty is the codebase for an interactive robotic kitty using Arduino. It includes two modules - 8-bit tunes which plays ABC notation music stored on an SD card and battery monitor which displays charge in increments of 10% on LEDs.


Flexbox Froggy

Flexbox Froggy is a game for learning CSS flexbox. It teaches flexbox by positioning frogs on lilypads and beating levels.


Code poetry

Stanford holds yearly code poetry slams in which students present poems written in any programming language. These programs are poetic when read, yet still executable to a computer. Zak Kain wrote "Capsized" for Code Poetry Slam 1.1 using CSS properties.

.ocean {
    color: cornflowerblue;
    pitch: high;
    overflow: visible;

.boat {
    color: firebrick;
    transform: rotate(94deg);
    float: none;

.rescue-team {
    visibility: visible;

.crew {
    widows: none;

The book "code {poems}" is selection of 55 poems compiled by Ishac Bertran. Contained within is the short C++ poem "Unhandled Love" by Daniel Bezerra.

class love {};

void main()
    throw love();


ZeroSprites CSS sprite generator

ZeroSprites is a CSS sprites generator aimed at area minimization using algorithms used in the field of VLSI floorplanning.


Font Awesome

Font Awesome is a free vector icon font for web development.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.