coolbutuseless/wasm.Rmd

## wasm.Rmd
---
title: "Toy WASM interpreter in base R"
author: "mikefc"
date: 2022-07-29T18:05:00+10:00
categories: ['Rstats']
tags: ["R"]
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # fig.path = "/img/visvalingam/README-",
  out.width = "100%"
)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Ensure that images are rendered using a device which understands patterns
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
knitr::opts_chunk$set(dev.args = list(png = list(type = "cairo")))

library(eventloop)
library(nara)
```


## WASM

One of the big buzzes at #RStudioConf2022 was the whiff of serverless "Shiny"
in both python and R by harnessing [WASM](https://en.wikipedia.org/wiki/WebAssembly).

WASM (a.k.a. WebAssembly) is a portable binary-code format initially targetted
at running code in a web browser at near-native speeds.

This post is a small exploration into what a WASM binary actually looks like
and how it is structured.

I also create a toy interpreter for WASM in R.  Note: You would never do this
in real-life - it's just for a bit of fun (maybe because I had so much fun with the
virtual machine interpreter for [AnotherWorld](http://localhost:4321/2022/07/29/anotherworld-game-written-playable-in-r-with-nara-and-eventloop/))


## C function to calculate `factorial(n)`

This is a classic bit of C code for calculating `n!` i.e. `n * (n-1) * (n-2) * ... * 1`

```{c eval=FALSE}
int factorial(int n) {
  if (n == 0)
    return 1;
  else
    return n * factorial(n-1);
}
```

## `factorial(n)` in WAT (human readable WASM text)

There is a human-readable version of WASM called "WAT" which has a structure
similar to S-expressions.

The above C code would be compiled to WASM/WAT as follows:

```{wat}
(func (param i64) (result i64)
  local.get 0             # put arg[0] on stack
  i64.eqz                 # compare top of stack to zero
  if (result i64)         # if it is zero
      i64.const 1         # put 1 on stack
  else
      local.get 0         # put arg[0] on stack
      local.get 0         # put arg[0] on stack
      i64.const 1         # put 1 on stack
      i64.sub             # subtract the top 2 values in stack
      call 0              # Call function #0 (return value is on the stack)
      i64.mul             # Multiply
  end)'
```

## Compile WAT to WASM

The [WebAssembly Binary Toolkit (WABT)](https://github.com/WebAssembly/wabt)
can be used to compile the WAT file into a binary WASM file

```{sh eval=FALSE}
brew install wabt
wat2wasm helloworld.wat   # Creates 'helloworld.wasm'
```

```
# helloworld.wasm
00 61 73 6d 01 00 00 00 01 06 01 60 01 7e 01 7e 03 02 01 00 0a 17 01 15 00 20 00 50 04 7e 42 01 05 20 00 20 00 42 01 7d 10 00 7e 0b 0b
```


## WASM raw bytes

With some googling and using the reference list of [WASM bytecodes](https://webassembly.github.io/spec/core/appendix/index-instructions.html)
it is possible to deconstruct and understand the bytes in the `*.wasm` file for
the `factorial(n)` function

```{wasm}
  0x00, 0x61, 0x73, 0x6D, # magick = \0asm
  0x01, 0x00, 0x00, 0x00, # version = 1

  # The 'type' section contains function signatures.
  0x01, 0x06 ,     # section "type". 6 bytes
  0x01,            # Num types = 1 i.e. just a single 'function' type defined here
  0x60,            # Function (begin definition)
  0x01,            # Number of parameters = 1
  0x7e,            #  param types: i64
  0x01,            # Number of results = 1
  0x7e,            #  result types: i64

  # The 'function' section stores indexes of the function signature
  0x03, 0x02,      # section "function". section size = 2 bytes
  0x01,            # Number of functions
  0x00,            # Index of function

  # The 'code' section contains the actual function code
  0x0A, 0x17,      # Section "code".  section size = 0x17 = 23 bytes
  0x01,            # Number of functions = 1
  0x15,            # Function body size = 0x15 = 21 bytes
  0x00,            # Number of local declarations = 0
  0x20, 0x00,      #   local.get 0
  0x50,            #   i64.eqz
  0x04, 0x7E,      #   if i64
  0x42, 0x01,      #     i64.const 1
  0x05,            #   else
  0x20, 0x00,      #     local.get 0
  0x20, 0x00,      #     local get 0
  0x42, 0x01,      #     i64.const 1
  0x7D,            #     i64.sub
  0x10, 0x00,      #     call 0
  0x7E,            #     i64.mul
  0x0B,            #   end
  0x0B             # end
```


## WASM virtual stack machine

WASM bytecode is intended to be run on a virtual stack machine.

A simple way of implementing a stack machine in R is to have a giant switch statement
and a stack implementation.  The `switch` statement chooses what
operations to do based upon the bytecode.

I'm cheating a little bit here and have hand-parsed the code split at the
if-else statement into two blocks.

#### Other cheats:

* you would normally need 64bit integers (i.e. `i64` type in WASM) to interpret the `factorial()` code
  in WASM, but as long as you don't give it a large number it will be fine.
* integers in WASM are encoded as [LEB128 variable length integers](https://en.wikipedia.org/wiki/LEB128)
  but since the code sizes are all small, every integer fits in 7-bits which
  means that no encoding/decoding is necessary.


#### Warnings

* This unoptimized interpreter is going to be terribly slow.


#### Stack

Stack implementations don't come much more naive than this.

The `stack` variable itself is just going to exist in the global environment.

```{r}
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Naive stack
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stack <- c()

# Pop a value of the stack
stack_pop <- function() {
  res   <- stack[1]
  stack <<- stack[-1]
  res
}

# Push a value on the stack
stack_push <- function(val) {
  stack <<- c(val, stack)
}
```

#### Define the functions

The major cheat here is that i've pre-parsed the function into initial code
in the function and the two code blocks that are executed depending on
the `if/else` statement.

There may be multiple functions in general, so `all_funcs` is a list of
all the functions in the code, and the correct function is extracted
during a `call` opcode (i.e. `0x10`)

```{r}
all_funcs <- list(
  func <- list(
    args = c(),
    start = c(
      0x20, 0x00,      #   local.get 0
      0x50,            #   i64.eqz
      0x04, 0x7E       #   if i64
    ),
    if_branch = c(
      0x42, 0x01       #     i64.const 1
    ),
    else_branch = c(
      0x20, 0x00,      #     local.get 0
      0x20, 0x00,      #     local get 0
      0x42, 0x01,      #     i64.const 1
      0x7D,            #     i64.sub
      0x10, 0x00,      #     call 0
      0x7E             #     i64.mul
    )
  )
)
```


```{r}
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#" Recursively interpret the given function
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
interp_wasm <- function(func) {

  # Set the bytecode to be the starting code of the function
  byte_idx <- 1
  bytecode <- func$start

  # Helper function to read 'n' bytes from the bytecode
  read_bytes <- function(n) {
    res <- bytecode[byte_idx + seq(n) - 1]
    byte_idx <<- byte_idx + n
    res
  }

  # For each opcode, determine what to do within a giant `switch` statement
  while(byte_idx <= length(bytecode)) {
    opcode <- read_bytes(1)
    opcode <- sprintf("0x%02x", opcode)
    switch(
      opcode,
      '0x04' = { # if
        type = read_bytes(1) # value type that if() operates on. ignored
        value = stack_pop()  # get the value. treat as boolean.
        if (value == 1) {
          # Jump the bytecode into the respective TRUE/FALSE branch
          bytecode <- func$if_branch
          byte_idx <- 1
        } else {
          bytecode <- func$else_branch
          byte_idx <- 1
        }
      },
      '0x10' = { # call
        # Select the function to call, setup the arguments and call it
        func_idx <- read_bytes(1)
        next_func <- all_funcs[[func_idx + 1]]
        arg <- stack_pop()
        next_func$args <- arg
        if (verbose) message("factorial(", arg, ")")
        interp_wasm(next_func)
      },
      '0x20' = { # local.get idx
        # Push a local variable onto the stack.
        # In this simple case all local vars are just the args to the function
        idx <- read_bytes(1)
        value <- func$args[idx + 1]
        stack_push(value)
      },
      '0x42' = { # i64 const
        # Push a constant onto the stack
        value <- read_bytes(1)
        stack_push(value)
      },
      '0x50' = { # i64.eqz
        # Is value at top of stack == 0?  No/Yes = 0/1
        value <- stack_pop()
        result <- value == 0
        stack_push(result)
      },
      '0x7d' = { # i64.sub
        # Subtract the two values at the top of the stack
        val2 <- stack_pop()
        val1 <- stack_pop()
        result <- val1 - val2
        stack_push(result)
      },
      '0x7e' = { # i64.mul
        # Multiply the 2 values at the top of the stack
        val2 <- stack_pop()
        val1 <- stack_pop()
        result <- val1 * val2
        stack_push(result)
      }
    )

    if (verbose) {
      message(opcode, " -> stack: ", deparse(stack))
    }
  }
  return(stack[1])
}
```

## `factorial(3)` running in WASM

```{r}
# Select the factorial function
func <- all_funcs[[1]]

# Set the argument to be '3'
func$args <- 3

# interpret the code
verbose <- FALSE
interp_wasm(func)
```


## `factorial(3)` running in WASM (verbose)

```{r}
func$args <- 4
verbose <- TRUE
interp_wasm(func)
```


# What's next?

Nothing really.

An actual WASM stack machine would need signed & unsigned 64 bit integers.
You could use [`{bit64}`](https://cran.rstudio.com/web/packages/bit64/index.html) to support this, if you really wanted to.

There are ~250 standard opcodes, and another ~250 SSE opcodes.   I have no
appetite for implementing all these - but it could be fun at some future date.

Please steal/adapt the code if you're interested.
	---
	title: "Toy WASM interpreter in base R"
	author: "mikefc"
	date: 2022-07-29T18:05:00+10:00
	categories: ['Rstats']
	tags: ["R"]
	---

	```{r, include = FALSE}
	knitr::opts_chunk$set(
	collapse = TRUE,
	comment = "#>",
	# fig.path = "/img/visvalingam/README-",
	out.width = "100%"
	)

	#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	# Ensure that images are rendered using a device which understands patterns
	#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	knitr::opts_chunk$set(dev.args = list(png = list(type = "cairo")))

	library(eventloop)
	library(nara)
	```


	## WASM

	One of the big buzzes at #RStudioConf2022 was the whiff of serverless "Shiny"
	in both python and R by harnessing [WASM](https://en.wikipedia.org/wiki/WebAssembly).

	WASM (a.k.a. WebAssembly) is a portable binary-code format initially targetted
	at running code in a web browser at near-native speeds.

	This post is a small exploration into what a WASM binary actually looks like
	and how it is structured.

	I also create a toy interpreter for WASM in R. Note: You would never do this
	in real-life - it's just for a bit of fun (maybe because I had so much fun with the
	virtual machine interpreter for [AnotherWorld](http://localhost:4321/2022/07/29/anotherworld-game-written-playable-in-r-with-nara-and-eventloop/))


	## C function to calculate `factorial(n)`

	This is a classic bit of C code for calculating `n!` i.e. `n * (n-1) * (n-2) * ... * 1`

	```{c eval=FALSE}
	int factorial(int n) {
	if (n == 0)
	return 1;
	else
	return n * factorial(n-1);
	}
	```

	## `factorial(n)` in WAT (human readable WASM text)

	There is a human-readable version of WASM called "WAT" which has a structure
	similar to S-expressions.

	The above C code would be compiled to WASM/WAT as follows:

	```{wat}
	(func (param i64) (result i64)
	local.get 0 # put arg[0] on stack
	i64.eqz # compare top of stack to zero
	if (result i64) # if it is zero
	i64.const 1 # put 1 on stack
	else
	local.get 0 # put arg[0] on stack
	local.get 0 # put arg[0] on stack
	i64.const 1 # put 1 on stack
	i64.sub # subtract the top 2 values in stack
	call 0 # Call function #0 (return value is on the stack)
	i64.mul # Multiply
	end)'
	```

	## Compile WAT to WASM

	The [WebAssembly Binary Toolkit (WABT)](https://github.com/WebAssembly/wabt)
	can be used to compile the WAT file into a binary WASM file

	```{sh eval=FALSE}
	brew install wabt
	wat2wasm helloworld.wat # Creates 'helloworld.wasm'
	```

	```
	# helloworld.wasm
	00 61 73 6d 01 00 00 00 01 06 01 60 01 7e 01 7e 03 02 01 00 0a 17 01 15 00 20 00 50 04 7e 42 01 05 20 00 20 00 42 01 7d 10 00 7e 0b 0b
	```



	## WASM raw bytes

	With some googling and using the reference list of [WASM bytecodes](https://webassembly.github.io/spec/core/appendix/index-instructions.html)
	it is possible to deconstruct and understand the bytes in the `*.wasm` file for
	the `factorial(n)` function

	```{wasm}
	0x00, 0x61, 0x73, 0x6D, # magick = \0asm
	0x01, 0x00, 0x00, 0x00, # version = 1

	# The 'type' section contains function signatures.
	0x01, 0x06 , # section "type". 6 bytes
	0x01, # Num types = 1 i.e. just a single 'function' type defined here
	0x60, # Function (begin definition)
	0x01, # Number of parameters = 1
	0x7e, # param types: i64
	0x01, # Number of results = 1
	0x7e, # result types: i64

	# The 'function' section stores indexes of the function signature
	0x03, 0x02, # section "function". section size = 2 bytes
	0x01, # Number of functions
	0x00, # Index of function

	# The 'code' section contains the actual function code
	0x0A, 0x17, # Section "code". section size = 0x17 = 23 bytes
	0x01, # Number of functions = 1
	0x15, # Function body size = 0x15 = 21 bytes
	0x00, # Number of local declarations = 0
	0x20, 0x00, # local.get 0
	0x50, # i64.eqz
	0x04, 0x7E, # if i64
	0x42, 0x01, # i64.const 1
	0x05, # else
	0x20, 0x00, # local.get 0
	0x20, 0x00, # local get 0
	0x42, 0x01, # i64.const 1
	0x7D, # i64.sub
	0x10, 0x00, # call 0
	0x7E, # i64.mul
	0x0B, # end
	0x0B # end
	```



	## WASM virtual stack machine

	WASM bytecode is intended to be run on a virtual stack machine.

	A simple way of implementing a stack machine in R is to have a giant switch statement
	and a stack implementation. The `switch` statement chooses what
	operations to do based upon the bytecode.

	I'm cheating a little bit here and have hand-parsed the code split at the
	if-else statement into two blocks.

	#### Other cheats:

	* you would normally need 64bit integers (i.e. `i64` type in WASM) to interpret the `factorial()` code
	in WASM, but as long as you don't give it a large number it will be fine.
	* integers in WASM are encoded as [LEB128 variable length integers](https://en.wikipedia.org/wiki/LEB128)
	but since the code sizes are all small, every integer fits in 7-bits which
	means that no encoding/decoding is necessary.


	#### Warnings

	* This unoptimized interpreter is going to be terribly slow.



	#### Stack

	Stack implementations don't come much more naive than this.

	The `stack` variable itself is just going to exist in the global environment.

	```{r}
	#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	# Naive stack
	#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	stack <- c()

	# Pop a value of the stack
	stack_pop <- function() {
	res <- stack[1]
	stack <<- stack[-1]
	res
	}

	# Push a value on the stack
	stack_push <- function(val) {
	stack <<- c(val, stack)
	}
	```

	#### Define the functions

	The major cheat here is that i've pre-parsed the function into initial code
	in the function and the two code blocks that are executed depending on
	the `if/else` statement.

	There may be multiple functions in general, so `all_funcs` is a list of
	all the functions in the code, and the correct function is extracted
	during a `call` opcode (i.e. `0x10`)

	```{r}
	all_funcs <- list(
	func <- list(
	args = c(),
	start = c(
	0x20, 0x00, # local.get 0
	0x50, # i64.eqz
	0x04, 0x7E # if i64
	),
	if_branch = c(
	0x42, 0x01 # i64.const 1
	),
	else_branch = c(
	0x20, 0x00, # local.get 0
	0x20, 0x00, # local get 0
	0x42, 0x01, # i64.const 1
	0x7D, # i64.sub
	0x10, 0x00, # call 0
	0x7E # i64.mul
	)
	)
	)
	```


	```{r}
	#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	#" Recursively interpret the given function
	#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	interp_wasm <- function(func) {

	# Set the bytecode to be the starting code of the function
	byte_idx <- 1
	bytecode <- func$start

	# Helper function to read 'n' bytes from the bytecode
	read_bytes <- function(n) {
	res <- bytecode[byte_idx + seq(n) - 1]
	byte_idx <<- byte_idx + n
	res
	}

	# For each opcode, determine what to do within a giant `switch` statement
	while(byte_idx <= length(bytecode)) {
	opcode <- read_bytes(1)
	opcode <- sprintf("0x%02x", opcode)
	switch(
	opcode,
	'0x04' = { # if
	type = read_bytes(1) # value type that if() operates on. ignored
	value = stack_pop() # get the value. treat as boolean.
	if (value == 1) {
	# Jump the bytecode into the respective TRUE/FALSE branch
	bytecode <- func$if_branch
	byte_idx <- 1
	} else {
	bytecode <- func$else_branch
	byte_idx <- 1
	}
	},
	'0x10' = { # call
	# Select the function to call, setup the arguments and call it
	func_idx <- read_bytes(1)
	next_func <- all_funcs[[func_idx + 1]]
	arg <- stack_pop()
	next_func$args <- arg
	if (verbose) message("factorial(", arg, ")")
	interp_wasm(next_func)
	},
	'0x20' = { # local.get idx
	# Push a local variable onto the stack.
	# In this simple case all local vars are just the args to the function
	idx <- read_bytes(1)
	value <- func$args[idx + 1]
	stack_push(value)
	},
	'0x42' = { # i64 const
	# Push a constant onto the stack
	value <- read_bytes(1)
	stack_push(value)
	},
	'0x50' = { # i64.eqz
	# Is value at top of stack == 0? No/Yes = 0/1
	value <- stack_pop()
	result <- value == 0
	stack_push(result)
	},
	'0x7d' = { # i64.sub
	# Subtract the two values at the top of the stack
	val2 <- stack_pop()
	val1 <- stack_pop()
	result <- val1 - val2
	stack_push(result)
	},
	'0x7e' = { # i64.mul
	# Multiply the 2 values at the top of the stack
	val2 <- stack_pop()
	val1 <- stack_pop()
	result <- val1 * val2
	stack_push(result)
	}
	)

	if (verbose) {
	message(opcode, " -> stack: ", deparse(stack))
	}
	}
	return(stack[1])
	}
	```

	## `factorial(3)` running in WASM

	```{r}
	# Select the factorial function
	func <- all_funcs[[1]]

	# Set the argument to be '3'
	func$args <- 3

	# interpret the code
	verbose <- FALSE
	interp_wasm(func)
	```


	## `factorial(3)` running in WASM (verbose)

	```{r}
	func$args <- 4
	verbose <- TRUE
	interp_wasm(func)
	```



	# What's next?

	Nothing really.

	An actual WASM stack machine would need signed & unsigned 64 bit integers.
	You could use [`{bit64}`](https://cran.rstudio.com/web/packages/bit64/index.html) to support this, if you really wanted to.

	There are ~250 standard opcodes, and another ~250 SSE opcodes. I have no
	appetite for implementing all these - but it could be fun at some future date.

	Please steal/adapt the code if you're interested.