mizchi/what-i-know-about-moonbit.md

## what-i-know-about-moonbit.md

      
    Translated by ChatGPT https://zenn.dev/mizchi/articles/practical-moonbit (lang:ja)
The previous introductory article was a bit provocative, so this time I'll cover what I learned from actually researching Moonbit and writing code in it.
https://zenn.dev/mizchi/articles/introduce-moonbit (Japanese)
Author and Development Organization

The development body is IDEA, a research organization in Shenzhen.
https://www.linkedin.com/company/idearesearch?trk=ppro_cprof
Hongbo Zhang, formerly of Meta and developer of BuckleScript | ReScript, is the chief architect.
https://github.com/bobzhang
For those unfamiliar with ReScript, it's a kind of AltJS that layers JS-like syntax on top of BuckleScript, which converts OCaml to JS, and includes JSX syntax for writing React components.
module Button = {
  @react.component
  let make = (~count) => {
    let times = switch count {
    | 1 => "once"
    | 2 => "twice"
    | n => n->Int.toString ++ " times"
    }
    let text = `Click me ${times}`

    <button> {text->React.string} </button>
  }
}
https://rescript-lang.org/
I've considered using it myself in the past, and it turned out to be an excellent language with far more expressive power than JS/TS when writing exclusively in the ReScript world. However, referencing the JS world required typing and importing everything, lacking the flexibility of TS's any, so I gave up (though this may have improved now).
As I understand it, WebAssembly does not have this problem, because there is no existing ecosystem to interoperate with in the first place.
Information Sources and Community

Basically, you can learn a lot by asking on Discord.
https://discord.com/invite/CVFRavvRav
Given the organization and its origins, there seem to be many Chinese speakers, but they prioritize discussions and documentation in English, so communicating in English should be no problem (I use DeepL for all my conversations).
There are some QA logs on Discuss.
https://discuss.moonbitlang.com/
Blogs introducing the latest features
https://www.moonbitlang.com/weekly-updates
Internal Implementation Guess

The compiler's source code is not currently public (it will probably be open-sourced in the future), but some things about its implementation can be inferred.
The compiler itself is likely written in OCaml. The evidence is the js_of_ocaml error that appears when running moon run main --target js. Hongbo Zhang's BuckleScript and ReScript were also written in OCaml, and Meta's language development division generally used OCaml as its common language. (For example, Flowtype is also implemented in OCaml.)
# Code containing inline wat built with --target js
$ moon run main --target js
# Omitted
Fatal error: exception File "lib/xml/js_backend/js_of_mcore.ml", line 1254, characters 33-39: Assertion failed
The CLI for the moon and moonc commands is probably written in Rust, judging by the Rust unwrap failures that occasionally show up in moon command errors.
The core library is self-hosted in the moonbit language and is OSS.
https://github.com/moonbitlang/core
Primitive types are not defined here. However, in changes made in the last week or so, a .mbti extension was added alongside .mbt, and these files declare the type signatures of the interfaces provided by the compiler.
https://github.com/moonbitlang/core/blob/main/builtin/builtin.mbti
The compiler does not use LLVM, but generates WAT (WebAssembly Text Format) after static analysis, which is then converted to .wasm by wasm-tools.
(Probably one of the Moonbit developers) peter-jerry-ye wrote some code for the peter-jerry-ye/memory library that hardcodes WAT code.
extern "wasm" fn load8_ffi(pos : Int) -> Int =
  #|(func (param $pos i32) (result i32) (i32.load8_u (local.get $pos)))

extern "wasm" fn load32_ffi(pos : Int) -> Int =
  #|(func (param $pos i32) (result i32) (i32.load (local.get $pos)))
https://github.com/peter-jerry-ye/memory/blob/main/memory.mbt
You can generate .wat files with the build option $ moon build --target wasm-gc --output-wat. From what I can see, this extern expands directly into wat code.
The generated .wat is very straightforward, and the compiler probably performs zero-overhead abstraction to keep the generated code small. This is an approach necessary for WebAssembly(-GC) native languages but not possible with other LLVM-based languages, where an extra abstract layer tends to increase the size of the generated code.
As far as I know, the primitive types correspond almost exactly to the WebAssembly specification. For example, Moonbit currently has no >> or << syntax for bit shifts, but you can use Int::lsr (logical shift right, which is i32.shr_u in WebAssembly) or Int::lsl (logical shift left, i32.shl) instead.
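The difference between a logical and an arithmetic right shift is easy to see in TypeScript, whose bitwise operators also work on 32-bit integers. This is only an illustration: the pairing with WebAssembly instructions in the comments comes from the WebAssembly spec, not from inspecting the Moonbit compiler's output.

```typescript
// JS/TS bitwise operators work on 32-bit integers, like Wasm's i32 instructions.
const input = -8; // bit pattern 0xFFFFFFF8

// Logical shift right: the vacated top bit is filled with 0 (i32.shr_u in Wasm).
const logicalRight = input >>> 1;

// Arithmetic shift right: the sign bit is copied in (i32.shr_s in Wasm).
const arithmeticRight = input >> 1;

// Shift left (i32.shl in Wasm); the result is read back as a signed 32-bit value.
const shiftedLeft = 1 << 31;

console.log(logicalRight, arithmeticRight, shiftedLeft);
// prints: 2147483644 -4 -2147483648
```

A logical shift is usually what you want when packing bytes into an Int, because it never drags the sign bit into the lower bits.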
How to Set Up the Development Environment

Just follow the instructions at:
https://www.moonbitlang.com/download/
After installation, upgrades can be made with moon upgrade, although a warning appears that it is still experimental.
After installation, the entire toolset is included under ~/.moon, and importantly, moonbitlang/core is cloned under ~/.moon/lib/core. This is the standard library, and updates to the standard library are made by updating this code and hitting moon bundle --source-dir ~/.moon/lib/core. It seems the vscode LSP also looks here.
I'll also mention a few pain points I've run into. As far as I know, vscode cannot resolve paths when moon.mod.json is in a subdirectory.
For example, in the following layout, foo/main/main.mbt cannot resolve foo.mbt.
foo/
  main/
    main.mbt
    moon.pkg.json
  foo.mbt
  moon.pkg.json
  moon.mod.json
moon.mod.json

This can be resolved by opening the foo directory as the root of the workspace in vscode.
This has been quite stressful for me, so I hope it gets fixed soon.
Package Manager and Module System

Run moon add username/lib to download a module from mooncakes.io. A namespace is probably mandatory.
For now, moon publish seems to simply upload the directory, and moon add user/pkg seems to unpack it under .mooncakes/.
The moon command links against the contents of .mooncakes at compile time. Only moonbitlang/core is built in by default, and it can be excluded with --nostd.
Honestly, it is not very polished yet. My guess is that the team did not want ad hoc package distribution methods to spring up, so they at least provided an official registry. I'm not saying that's bad; I think it's the right choice at this phase.
Module subdependencies are flatly deployed under .mooncakes/*. Currently, there are no version solvers or Sub Dependency resolution rules like npm or cargo, so it seems likely that dependencies will be overwritten when they clash. Honestly, this behavior is quite dubious.
The biggest problem is that there is no explanation anywhere of how to call code downloaded from mooncakes.io. (They probably consider it unnecessary because there is a consistent module resolution rule officially explained in another document, but I think it's a gap.)
For example, the first library I tried was PerfectPan/base64. Running $ moon add PerfectPan/base64 was clear enough, but there was no documentation on how to call it.
https://mooncakes.io/docs/#/PerfectPan/base64/members
(Just to be clear, I'm not saying Mr. PerfectPan's documentation is poor. This is common to almost all Moonbit library documentation.)
The answer is to add the import to the moon.pkg.json of the package that uses the library, after which it can be referenced under the namespace @base64.
{
  "is_main": true,
  "import": ["PerfectPan/base64"]
}
fn main {
  let encoded = @base64.base64_encode("hello");
  let decoded = @base64.base64_decode(encoded);
  println("decoded \(decoded)");
}
It's no big deal once you understand it, but it took me quite some time to get to this point. Because moonbitlang/core gets special treatment from the compiler, I wasn't sure whether other packages could be referenced the same way.
A related problem is that the VSCode extension currently sometimes fails to complete structs from external modules. Together with the opacity of mooncakes module resolution, this left me quite unsure about how to refer to libraries at all.
For example, this code compiles, but @vec.V does not complete while you are typing it, and it stays marked as an error until the whole expression is written.
fn main {
  let vec = @vec.Vec::[1, 2, 3]
}
(Completion works when written as let vec: @vec.Vec[Int] = )
Another point of confusion is that when a function returns a type other than Unit and nothing receives the result, it is reported as an error rather than a warning.
fn main {
  @vec.Vec::[1, 2, 3]
  // Expr Type Mismatch
  //         has type : @vec.Vec[_/0]
  //         wanted   : Unit
  // 
  ()
}
Combined, these make calling a function from an external module very difficult, because most function call lines look like errors when written on their own. I'd like an unused result to be a warning, not an error.
Also, when a package @username/foo publishes functions like pub struct Foo {}; pub fn Foo::new() -> Foo {...}, calling them from outside the package is @foo.new() instead of @foo.Foo::new(). I can't tell if my interpretation is wrong or if it's a bug.
Using the Standard Library

moonbitlang/core is always included.
fn main {
  // Examples accessible at the top-level namespace included in core
  let list: List[Int] = List::[]
  let array: Array[Int] = Array::[]
  let tuple: (Int, Int) = (1, 2); // Tuple
  let r: Ref[Int] = { val: 1 }

  // Examples that are included but accessed with a namespace
  let hm: @hashmap.HashMap[Int, Int] = @hashmap.HashMap::[]
  let vec: @vec.Vec[Int] = @vec.Vec::[]
  let set: @immutable_set.ImmutableSet[Int] = @immutable_set.ImmutableSet::[]
  let queue: @queue.Queue[Int] = @queue.Queue::[]
  let stack: @stack.Stack[Int] = @stack.Stack::[]
  let rand_int = @random.RandomInt::new(10);
}
A mutable array can be declared as @vec.Vec[T] and pushed to, then converted to an array by doing vec.to_list().to_array(). Is there no direct method to toArray?
Global Variables

Declare a value of type Ref[T] at the top level.
let count: Ref[Int] = { val: 0 }

fn increment() -> Int {
  count.val = count.val + 1
  count.val
}
Iterators

...I was going to write about this, but it seems to be under active development, so I'll add it later.
The sample code in the tweet below did not work for me at the time. Looking at moonbitlang/core, it doesn't seem to use this interface.
https://twitter.com/moonbitlang/status/1783913471442297026
Honestly, the current loop syntax is a bit cumbersome. In TypeScript, the higher-order style list.iter(fn (x) {...}) became troublesome once async/await was introduced, so I want something equivalent to JS's for...of at the syntax level.
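The TypeScript problem mentioned above can be sketched as follows. This is a minimal illustration, not Moonbit code; visit is a hypothetical async step standing in for any awaited work.

```typescript
const order: string[] = [];

// Hypothetical async step; stands in for any awaited work.
async function visit(label: string): Promise<void> {
  await Promise.resolve();
  order.push(label);
}

// Higher-order style: forEach ignores the promises returned by the callback,
// so "forEach done" is recorded before any item has been visited.
function viaForEach(items: string[]): void {
  items.forEach(async (item) => {
    await visit(item);
  });
  order.push("forEach done");
}

// Syntax-level loop: each await completes before the next iteration starts,
// so "for-of done" is recorded only after every item has been visited.
async function viaForOf(items: string[]): Promise<void> {
  for (const item of items) {
    await visit(item);
  }
  order.push("for-of done");
}

viaForEach(["a", "b"]);
console.log(order[0]); // prints: forEach done
```

Awaiting viaForOf(["a", "b"]) instead records the items first and "for-of done" last, which is why a loop construct at the syntax level composes better with asynchrony than a higher-order function does.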
How to Make a Library

The directory generated by a normal moon new hello looks like this.
hello/
├── README.md
├── lib
│   ├── hello.mbt
│   ├── hello_test.mbt
│   └── moon.pkg.json
├── main
│   ├── main.mbt
│   └── moon.pkg.json
└── moon.mod.json

At first, I thought this structure would be directly published as a library, but it was a bit different when I tried it.
When making a library, the code generated with --lib is easier to understand.
$ moon new mylib --lib
$ tree mylib -I target
mylib
├── README.md
├── lib
│   ├── hello.mbt
│   ├── hello_test.mbt
│   └── moon.pkg.json
├── moon.mod.json
├── moon.pkg.json
└── top.mbt
Functions marked pub in top.mbt are the ones exported from the module.
However, there is no syntax like TypeScript's re-export (export {} from "..."), so if you want to expose functions implemented in lib as they are, you need to write wrapper functions with the same signatures.
In fact, the generated top.mbt looks like this.
pub fn greeting() -> Unit {
  println(@lib.hello())
}
Is this redundant? Looking at packages that are published, many simply implement at the top level.
$ tree ~/repo/peter-jerry-ye/memory/
├── LICENSE
├── README.md
├── memory.mbt
├── moon.mod.json
└── moon.pkg.json

It's simple, so I think this way is fine.
With this layout, I wondered where to put usage examples for debugging. I solved it by creating a main/ directory and pointing main/moon.pkg.json at the parent library's name.
├── main
│   ├── main.mbt
│   ├── moon.pkg.json
│   └── run_test.ts
├── mod.mbt
├── mod.ts
├── moon.mod.json
├── moon.pkg.json
└── tsconfig.json

{
  "link": {
    "wasm-gc": {
      "exports": ["write_bytes_test"]
    }
  },
  "import": [
    {
      "path": "mizchi/js_io", // Parent mod name
      "alias": "js_io"
    }
  ]
}
I think this alias should be unnecessary, but currently, it doesn't resolve names correctly without it. I'd like to know a good way to resolve this.
Another problem I'm having is that there's no .npmignore equivalent currently, so everything gets uploaded. Publishing this unnecessary main could potentially trigger compilation errors for some reason.
If I could make a request, I'd like moon.pkg.json to be optional. From what I've tried, most cases just involve placing an empty {} file.
Grammar TIPS

Here's a feature I was pleased to find is possible: setting trait bounds on type arguments.
fn eq[T: Eq](x: T, y: T) -> Bool {
  x == y
}
Enums are really convenient, and when I implemented a JSON parser, I defined it like this.
pub enum JSONValue {
  String(String)
  Boolean(Bool)
  IntNumber(Int)
  FloatNumber(Double)
  Object(Array[(String, JSONValue)])
  Array(Array[JSONValue])
  Null
} derive(Debug, Eq)
I thought keyword arguments and default values were more flexible and convenient than Rust.
fn Point3d::new(~x: Int, ~y: Int, ~z: Int = 0) -> Point3d {
  Point3d::{ x, y, z }
}
fn main {
  let p1 = Point3d::new(x = 1, y = 2)
  let p2 = Point3d::new(x = 1, y = 2, z = 3)
}
For example, writing bindings for React, this feature could represent function component props.
The syntax is basically similar to Rust; the biggest difference is that generics are written T[X] instead of T<X>. GitHub Copilot treats .mbt files as Rust, so its suggestions often work after a small rewrite. Presumably the model is pulled toward Rust because Rust has far more training data.
Currently, it's unfortunate that derive can only be used for built-in traits like Eq, Debug, Show, Hash, etc. Being able to derive your own defined traits is a future feature.
You cannot define your own traits or methods for built-in types.
pub fn Int::to_xxx(self: Int) -> String {}
// Cannot define method to_xxx for foreign type Int
I tried to make a Serde-like deserializer and got stuck here.
Running the main Function

When "is_main": true is defined in <dir>/moon.pkg.json, the main function of that module can be called with moon run <dir>.
fn main {
  println("hello")
}
In moonbit, you usually need to explicitly state the return type of a function, but this is the only time it's exempted.
Running Tests + Result Type

You can write test cases with test {}. The block is implicitly treated as returning a Result type, which lets you use the ? syntax inside it.
test {
  @assertion.assert_eq(1 , 1)?
}

// with test name
test "1=1" {
  @assertion.assert_eq(1 , 1)?
}
This can be run by executing moon test in a directory with moon.mod.json. However, currently, there is no moon test --filter or similar, so you need to comment out each one if you want to exclude execution.
Functions that return built-in Result types can be used like Rust outside of the test block.
fn test_result(x: Int) -> Result[Int, String] {
  @assertion.assert_eq(1, 1)?
  if x > 5 {
    Err("x is too big".to_string())
  } else {
    Ok(x)
  }
}
Option[T] is also available, and like Rust, it can be unwrapped with ?.
One problem I've had is that runtime errors from .unwrap come without a stack trace, so no further information is available once an unwrap fails. It's better not to rely on it too much.
fn test_result(x: Int) -> Result[Int, String] {
  let x: Result[Int, Int] = Err(1)
  let _ = x.unwrap() // This loses the stack trace here
  Ok(1)
}
FFI from JS

Pass the implementation to import during WebAssembly instantiation.
const { instance: { exports } } = await WebAssembly.instantiateStreaming(
  fetch(new URL("../target/wasm-gc/release/build/main/main.wasm", import.meta.url)),
  {
    xxx: {
      foo: () => 1
    }
  }
);

const ret = exports.run() 
Arguments and return values can be of type Int (or externref, described below).
fn xxx_foo() -> Int = "xxx" "foo"

pub fn run() -> Int {
  let v = xxx_foo()
  v + 1
}
This run function must be compiled with moon build --target wasm-gc, and the functions to export must be listed as follows.
{
  "link": {
    "wasm-gc": {
      "exports": ["run"]
    }
  }
}
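To see the import mechanism in isolation, without the Moonbit toolchain, here is a minimal sketch that instantiates a hand-assembled WebAssembly module with the same shape as the example above: it imports "xxx" "foo" and exports run, which returns foo() + 1. The bytes are written by hand for illustration and are not Moonbit compiler output; the value 41 returned by the host function is arbitrary.

```typescript
// Hand-assembled module, roughly equivalent to:
//   (import "xxx" "foo" (func (result i32)))
//   (func (export "run") (result i32) (i32.add (call 0) (i32.const 1)))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f, // type section: type 0 = () -> i32
  0x02, 0x0b, 0x01, // import section: 1 import
  0x03, 0x78, 0x78, 0x78, // module name "xxx"
  0x03, 0x66, 0x6f, 0x6f, // field name "foo"
  0x00, 0x00, // kind: func, type 0 (becomes function index 0)
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = function 1
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: 1 body, 7 bytes, no locals
  0x10, 0x00, // call 0 (the imported foo)
  0x41, 0x01, // i32.const 1
  0x6a, 0x0b, // i32.add, end
]);

// The import object mirrors the two-level names used on the wasm side.
const importObject = { xxx: { foo: (): number => 41 } };

const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, importObject);
const run = wasmInstance.exports.run as () => number;

const runResult = run();
console.log(runResult); // prints: 42
```

In a browser you would normally use WebAssembly.instantiateStreaming as shown earlier; the synchronous constructors are used here only to keep the sketch short.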
externref

This is more about WebAssembly's externref than about Moonbit itself: when a pub fn built with --target wasm-gc returns a value other than a number, the value is wrapped in an externref.
This is treated as a mere pointer object on the host (JS) side without touching the contents, but it returns to the original reference when returned to the Moonbit side. This is quite convenient.
struct Point {
  x: Int
  y: Int
}
pub fn point(x: Int, y: Int) -> Point {
  Point::{ x, y }
}
pub fn get_x(point: Point) -> Int {
  point.x
}
Calling from the JS side
const p = exports.point(3,4);
// The host can't touch the contents of this p, but it can be returned to the wasm side
exports.get_x(p) //=> 3
js_string

From what I heard on Discord, this is not a stable API, but with --target wasm-gc the output of Moonbit's println can be passed to the JS side. Here's an implementation example on the JS side.
  let memory;

  const [log, flush] = (() => {
    let buffer = [];
    function flush() {
      if (buffer.length > 0) {
        console.log(new TextDecoder("utf-16").decode(new Uint16Array(buffer).valueOf()));
        buffer = [];
      }
    }
    function log(ch) {
      if (ch == '\n'.charCodeAt(0)) { flush(); }
      else if (ch == '\r'.charCodeAt(0)) { /* noop */ }
      else { buffer.push(ch); }
    }
    return [log, flush]
  })();

  const importObject = {
    spectest: {
      print_char: log
    },
    js_string: {
      new: (offset, length) => {
        const bytes = new Uint16Array(memory.buffer, offset, length);
        const string = new TextDecoder("utf-16").decode(bytes);
        return string
      },
      empty: () => { return "" },
      log: (string) => { console.log(string) },
      append: (s1, s2) => { return (s1 + s2) },
    }
  };

  WebAssembly.instantiateStreaming(fetch("/target/wasm-gc/release/build/main/main.wasm"), importObject).then(
    (obj) => {
      memory = obj.instance.exports["moonbit.memory"];
      obj.instance.exports._start();
      flush();
    }
  )
https://github.com/moonbitlang/moonbit-docs/blob/main/examples/wasm-gc/index.html
Reading the behavior of js_string with --output-wat, println writes string data to moonbit.memory, and js_string.new passes the string's offset and length to the JS side, which then decodes and displays it.
(I understand that this aligns with JS's internal UTF-16 string representation. Still, since TextEncoder.encode produces a Uint8Array of UTF-8 bytes, I personally think Moonbit's binary representation might be better off using UTF-8, which is easier to work with from JS.)
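The decoding step performed by js_string.new can be tried without a wasm module. In this minimal sketch a plain ArrayBuffer stands in for moonbit.memory, and writeUtf16 is a hypothetical helper that plays the role of the wasm side. It uses the label "utf-16le", which is the encoding the "utf-16" label resolves to, and it assumes a little-endian platform, since Uint16Array uses the platform byte order.

```typescript
// Stand-in for the exported moonbit.memory: one 64 KiB page.
const memoryBuffer = new ArrayBuffer(65536);

// Hypothetical helper: write a string as UTF-16 code units at `offset`
// (this is the part the wasm side would do). Returns the number of code units.
function writeUtf16(buffer: ArrayBuffer, offset: number, text: string): number {
  const units = new Uint16Array(buffer, offset, text.length);
  for (let i = 0; i < text.length; i++) {
    units[i] = text.charCodeAt(i);
  }
  return text.length;
}

// Same shape as js_string.new: (offset, length in code units) -> JS string.
function readUtf16(buffer: ArrayBuffer, offset: number, unitLength: number): string {
  const units = new Uint16Array(buffer, offset, unitLength);
  return new TextDecoder("utf-16le").decode(units);
}

const sample = "h\u00e9llo \u6708"; // includes non-ASCII characters
const unitCount = writeUtf16(memoryBuffer, 1024, sample);
const decoded = readUtf16(memoryBuffer, 1024, unitCount);
console.log(decoded === sample); // prints: true
```

Note that the offset passed to Uint16Array must be a multiple of 2, and the length is counted in 16-bit code units rather than bytes.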
WebAssembly.Memory for Sharing Memory

In the js_string example, this moonbit.memory is used as a shared buffer.
Using https://github.com/peter-jerry-ye/memory as a reference, code that writes data to an arbitrary memory location looks like this.
The first 4 bytes at the offset hold the data length, and the rest is the binary payload.
extern "wasm" fn load8_ffi(pos : Int) -> Int =
  #|(func (param $pos i32) (result i32) (i32.load8_u (local.get $pos)))

extern "wasm" fn store8_ffi(pos : Int, value : Int) =
  #|(func (param $pos i32) (param $value i32) (i32.store8 (local.get $pos) (local.get $value)))

// WebAssembly.Memory is 64kb per page,
// use the latter half to avoid collisions with js_string
let buffer_offset : Ref[Int] = { val: 32768 }

pub fn write_buffer(bytes : Bytes) -> Unit {
  store_bytes(bytes, buffer_offset.val)
}

pub fn read_buffer() -> Bytes {
  let offset = buffer_offset.val
  let len = read_length(offset)
  let bytes = Bytes::make(len, 0)
  for i = 0; i < len; i = i + 1 {
    bytes[i] = load8_ffi(offset + 4 + i)
  }
  bytes
}


fn store_bytes(bytes : Bytes, offset : Int) -> Unit {
  let byte_length = bytes.length()
  let int_bytes = int_to_bytes(byte_length)
  for i = 0; i < 4; i = i + 1 {
    store8_ffi(offset + i, int_bytes[i])
  }
  for i = 0; i < byte_length; i = i + 1 {
    store8_ffi(offset + 4 + i, bytes[i])
  }
}

// read byte length
fn read_length(offset : Int) -> Int {
  load8_ffi(offset) + load8_ffi(offset + 1).lsl(8) + load8_ffi(offset + 2).lsl(
    16,
  ) + load8_ffi(offset + 3).lsl(24)
}

fn int_to_bytes(value : Int) -> Array[Int] {
  [
    value.land(0xFF),
    value.lsr(8).land(0xFF),
    value.lsr(16).land(0xFF),
    value.lsr(24).land(0xFF),
  ]
}
JS side
let _memory: WebAssembly.Memory;
let _offset = 32768;

export function setMemory(newMemory: WebAssembly.Memory) {
  _memory = newMemory;
}

export function writeBuffer(bytes: Uint8Array) {
  const buf = new Uint8Array(_memory.buffer);
  const intBytes = intToBytes(bytes.byteLength);
  buf.set(intBytes, _offset);
  buf.set(bytes, intBytes.byteLength + _offset);
}

export function readBuffer(): Uint8Array {
  const buf = new Uint8Array(_memory.buffer);
  const len = getLength(buf, _offset);
  return buf.slice(_offset + 4, _offset + 4 + len);
}

function getLength(buffer: Uint8Array, offset: number): number {
  return new DataView(buffer.buffer, offset, 4).getInt32(0, true);
}

function intToBytes(value: number): Uint8Array {
  const buffer = new ArrayBuffer(4);
  const dataView = new DataView(buffer);
  dataView.setInt32(0, value | 0, true);
  return new Uint8Array(buffer);
}
...and I've published this library as https://mooncakes.io/docs/#/mizchi/js_io/.
https://github.com/peter-jerry-ye/memory is more properly implemented with a variable-length memory allocator.
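As a quick check of the length-prefixed layout described above, here is a self-contained round trip on the JS side. It is a minimal sketch, not the published js_io API: writeFrame and readFrame are hypothetical names, a plain ArrayBuffer stands in for WebAssembly.Memory's buffer, and the bounds check is an addition of mine.

```typescript
const FRAME_OFFSET = 32768; // latter half of the 64 KiB page, as in the code above
const HEADER_BYTES = 4; // little-endian payload length

// Write [length (4 bytes, little-endian)][payload] at FRAME_OFFSET.
function writeFrame(buffer: ArrayBuffer, payload: Uint8Array): void {
  if (FRAME_OFFSET + HEADER_BYTES + payload.byteLength > buffer.byteLength) {
    throw new RangeError("payload does not fit in the shared region");
  }
  new DataView(buffer).setUint32(FRAME_OFFSET, payload.byteLength, true);
  new Uint8Array(buffer).set(payload, FRAME_OFFSET + HEADER_BYTES);
}

// Read the length header, then copy out exactly that many payload bytes.
function readFrame(buffer: ArrayBuffer): Uint8Array {
  const len = new DataView(buffer).getUint32(FRAME_OFFSET, true);
  const start = FRAME_OFFSET + HEADER_BYTES;
  return new Uint8Array(buffer).slice(start, start + len);
}

const page = new ArrayBuffer(65536); // stand-in for WebAssembly.Memory.buffer
writeFrame(page, new TextEncoder().encode("hello"));
const roundTrip = new TextDecoder().decode(readFrame(page));
console.log(roundTrip); // prints: hello
```

With a fixed offset and a single frame, each write overwrites the previous one, which is the limitation a real allocator removes.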
Moonbit Features Missing: Asynchrony

Specifically, async/await syntax.
I understand that implementing this is difficult. In Rust in particular, there was a fuss over tokio, and there is still confusion between std/async.
But for Moonbit to run on Cloudflare Workers as it aims to, Moonbit itself needs to handle asynchrony; in particular, it needs to be able to call fetch.
Imagine something like this:
fn js_fetch(req: Request) -> Response = "js" "fetch"

pub fn handle(req: Request) -> Response {
  let res = await js_fetch(req); // or Rust style: js_fetch(req).await
  let json = await res.json()
  // ... something json decoder
  Response::json(returning_json)
}
JS's async/await can be expressed as generators that yield promises, so given that the host controls the event loop, what Moonbit really needs might be generators.
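The idea that generators are enough when the host drives the loop can be sketched in TypeScript. This is a simplified, synchronous model under my own assumptions: HostRequest, HostResponse, handler, and runWithHost are hypothetical names, and a real host would resume the generator from its event loop when the response arrives rather than immediately.

```typescript
type HostRequest = { url: string };
type HostResponse = { body: string };

// The "guest" code: it yields a request and is resumed with the response,
// which reads much like `const res = await fetch(...)`.
function* handler(): Generator<HostRequest, string, HostResponse> {
  const first = yield { url: "/a" };
  const second = yield { url: "/b" };
  return `${first.body}+${second.body}`;
}

// The "host" code: it owns the loop and decides how each request is fulfilled.
function runWithHost(
  task: Generator<HostRequest, string, HostResponse>,
  fulfill: (request: HostRequest) => HostResponse,
): string {
  let step = task.next(); // run until the first yield
  while (!step.done) {
    step = task.next(fulfill(step.value)); // resume with the host's answer
  }
  return step.value;
}

const output = runWithHost(handler(), (request) => ({ body: `body of ${request.url}` }));
console.log(output); // prints: body of /a+body of /b
```

The guest never touches the event loop itself; it only describes what it is waiting for, which is the property that would matter for a runtime like Cloudflare Workers.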
Moonbit Features Missing: Component Model

If Moonbit is to be practical for complex applications, it will need interfaces other than numbers and externref. In other words, a spec for communicating some kind of structure, like Rust's wit-bindgen, is necessary.
So, assuming it is combined with mizchi/js_io, I wrote a simple JSON binary encoder.
https://mooncakes.io/docs/#/mizchi/protocol/members
This is still experimental. I want to align the internal structure with some spec of ABI. I'm currently reading the spec for WebAssembly's Canonical ABI.
https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md
So, I wrote a Proposal.
moonbitlang/moonbit-RFCs#5
However, the reply I received was that the current Component Model does not support Wasm GC.
Also, I wrote a JSON encoder before, but just last week, json was added to moonbitlang/core. So, I could extend it to speak JSON, albeit inefficiently.
https://mooncakes.io/docs/#/mizchi/json/members
https://github.com/moonbitlang/core/tree/main/json
Practicality of Moonbit at This Point

Although third-party libraries are not yet well established and there are issues with how submodules are resolved, Moonbit is a viable option for writing synchronous functions that have no external module dependencies.
In fact, I don't think any other option makes writing WebAssembly as easy as Moonbit does.
What remains is for it to become popular somehow and for the library ecosystem to fill out. I can already use it because I can write what I need myself, but I do want official support for asynchrony and the Component Model.
That's all.