The Motivation & Design for GHCVM

Motivation

There are a number of other issues to address in Haskell, such as the lack of in-depth learning materials gathered in one place and the lack of proper tutorials on building common applications using standard techniques and libraries, but I find the weak platform to be the largest bottleneck in Haskell's growth.

The largest IT companies in the world, like Amazon, Google, and Oracle, base a majority of their business on the JVM, and they're doing really well. The reasons these enterprises choose the JVM over other platforms are numerous, but here are the three most important:

  • The Java platform comes with stable libraries in virtually every domain.
  • The JVM is battle-tested, cross-platform, fast, and actively researched.
  • Java developers (those who can work with the JVM) can be found anywhere.

But at the same time, the primary language Java, which the JVM was initially developed for, is cumbersome to work with, hinders rapid application development, and requires an IDE with advanced features in order to be productive. Can we make the JVM (the platform) more pleasant to work on without having to endure the crippling limitations of Java (the language)?

Yes, indeed we can! Scala and Clojure are making inroads into the JVM community and look to gain even more popularity in the times to come, heralding the age of functional programming. But it is not all roses. Having refactored a large codebase in Clojure and found it equivalent to walking through a minefield [1], I have felt the pain point that Haskell can solve with its advanced type system: helping programmers and their employers sleep well at night, feeling secured by their types [2].

So with such great benefits from typed functional programming, why aren't more companies using it? Because Haskell (as a platform) is weak in all the points mentioned above, while the JVM (as a platform) is strong. And what runs a business is a reliable platform, not the language. Every programming language that has seen success to date has had an amazing platform - Perl with CPAN, PHP with CMSs, and Ruby with Rails, to name a few. Is there a way we can combine Haskell (as a language) and the JVM (as a platform)?

Yes! And that brings us to the core proposal: combine the best of two amazing technologies by hooking up GHC with a JVM backend.

Design

NOTE: From now on, when I refer to Haskell, I mean GHC's Haskell. In particular, I will be working with GHC 7.10.3.

The Core Idea

The main goal of this design is to preserve GHC's semantics as much as possible so that we can maintain maximum compatibility with existing pure Haskell packages. The idea is that Haskell has wonderful libraries at a high level of abstraction and the JVM has great "plumbing" libraries for just about any task in the world, so the design decisions below aim to keep those abstractions intact and to make the plumbing simple to wire into them.

In order to match GHC's semantics, I work side-by-side with the parts of GHC's source code [4] that deal with the conversion from STG code to Cmm code, as well as the RTS, ensuring that I don't miss a detail when porting the equivalent functionality to the JVM.

Objectives

Solving the full problem would be difficult in the 3 months available, so I have come up with objectives that seem reasonable to achieve within that time frame.

In order to take a stab at these goals, the following must be implemented:

  1. GHC's single-threaded RTS [1 month]

  2. Code generator that converts STG code to JVM bytecode [1 month]

     i. The Java FFI will also be worked on, but since it is not discussed elsewhere, it is mentioned here.

  3. base package [1 month]

Though they are listed independently, they will be worked on concurrently so that one can gradually move up from compiling simple programs to more complicated programs.

Deliverable for Midterm

A ghcvm executable that can compile some of the basic programs from Learn You a Haskell.

Final Deliverable

A ghcvm executable that can compile any program that only depends on base.

Current Progress

The deadlines may seem a bit short, but I have studied the GHC RTS/code generator quite well over the past couple of months, to the extent that I can navigate the code very quickly and tell what a block of code does just from looking at the function names. Moreover, I don't have to deal with memory layouts, stack & heap management, and the various profiling methods (cost-centre, LDV, Ticky-Ticky) since the JVM already handles them beautifully. Hence, my main focus will be on preserving the semantics of the STG execution model in GHC when I port it over to JVM bytecode.

The work on GHCVM has already started and can be found here.

  1. The hs-java library, a library that aids in bytecode generation, was integrated into the repo and refactored heavily.

     i. Includes support for inner class generation.

     ii. Includes support for field generation with automatic generation of static and instance initialization code.

  2. The code generator has been set up in GHCVM.CodeGen.Monad with the core code generation occurring in GHCVM.CodeGen.Main.

     i. Code generation for the simple case shown below (for any type a) has been finished but not tested yet.

        f :: a
        f = g

  3. The very basic RTS with support for core types has been set up.

     i. The build system has been set up with Shake, and a step was included to run the C preprocessor over the Java code for convenience.

     ii. Closures, TSOs, Capabilities, Tasks, and Stack Frames are implemented.

     iii. The basic scheduling algorithm from GHC has been ported, but is incomplete.

  4. The hook into the GHC API has been written; it overrides the pipeline to specialize it for GHCVM's compilation process and transfers STG code to the code generator.

So essentially, the basic infrastructure is in place to do the core part of this work.

Now I will go over each of the objectives in detail, covering how I will port over data types from GHC's RTS and codegen.

Compilation & Execution Overview

The journey from Haskell source files to execution on the JVM is as follows:

  1. Parse the input Haskell module and obtain the STG abstract syntax tree from the GHC API.

  2. Run the code generator on the STG AST to generate JVM bytecode, and save it in a jar file with all the generated classes inside [6].

  3. If ghcvm --make is used, combine all jars of the dependencies and rts.jar into one uberjar (say haskellmain.jar).

  4. Run the program with java -jar haskellmain.jar.

The Runtime System

STG Execution Model

The runtime system of Haskell uses an execution model known as the Spineless Tagless G-machine (STG), a famous abstract machine developed by Simon Peyton Jones that forms the basis of GHC's runtime system design.

In short, STG is a reduced version of Haskell that was created to make things very simple to compile to machine architectures. It consists of primitive let statements that perform heap allocations, case statements that perform evaluations, and primitive operations, among others. The full syntax of STG can be found in compiler/stgSyn/StgSyn.hs.

RTS Data Types

The core data types in the RTS are Capability, TSO (Thread State Object), and Task.

A TSO is a lightweight object that forms the basis of the implementation of Haskell threads. TSOs are stored in a Capability's run queue and can be stopped and started.

A Capability is a Haskell execution context. It stores all the information required to run Haskell programs and all the information about the interactions of TSOs (blocking information, etc.). The Capability schedules threads to run in FIFO order from the run queue, with logic to skip a thread's execution or reorder the queue based on the changing STG environment.

A Task is an abstraction of an OS-level thread. In the JVM, this can be simply mapped to a java.lang.Thread. A capability can be attached to different tasks at different points in time. The transfer of capabilities plays an important role in implementing safe foreign function calls.

All these data types are implemented with the same fields as in GHC's RTS. The difference is that all the GC- and profiling-related data has been removed.
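
As a rough illustration, here is a minimal sketch of how these three types might look in Java. The field names follow GHC's RTS but are assumptions for illustration, not the final layout:

import java.util.ArrayDeque;
import java.util.Deque;

class StgTSO {
    int id;              // thread id, as in GHC's StgTSO
    int whatNext;        // e.g. ThreadRunGHC, ThreadComplete
    int whyBlocked;      // e.g. NotBlocked, BlockedOnMVar
}

class Capability {
    Deque<StgTSO> runQueue = new ArrayDeque<>(); // TSOs scheduled in FIFO order
    Task runningTask;                            // the Task this Capability is attached to

    void appendToRunQueue(StgTSO tso) { runQueue.addLast(tso); }
    StgTSO popRunQueue()              { return runQueue.pollFirst(); }
}

class Task {
    Thread thread;       // the underlying OS thread, i.e. a java.lang.Thread
    Capability cap;      // the Capability currently owned by this Task
}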

Closures

The basic unit of the Haskell heap is a closure, which can represent a function, thunk, or data constructor. A closure consists of some metadata (called the info table in GHC), some payload (values of free variables), and entry code. As GHC's info table is primarily used to identify the closure and store data specific to a closure type as well as data useful for GC, we don't require an info table in the JVM implementation. Instead, we can use the natural sub-classing mechanism to specify closure-specific information and use the method overriding feature for closure-specific entry code.

The natural Java mapping is shown below:

public abstract class StgClosure {
    public abstract void enter(StgContext context);
}

StgContext will be explained in the STG Context section below.

This is the base for a lot of the data types in the RTS, including stack frames. [5]

The following files in GHC

includes/rts/storage/Closures.h
includes/rts/storage/ClosureTypes.h
rts/Apply.cmm
rts/Exception.cmm
rts/PrimOps.cmm
rts/StgMiscClosures.cmm
rts/StgStartup.cmm
rts/StgStdThunks.cmm

contain definitions of standard closures that are required for code generation. Most of them will be translated into sub-classes of StgClosure, either manually or (since many of them are canned) with a Haskell script that will run as a step in the RTS build. The pure RTS functions that are not entry points to a closure will either be mapped to static method calls or to sub-classes of StgClosure (most likely the latter, for uniformity).
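
To give a feel for such a translation, here is a hypothetical hand-written sketch of one canned closure, an indirection, whose entry code simply enters the closure it points to. The name and field are illustrative, not the actual RTS classes:

public class StgInd extends StgClosure {
    public StgClosure indirectee;   // the closure this indirection points to

    public StgInd(StgClosure indirectee) {
        this.indirectee = indirectee;
    }

    @Override
    public void enter(StgContext context) {
        // entering an indirection just enters its indirectee
        indirectee.enter(context);
    }
}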

STG Context

In GHC's RTS, a Capability has a unique StgRegTable which is a table of values that should ideally map to actual CPU registers when possible. Again, this is not possible on the JVM, so we pass around a context object that serves the same purpose as an environment of registers.

public class StgContext {
    public StgClosure R2;
    public int I1;
    public float F1;
    public double D1;
    public char C1;
    public byte B1;
    public short S1;
    public String Str1;
    public StgTSO currentTSO;
    public Capability myCapability;
    public ReturnCode ret;
}

[7][8]

A "register" is defined for each primitive type in the JVM and arguments to a closure must be passed in the way the closure's entry code expects. The code generator knows about the caller and the callee so it will take care of generating the code for the calling convention.

Stack Frames & Context

GHC works by passing parameters through CPU registers and the stack (in case of spilling). No such control is given on the JVM, but a method call results in a stack frame allocation with the local variables (including the this object and the arguments to the method) and operand stack allocated in the frame. Operations on the operand stack take place in registers. Moreover, it would be desirable to see a Java call stack with the sequence of stack frames that were allocated by the STG machine.

This requirement motivates the following class definition:

public abstract class StackFrame extends StgClosure {
    public StackFrame prev;
    public StackFrame next;

    @Override
    public void enter(StgContext context) {
        // Re-enter the frame above this one (which in turn enters the frame
        // above it), restoring the chain of frames on the Java stack.
        if (next != null) {
            next.enter(context);
        }
    }
}

So suppose we have frames (in bottom-to-top order) frame1, frame2, frame3. They will be created such that they have links to the frames above and below them. When suspending a thread, we just maintain a reference to frame1 and the context, and we call frame1.enter(context) to restore the stack frames as they were before suspension. Because of this, all sub-classes of StackFrame should call super.enter(context) to keep this functionality.
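
As an example of the intended pattern, a hypothetical subclass might look like this; the name and the frame-specific work are purely illustrative:

public class ExampleFrame extends StackFrame {
    public StgClosure saved;   // hypothetical frame-specific payload

    public ExampleFrame(StgClosure saved) {
        this.saved = saved;
    }

    @Override
    public void enter(StgContext context) {
        // frame-specific work would go here, using `saved` and the context
        super.enter(context);  // then re-enter the frames above this one
    }
}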

Tail Calls

The most common operations in the STG execution model are entering closures, and a lot of the core functions are implemented through tail calls. Tail calls are not supported on the JVM (yet), so I will resort to using fast exceptions, which are the closest we can get to goto on the JVM. When the stack actually overflows, or when we limit the stack size, the topmost stack frame can have an exception handler that catches the exception and re-enters the closure whose entry failed, sending in the last context (a sketch of this scheme follows below).

Other methods I evaluated:

  1. Giant method bodies: The implementation is complex, but if a majority of tail calls happen to stay within the method, this approach is the fastest of all. The huge downside is that because the code is compartmentalized across giant methods, the JVM can't really optimize very well.

  2. Mini-interpreter: The entire program is run inside a long-running while loop where each closure entry returns the next closure to enter. Not so clean, and it kills the opportunity of seeing the stack trace on the Java stack, but maybe that could be implemented by cascading mini-interpreters for each stack frame?

The coolest part of the chosen method is that 'returning to the top of the stack' is implemented just by returning from the closure's enter() method! We generate the enter() code so that the last instruction will always be another enter(). Because enter() returns nothing, hopefully the JIT-compiled closure entries won't have much overhead on return.
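
Here is a rough sketch of how the exception-based scheme might fit together; the class names (StgException, Trampoline) and the handler placement are assumptions for illustration:

public class StgException extends RuntimeException {
    public final StgClosure closure;   // the closure whose entry should be retried

    public StgException(StgClosure closure) {
        // disable stack-trace capture and suppression to keep the throw cheap ("fast exception")
        super(null, null, false, false);
        this.closure = closure;
    }
}

class Trampoline {
    static void run(StgClosure start, StgContext context) {
        StgClosure next = start;
        while (next != null) {
            try {
                next.enter(context);
                next = null;          // normal return: evaluation finished
            } catch (StgException e) {
                next = e.closure;     // deep call chain was cut; re-enter with the last context
            }
        }
    }
}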

There are many more details about the RTS, but that covers the most important aspects.

Code Generation

Now that we know about the RTS, we must generate code that simulates the working of the STG machine.

Overview

The plan of implementation for the code generator is straightforward: observe the STG-to-Cmm pass in GHC and follow the same pattern of code generation, mapping all the basic concepts to the corresponding JVM implementation as detailed in the RTS section and ignoring all information about LDV, CCS, ticky, memory layouts, byte/word calculations, etc.

The only reason the code generator can be completed within the projected time frame is that there's a working, battle-tested template - GHC's codegen. My main work is carefully adapting the code generation for the JVM.

Look & Feel of Generated Classes

Suppose we have a package mypackage and the module below:

module MyCompany.MyModule where

data Type0 = Type1 | Type2 Int

myfunction = ...

The generated class corresponding to this module will look like the following [9]:

package ghcvm.mypackage.mycompany;

public class MyModule {
    public static StgClosure myfunction = new MyModule.Myfunction();
    
    public static class Myfunction extends StgClosure {
        ..generated closure-specific layout here...

        @Override
        public void enter(StgContext context) {
            ..generated entry code here..
        }
    }
    
    public static abstract class Type0 extends StgObject {
        public abstract void enter(StgContext context);
    }
    
    public static final class Type1 extends Type0 {
        ..fields of the constructor here..
        
        @Override
        public void enter(StgContext context) {
            ..generated entry code here..
        }
        
        @Override 
        public byte getTag() {
            return 0;
        }
    }

    public static final class Type2 extends Type0 {
        ..fields of the constructor here..
        
        @Override
        public void enter(StgContext context) {
            ..generated entry code here..
        }

        @Override 
        public byte getTag() {
            return 1;
        }
    }
}

Lambda-lifted closures with nonsense names will be declared as private, while dictionaries that are generated for typeclass implementations should stay public so that they can be accessed from the outside.

It is not necessary that a new inner class be defined for every function, due to some of the many special cases in the code generator. This example is just to give a feel for how the code will look in a stack trace.

Implementing the Java FFI

See here.

Implementing base

This part is relatively easy compared to the others, because if all the primitive types and primitive operations are implemented with the same semantics as GHC's, there will be few to no changes required in the parts of base that don't depend on the C FFI. Implementing the IO parts of base will be the primary challenge.
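
To give a sense of what "same semantics" means here, a minimal sketch follows, assuming Int# maps to a Java int and the boxed I# constructor becomes a data-constructor class like the Type2 example above; the names are illustrative, not the actual generated classes:

public class IntExample {
    // Corresponds to the I# data constructor of Int; the payload is an unboxed Java int.
    public static class Izh extends StgClosure {
        public int x1;

        public Izh(int x1) { this.x1 = x1; }

        @Override
        public void enter(StgContext context) {
            // a data constructor is already in normal form, so entering it does no work
        }
    }

    // A primop like (+#) would compile down to plain JVM int arithmetic.
    static int plusIntHash(int a, int b) { return a + b; }
}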

Conclusion

Thanks for reading!

Footnotes

[1] Especially when one's server code has to handle millions of requests per day and a software failure greatly lowers the overall value of the business.

[2] Which are, of course, proofs in a pure functional language due to the Curry-Howard isomorphism.

[3] This was especially apparent when reading GHC's source. One will find many cases where optimizations/features have to be disabled because of limitations on Windows.

[4] In particular, compiler/codeGen in the GHC source tree which handles the conversion from STG to Cmm.

[5] An abstract class was chosen over an interface for efficiency since class inheritance leaves inherited methods at constant offsets in the class's vtable in the JVM. The small difference in performance matters since the enter() method is the most frequently called method in the STG machine.

[6] This step may be broken into two if I find that using a high-level abstract intermediate representation will make the code simpler when generating the final JVM bytecode. This may end up happening since a majority of the JVM instructions can be put into groups that do the same thing, just with different types. It'll probably end up looking like Cmm without the register-specific stuff.

[7] The actual definition of StgContext in the implementation will end up using about 6 registers per primitive type and an overflow stack as a Java array type.

[8] In GHC, calls to functions (including entry points to closures) follow the Native Node calling convention, in which a specialized register designated with the label R1 (called Node) stores the closure data associated with an entry point. When a call to a compiled function closure's entry point is made, the function's closure object is passed in R1 and then the closure is entered. It is natural to observe that R1 is simply this in the context of Java, where objects couple functions and data together. So storing R1 is pointless; the naming above is to highlight for people with GHC experience that R1 is there, but invisible, and to stay close to GHC, where R2 and beyond typically store the arguments to a function call.

[9] Direct bytecode will be generated. The Java decompilation of the generated code is shown.
