By Limin Fu, 2014.10.22
The starting point would be the Dao programming language and its implementation. The end point would be a new, statically compiled system language that supports much of the current syntax and features of Dao. With a frontend based on Dao and a backend based on LLVM, it should be relatively easy to implement a compiler for such a language.
Please note that this by no means makes the current Dao obsolete or less important. Dao is for application programming; that remains its primary domain and will not change. The new language should also have excellent interoperability with Dao, as well as with C. So if everything works out as expected, we will have two languages with similar syntax and features but at different levels: one at a lower level for system programming, and the other at a higher level for application programming. (Merging them would not be possible because of their different technical requirements.)
For convenience, I will name this new language NeoDao for now. Basing it on Dao means a large portion of the Dao implementation could be reused, in particular, the parser, the type system, the bytecode format and the virtual machine. It seems quite natural that each source file of NeoDao should be compiled into a bytecode file, which could then be further compiled into bitcode files and linked into native executable files by a backend based on LLVM. This way, it should be able to build large projects very efficiently.
Of course, such bytecode should also be executable on the virtual machine. So the first step would be expanding Dao with more primitive data types for NeoDao. It will also be necessary to remove some of the built-in types and move their implementation from C to NeoDao. All of this must be done in a self-sufficient way, so that the compiler and VM are able to compile and execute NeoDao code without ever generating native code.
As a system language, NeoDao will need to support manual memory management. It will use ownership to greatly simplify memory deallocations. It may not be able to handle all cyclically referenced objects. This means manual intervention will be necessary, either by breaking such references manually, or deallocating them individually.
But optional garbage collection could be supported as a library. A task-local garbage collector can be created, which obtains references to objects in order to manage them.
All memory errors, such as dereferencing a null pointer, double frees, freeing objects still in use, and leaks, must be caught at runtime. The type system could eliminate as many of them as possible. It is not necessary to enhance the type system to do such things at first; such abilities can be added later without any impact on the language or the backend.
Concurrency in NeoDao should support both the shared-memory model and message passing through channels, more or less as it is in Dao now.
###############################
class K
{
var m1 : list<int> # object
var & m2 : list<int> # reference, shared;
}
Bitwise copying
Owned variables are automatically deallocated when they go out of scope.
"shared" variables are typed as "type&" and handled in a similar way to "var" and "invar" types. Values of variables with local scope (lifetime) cannot be assigned to a shared field of a variable whose lifetime exceeds theirs.
"shared" variables are reference counted. They are only
deallocated by their owner; freeing by others will only reduce
their reference count. When a shared variable is deallocated
by its owner, its reference count is checked; if it is not zero,
a memory leak warning can be issued.
All non-primitive values have an additional reference counting
field that precedes the value. All values that are moved to a
variable of inexact types such as "any" and variant types, will
be boxed with type field preceding the refcount field (which is
added for primitive values).
Primitive variables cannot be shared, as they do not support reference counting.
var object = K() # An object, not a reference;
var & ref = object # A reference;
In C language:
struct Object
{
...
};
struct RefData
{
Object *object;
Type *type; // statically created;
void *owner; // owner address;
int refcount; // must be atomic;
};
enum RefType
{
ORIGINAL, // the one in ObjectBox
NORMAL, // copy of an original reference
TRANSIENT, // temporary reference for embedded object
TRANSIENT_LOC, // copy of a local transient reference
TRANSIENT_EXT // copy of an external transient reference
};
struct Reference
{
RefType reftype;
RefData *refdata;
};
struct ObjectHeader
{
Reference ref;
RefData refdata;
};
// An individually allocated object allocates the following struct
// to include the extra data. A reference to the object then uses
// the preallocated and initialized "ref" as the reference object.
// For transient references, an "ObjectHeader" will be created (on stack?),
// and its "ref" field will be used.
//
// When a reference is assigned to a variable, it will be copied
// with refcount in RefData increased.
// If the variable is on stack, reference can also be created on stack.
//
// refcount of a transient reference can be checked for illegal use
// of transient reference (hence an illegal use of the embedded object).
//
struct ObjectBox
{
Reference ref;
RefData refdata;
Object object;
};
owner will be NULL for locally owned objects.
In other cases, it will be the address of its container.
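As a rough illustration of the reference bookkeeping described above, the following C sketch shows how copying and dropping a reference might adjust an atomic count. It is a simplified stand-in and not the actual design: the struct omits the Type field, the function names (refdata_init, ref_acquire, ref_release, is_locally_owned) are hypothetical, and starting the count at zero is an assumption.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified version of RefData from the notes above. */
typedef struct RefData {
    void      *object;    /* the referenced object */
    void      *owner;     /* NULL for locally owned objects, else the container */
    atomic_int refcount;  /* "must be atomic" */
} RefData;

/* Set up the record for a freshly created object (no copies yet; assumption). */
void refdata_init(RefData *rd, void *object, void *owner)
{
    rd->object = object;
    rd->owner = owner;
    atomic_init(&rd->refcount, 0);
}

/* Copying a reference into a variable increases the count; returns the new count. */
int ref_acquire(RefData *rd)
{
    return atomic_fetch_add(&rd->refcount, 1) + 1;
}

/* Dropping a copy decreases the count; returns the new count. */
int ref_release(RefData *rd)
{
    return atomic_fetch_sub(&rd->refcount, 1) - 1;
}

/* An object is locally owned when its owner field is NULL. */
int is_locally_owned(const RefData *rd)
{
    return rd->owner == NULL;
}
```

Using the fetch-and-add return value means each caller sees the count that resulted from its own update, which is what a later ownership or leak check would need under concurrent use.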
class Klass
{
var value = 123
}
class Klass2
{
var & value : Klass;
}
routine Test( & param: Klass2 )
{
var stackObject = Klass()
# Allocation on stack;
# Local variable stackObject has the ownership;
#
# DVM_CALL : ...; mode = INIT_STACK;
var & heapObject = Klass()
# Allocation on heap;
# Local reference variable has the ownership;
#
# DVM_CALL : ...; mode = INIT_HEAP;
# DVM_MOVE : ...; mode = OBJ_TO_REF;
var stackObject2 = heapObject;
# Bitwise copying;
#
# DVM_MOVE : ...; mode = REF_TO_OBJ;
param.value = stackObject;
# Valid; assigning an object to a reference type
# means automatically boxing it as a reference.
# But both the object and the reference are local;
# param.value should be reset before exiting the scope;
# otherwise a runtime error will be issued;
#
# DVM_MOVE : ...; mode = OBJ_TO_REF;
param.value = heapObject;
# Valid.
# The reference is local, but the object is not;
#
# DVM_MOVE : ...; mode = REF_TO_REF;
param.value = Klass();
# It should also allocate on heap,
# because "param.value" expects a non stack object reference;
#
# DVM_CALL : ...; mode = INIT_HEAP;
# DVM_MOVE : ...; mode = OBJ_TO_REF;
# When exiting the scope:
# Destructors of stack objects will be called;
# Local reference will have reference count decreased;
#
# If an object pointer indicates that the object was
# allocated on stack, and the reference count did not
# reduce to zero, a runtime error will be issued.
}
After an object is created, it will be automatically owned
by whichever explicit variable comes to hold a reference to
the object. To change the ownership, use := assignment.
When a new reference is assigned to a variable that already holds a reference, the existing reference is released first: its ownership is checked, and the object is deleted if the variable owns it and its refcount has dropped to zero.
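The assignment rule can be sketched in C as follows. This is only an illustration under stated assumptions: the Obj struct, the functions obj_new, ref_unref and ref_assign, and the freed_count counter are all hypothetical, and the sketch assumes that the first variable to hold a reference becomes the owner and records its own address in the owner field.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical heap object with a reference count and an owner slot. */
typedef struct Obj {
    int   refcount;  /* number of variables holding a reference */
    void *owner;     /* address of the owning variable (assumption) */
    int   value;
} Obj;

int freed_count = 0;  /* illustration only: counts deletions */

Obj *obj_new(int value)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL) abort();
    o->refcount = 0;
    o->owner = NULL;
    o->value = value;
    return o;
}

/* Release the reference held in *slot. The object is deleted only if
 * this slot owns it and no other reference remains. */
void ref_unref(Obj **slot)
{
    Obj *old = *slot;
    if (old == NULL) return;
    old->refcount -= 1;
    if (old->owner == (void *)slot && old->refcount == 0) {
        free(old);
        freed_count += 1;
    }
    *slot = NULL;
}

/* Assign a reference to a variable, releasing whatever it held before. */
void ref_assign(Obj **slot, Obj *incoming)
{
    if (incoming != NULL) {
        incoming->refcount += 1;  /* count first, so self-assignment is safe */
        if (incoming->owner == NULL) incoming->owner = (void *)slot;
    }
    ref_unref(slot);
    *slot = incoming;
}
```

A fuller version would also report a leak when the owner releases an object whose count is still nonzero; that branch is left out here to keep the sketch short.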
A transient reference is a reference to an object that is embedded in another object (or is allocated on stack?). Such a reference is only for temporary local use. Ordinary references are used for individually allocated objects. Using a transient reference is no different from using an ordinary reference, and programmers can be completely oblivious to the difference.
Individually allocated objects can allocate additional space for reference handling, which makes it possible to catch memory errors when deallocating them. This is not possible for objects that are embedded in other objects. Transient references are employed to ensure that such objects are only used as references locally and temporarily, which guarantees that they are not referenced anywhere when they are deallocated along with their host objects.
class One
{
...
}
class Two
{
var value = 123
var object = One()
}
routine Test( & one: One )
{
# Do something with one;
# The compiler will add code to handle all the variables
# that will become out-of-scope.
# The code will check transient references to ensure that
# such references are not held by any other variables
# that may survive beyond the current scope.
}
var two = Two()
Test( two.object ) # Transient reference;
routine Test2()
{
var two = Two()
var & one = two.object # Transient reference;
Test( one )
}
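The scope-exit check on transient references described above might look roughly like this in C. This is a sketch under assumptions, not the planned implementation: TransientRef and the transient_* functions are hypothetical names, and the header is assumed to live on the caller's stack with a count of outstanding copies.

```c
#include <stdbool.h>

typedef struct One { int data; } One;

typedef struct Two {
    int value;
    One object;  /* embedded object, no room for its own reference data */
} Two;

/* Hypothetical transient reference header, created by the caller. */
typedef struct TransientRef {
    One *target;    /* the embedded object */
    int  refcount;  /* copies of this transient reference still alive */
} TransientRef;

TransientRef transient_make(One *embedded)
{
    TransientRef r = { embedded, 0 };
    return r;
}

/* A variable takes a copy of the transient reference. */
void transient_copy(TransientRef *r) { r->refcount += 1; }

/* A variable holding a copy goes out of scope or is reset. */
void transient_drop(TransientRef *r) { r->refcount -= 1; }

/* Check inserted at scope exit: true when no copy has escaped. */
bool transient_scope_exit_ok(const TransientRef *r)
{
    return r->refcount == 0;
}
```

If transient_scope_exit_ok returns false, some longer-lived variable still holds the reference, which is the illegal use the notes want to detect at runtime.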
class Node
{
var value = 123
var next : Node|none = none
routine delete(){
next = none
}
}
if (1) {
var node = Node() # Allocation on stack;
node.next = node
}
# The parser will generate DVM_REMOVE for variables that
# have gone out of scope:
# DVM_REMOVE: <node>, 0, 0;
# The inferencer will expand DVM_REMOVE into:
# DVM_FINALIZE: <node>, 0, 0; Invoke destructor;
# DVM_DEALLOCA: ...;
if (1) {
var & node = Node() # Allocation on heap;
node.next = node
}
# DVM_UNREF: <node>, 0, 0; Un-reference;
# DVM_OWNER:
# DVM_TEST: ownership
# DVM_FINALIZE: <node>, 0, 0; Invoke destructor;
# DVM_TEST: refcount
# DVM_DELETE:
# When UNREF is executed, the ownership of <node>
# will be checked. If it is locally owned:
# 1. the memory of <node> will be freed if its refcount has become zero;
# 2. otherwise, a memory leak error will be raised;
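The UNREF sequence listed above can be simulated in C to see how it behaves on the self-referencing node. This is a sketch under assumptions: the mapping of instructions to statements is one reading of the listing (finalize before the refcount test), and node_new, node_set_next, node_finalize, node_unref_local and UnrefResult are hypothetical names.

```c
#include <stdlib.h>

typedef struct Node {
    int          refcount;  /* variables and fields referring to this node */
    int          value;
    struct Node *next;
} Node;

typedef enum { UNREF_FREED, UNREF_LEAK } UnrefResult;

/* Heap allocation; the local reference variable holds the first reference. */
Node *node_new(void)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->refcount = 1;
    n->value = 123;
    n->next = NULL;
    return n;
}

/* Field assignment "n.next = target", keeping the counts in step. */
void node_set_next(Node *n, Node *target)
{
    if (target != NULL) target->refcount += 1;
    if (n->next != NULL) n->next->refcount -= 1;
    n->next = target;
}

/* Destructor in the spirit of Node.delete(): drops the outgoing link. */
void node_finalize(Node *n)
{
    node_set_next(n, NULL);
}

/* Scope exit for a locally owned reference. */
UnrefResult node_unref_local(Node *n, void (*finalize)(Node *))
{
    n->refcount -= 1;                   /* DVM_UNREF */
    if (finalize != NULL) finalize(n);  /* DVM_FINALIZE */
    if (n->refcount == 0) {             /* DVM_TEST: refcount */
        free(n);                        /* DVM_DELETE */
        return UNREF_FREED;
    }
    return UNREF_LEAK;                  /* memory leak error */
}
```

For a node whose next field points to itself, calling node_unref_local without a finalizer reports UNREF_LEAK, while passing node_finalize breaks the cycle first so the node is freed.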
var a: int[3] = { 1, 2, 3 } # built-in list type;
var a = { 1, 2, 3 } # int[3];
class Array<@T>
{
routine Array<@T>( invar a: @T[] ); # Invocable with an initializer list;
}
var array: Array<float> = { 1.0F, 2, 3 }
var t = [ 1, "abc" ] # tuple?
var t: [int,string]
class Node
{
var value = 123
var next: Node|none = none
}
routine Func()
{
var & node = Node()
node.next = node
node.next = none
# Breaking the cycle;
# Or:
delete node
}
Opcode and operand sizes should become unsigned int.
typedef unsigned int opcode_t;
typedef unsigned int operand_t;
struct DaoVmCode
{
opcode_t code;
operand_t a, b, c;
};
DVM_ALLOCA : size_high_bits, size_low_bits, handle;
DVM_ALLOC : size_high_bits, size_low_bits, handle;
DVM_DELETE : handle, 0, 0;
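Since DVM_ALLOCA and DVM_ALLOC carry the size as two operands, a 64-bit size has to be split and recombined. A minimal sketch, assuming operand_t is 32 bits wide (true for unsigned int on most current platforms, but not guaranteed by the C standard); the helper names size_to_operands and operands_to_size are hypothetical:

```c
#include <stdint.h>

typedef unsigned int operand_t;

/* Split a 64-bit allocation size into the two operands of DVM_ALLOC.
 * Assumes operand_t holds exactly 32 bits. */
void size_to_operands(uint64_t size, operand_t *high, operand_t *low)
{
    *high = (operand_t)(size >> 32);
    *low  = (operand_t)(size & 0xFFFFFFFFu);
}

/* Recombine size_high_bits and size_low_bits when executing the instruction. */
uint64_t operands_to_size(operand_t high, operand_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```

Casting the high operand to uint64_t before shifting matters: shifting a 32-bit value left by 32 is undefined behavior in C.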
Automatic return will be disabled.
If possible, implement this new language "NeoDao" purely in the Haxe language, not in C/C++ (see
http://en.wikipedia.org/wiki/Haxe
http://www.haxe.org
http://haxe.org/use-cases
http://haxe.org/use-cases/who-uses-haxe.html
http://www.haxe.org/manual
https://groups.google.com/forum/#!forum/haxelang
)
It has a very active community.
Advantages: any source code written in Haxe can be translated source-to-source to targets like Java, C#, C++, JavaScript, Python, PHP, NekoVM, etc. Some external parties are also attempting Haxe-to-Ruby, Haxe-to-Objective-C, and Haxe-to-Lua converters. Very importantly, a Haxe-to-pure-C converter is in the works, which matters a great deal for a SYSTEMS PROGRAMMING LANGUAGE: the generated C can easily be compiled with an LLVM-based C compiler. Externs for external libraries in all these languages can be written very easily.
So your NeoDao source code, written in pure Haxe, would be translated to pure C++, pure Python, pure Java, etc., and your NeoDao language and runtime would run on all these targets.
I suppose there are lots of languages written for systems programming (like Go, C, etc.). It's important to implement a language that solves a problem that has not been solved before. Only then will it catch the eyes of the community in a big way.
Please seriously consider implementing Dao/NeoDao in Haxe. You may later appreciate how much it benefited this project.
Apart from your systems programming language vision, it will also fill the gap of not having a good scripting language for the Haxe platform, as "hscript" does not fill that gap. If you come up with NeoDao implemented in Haxe (for your own systems programming needs), it will be immediately taken up by game programmers and system administrators, who use Haxe heavily right now but have no mature scripting language available for it (as Haxe is good as a compiled language).
**Please fill a gap where a language gap actually exists, and your programming language will become really popular. It will then have a real answer to the frequent newbie question "but why should I use Dao?" It will be a systems language too, and a unified scripting language available to anybody on any platform or (very importantly) on ANY VM.**
Take my word for it:
Stage 1: you IMPLEMENT NeoDao in Haxe
Stage 2: hence GET NeoDao running on any popular VM (including the JVM, .NET, the Python VM, ...) (and also as a systems programming language running as C/C++)
Stage 3: and SO you will have a really POPULAR ultra-portable (portable across VMs) scripting+systems language in waiting. We are waiting for a messiah to solve this problem of a super-ubiquitous scripting+systems language.