Created
June 27, 2011 02:12
M0 overlay thoughts
Probably the biggest question is what language we want to use. We'll be | |
spending a huge amount of time writing whatever it is. We need some criteria: | |
* how easy is the language to learn for people accustomed to C | |
* does the language have an object system? We'll be implementing cmop (6model)
in it, so the object system needs to be either self-hosting or non-existent
* how efficiently does the language map to M0? We want to generate efficient | |
M0 and to have a clear idea of what the M0 for a given snippet looks like | |
* it shouldn't allow things that don't make sense in M0 (not sure what this means) | |
* the language should allow CPS stuff, either directly or indirectly. | |
Indirectly is probably easier. | |
* need distinction between compile-time and runtime constructs | |
* need typed variables | |
* something struct-like | |
* a way to define M1 ops composed from M0 ops | |
* light syntax, easy to implement, no optimizations | |
* if we want macros, they need to be way smaller than C's | |
* function-like casts will make the language much easier to parse visually | |
* int* x, y; means two int* | |
* easy to parse | |
* question: INSP only or something more fine-grained? | |
* |
We need some kind of overlay language for M0 that we can use to reimplement | |
Parrot. Writing poke_caller was hard and writing a factorial program will be | |
harder. Anything less trivial will require an actual compiler or code | |
generator. We have the following options: | |
1 (PIR/M0) emit M0 from PIR | |
2 (nqp/M0) emit M0 from nqp | |
3 (winxed/M0) emit M0 from winxed | |
4 (new-nqp/M0) write a new compiler targeting M0 using nqp | |
5 (new-custom/M0) write a new compiler targeting M0 using a separate toolchain | |
6 (steal/retarget) take someone's existing compiler, retarget it to M0 | |
PIR/M0 has the advantage that we'll need to do something similar later anyway. | |
Being able to translate from PIR to M0 will be necessary if we want to continue | |
to support PIR, and we do. I'm not sure if we'll want to replace Parrot's | |
current C code with PIR, though. This approach is worth considering. | |
I don't like the idea of nqp/M0. nqp is already quite slow and I don't see it | |
being feasible to get a speed improvement by using it more internally. It | |
might be the case that we don't need to generate inefficient code to deal with | |
lvalue semantics if the translation is well-designed. There's also the concern | |
that using nqp almost universally means using a bunch of pir:: garbage, which | |
would make M0 translation less efficient. Overall it's a fairly nice language, | |
but I'm not certain that nqp/M0 is the best way forward. | |
winxed/M0 is a nice option. The compiler already exists and has an alternate | |
version (winxedxx) that targets C++. Unfortunately winxed isn't designed to | |
support multiple codegen backends, so we'd have to either refactor codegen into | |
a separate step (probably slowing down performance) or just fork it and write a | |
new backend. The language itself is quite pleasant, but the compiler needs | |
work. I'm still not convinced that this is a bad way forward. | |
new-nqp/M0 brings with it the speed issues of nqp. nqp is very much designed | |
to support a highly flexible compilation workflow, so using it to generate M0 | |
is a reasonable approach. I'm not a fan of the language's speed and quirks,
though. This approach could be made to work but it doesn't sound like the best | |
approach. | |
new-custom/M0 is included almost for the sake of completeness. I'm tired of
writing meta-things and want to get some real work done. Writing a new | |
compiler from scratch is decidedly non-lazy. | |
steal/retarget is a generalization of the winxed/M0 approach. Instead of | |
retargeting winxed, we'd take an unrelated compiler for some language (js comes | |
to mind) and target that at M0 using CPS for control flow. This has the | |
advantage that we're not writing (and debugging) a whole new compiler from | |
scratch, but it depends on us finding an appropriately-licensed compiler for a | |
suitable language and constructed in a modular fashion. | |
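For anyone unsure what "CPS for control flow" would mean for a retargeted compiler, here is a minimal Python sketch. It is not M0 code and not a proposal for how the translation would actually work. The names fact_cps and trampoline are made up for illustration. Every call hands its result to an explicit continuation instead of returning it, and a small driver loop runs one step at a time so the host stack never grows.

```python
def fact_cps(n, k):
    """Factorial in continuation-passing style.

    Instead of returning n!, each step returns a thunk that, when run,
    passes the partial result on to the continuation k.
    """
    if n == 0:
        return lambda: k(1)
    # The new continuation multiplies by n, then defers to the old one.
    return lambda: fact_cps(n - 1, lambda r: lambda: k(n * r))


def trampoline(thunk):
    """Keep running thunks until a non-callable (the final value) appears."""
    while callable(thunk):
        thunk = thunk()
    return thunk


# The identity continuation just yields the final value.
result = trampoline(fact_cps(5, lambda r: r))
```

Because every transfer of control is an explicit "call the continuation" step, a backend only needs a goto-like primitive plus somewhere to store the continuation, which is roughly what a call frame with a return PC provides.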
possible syntax for mole ("M0 Overlay LanguagE") | |
******* | |
*types* | |
******* | |
I propose that we have 5 types: I, N, S and P for registers, and cs, which is a
C-like string. (This probably won't be exactly like a C-string, but close enough
that C code can use it if needed.)
registers: I, N, S, P | |
primitive string: cs | |
***************** | |
*constant values* | |
***************** | |
This describes what kind of constants can be used in mole code. | |
int: [1-9]\d* | |
float: ... | |
hex: 0[xX][0-9a-fA-F]+ | |
octal: 0[0-7]+ | |
string: "[...]" (with escapes) | |
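A quick way to sanity-check the patterns above is to run them over some sample tokens. The Python sketch below copies the int, hex and octal patterns as written. The float pattern is left out because it isn't specified here, and the string pattern (double quotes with backslash escapes) is my assumption about what "with escapes" means. The classify function is a made-up helper, not part of any mole tooling.

```python
import re

# int, hex and octal are copied from the list above.
# The string pattern is an assumption: double-quoted, backslash escapes.
# float is omitted because the notes leave it unspecified.
PATTERNS = [
    ("hex", re.compile(r"0[xX][0-9a-fA-F]+")),
    ("octal", re.compile(r"0[0-7]+")),
    ("int", re.compile(r"[1-9]\d*")),
    ("string", re.compile(r'"(?:[^"\\]|\\.)*"')),
]


def classify(token):
    """Return the kind of constant a token is, or None if nothing matches."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return None
```

One thing this turns up: as written, a bare 0 matches none of the patterns (int needs a leading 1-9 and octal needs at least one digit after the 0), so the int pattern may need a special case for zero.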
************************ | |
*compile-time constants* | |
************************ | |
********************** | |
*working with strings* | |
********************** | |
Strings pretend to be 0-indexed. They actually also store their length and
encoding in an 8-byte header ahead of the first character. The length is stored
as a 4B int and the encoding is stored in one byte, with 3 unused bytes for
padding. The string for "hello, worlds?" (14 characters, so the length is 0xE)
would look as follows in memory:
0x0             0x4             0x8             0xC             0x10            0x14
---------------------------------------------------------------------------------------------
|0x0|0x0|0x0|0xE|0x0|0x0|0x0|0x1| h | e | l | l | o | , |   | w | o | r | l | d | s | ? |\0 |
---------------------------------------------------------------------------------------------
size            encoding        0x0             0x4             0x8             0xC
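To make the layout concrete, here is a small Python sketch that builds the bytes for such a string and reads a character back by its pretend index. It assumes the header is big-endian with the three padding bytes ahead of the encoding byte, which is how the diagram is drawn. The function names and the encoding id of 1 are placeholders.

```python
import struct

# Assumed header layout, following the diagram: 4-byte big-endian length,
# 3 padding bytes, then a 1-byte encoding id. 8 bytes in total.
HEADER = struct.Struct(">I3xB")


def pack_fat_string(text, encoding_id=1):
    """Return header + character bytes + trailing NUL for an ASCII string."""
    payload = text.encode("ascii")
    return HEADER.pack(len(payload), encoding_id) + payload + b"\x00"


def char_at(buf, index):
    """Read a character by its 0-based 'pretend' index, with bounds checking."""
    length, _encoding = HEADER.unpack_from(buf, 0)
    if not 0 <= index < length:
        raise IndexError(index)
    return chr(buf[HEADER.size + index])


buf = pack_fat_string("hello, worlds?")
```

The trailing NUL is what keeps the data usable from C: a pointer to buf plus 8 bytes is an ordinary C string, while mole code gets the length without scanning.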
********* | |
*structs* | |
********* | |
Structs may be defined as below. Once a struct is defined, it can be used | |
wherever any other type can be used. If a register is of a struct type, it is | |
assumed to point to a region of memory with the specified layout. Struct | |
members are accessed using the '->' notation, as in C. sizeof() can be used to | |
determine the number of bytes required by the struct. This is similar to C, | |
except that sizeof() is purely a compile-time construct and cannot be used to
calculate the length of an array. | |
struct {
    I int_thingy;
    N n_thingy;
} struct_thingy;
var I quux;
var struct_thingy st;
st = m0::sys_alloc sizeof(struct_thingy);
st->int_thingy = 39292934;
st->n_thingy = 332.66;
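Since sizeof() is a compile-time construct, the compiler has to know each struct's layout up front. The Python sketch below shows one way that bookkeeping could look. The 8-byte width for every register type and the no-padding, declaration-order layout are assumptions, since the notes don't pin either down, and layout is a hypothetical helper.

```python
# Assumed register widths in bytes; the notes above do not fix these.
TYPE_SIZES = {"I": 8, "N": 8, "S": 8, "P": 8}


def layout(members):
    """Return ({member name: byte offset}, total size) for a struct.

    Members are packed in declaration order with no padding.
    """
    offsets = {}
    size = 0
    for type_name, member_name in members:
        offsets[member_name] = size
        size += TYPE_SIZES[type_name]
    return offsets, size


# The struct_thingy definition from the example above.
struct_thingy = [("I", "int_thingy"), ("N", "n_thingy")]
offsets, sizeof_struct_thingy = layout(struct_thingy)
```

With a table like this, st->n_thingy turns into "the address in st plus offsets['n_thingy']" and sizeof(struct_thingy) becomes a plain integer constant before any M0 is emitted.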
******** | |
*chunks* | |
******** | |
Chunks are similar to functions. They have a constants table, a metadata table | |
and a bytecode segment. Values can be added to the constants table by | |
declaring a value with the keyword "const". Annotations may be added | |
automatically by the mole compiler and can also be added manually with the .ann | |
"key" "value" syntax. | |
chunk main (I a1, I a2, I a3) {
    const I stdout 1;
    const cs hello "ohai. im in ur m0";
    // annotation for the right file will be added by m1 compiler
    m0::print_i stdout, hello;
    var I i_thingy;
    i_thingy += a3++;
    c::fprintf(stdout, "asdfw %d\n", i_thingy);
    call_chunk "chunk_name", arg_array;
}
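One way to picture what the compiler builds for a chunk is a record with the three segments described above. This Python sketch is only a mental model: the class, its method names, the tuple shapes and the "hello.mole" file name are all made up, and real M0 chunks are a binary format rather than Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical in-memory model of a chunk's three segments."""
    name: str
    constants: list = field(default_factory=list)
    metadata: list = field(default_factory=list)
    bytecode: list = field(default_factory=list)

    def add_const(self, value):
        """Append a value to the constants table and return its index."""
        self.constants.append(value)
        return len(self.constants) - 1

    def annotate(self, key, value):
        """Attach a key/value annotation to the next bytecode offset."""
        self.metadata.append((len(self.bytecode), key, value))


main = Chunk("main")
stdout_idx = main.add_const(1)                     # const I stdout 1;
hello_idx = main.add_const("ohai. im in ur m0")    # const cs hello "...";
main.annotate("file", "hello.mole")                # .ann "file" "hello.mole"
main.bytecode.append(("print_i", stdout_idx, hello_idx))
```

The point of the model is that "const" declarations never produce bytecode by themselves. They only add table entries that later ops refer to by index.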
********************* | |
*calling conventions* | |
********************* | |
I don't know. There are a couple options: | |
1) The first is that all calling conventions need to be dealt with explicitly. | |
This isn't nearly as bad as it'd be under M0 because of composed ops and it | |
would allow a very high degree of control without requiring the management of | |
all the minutiae of the calling conventions more than once.
2) The second option is to have a default set of calling conventions that are | |
used with a simple minimalist syntax, but to allow them to be overridden with | |
composed ops. | |
3) The third option is to say that only the builtins can be used for control | |
flow. For a very experimental language like mole, this approach is probably | |
insane. | |
************** | |
*composed ops* | |
************** | |
mole supports syntax to create composed ops which behave similarly to built-in | |
M0 ops. The syntax is similar to chunks with a few differences. Composed ops
are declared using the "composed" keyword and do not have return statements.
Any values that the composed op needs to modify should be passed as arguments.
Using a return statement in a composed op is a syntax error. Composed ops may
take an arbitrary number of arguments. Variables may be declared in composed
ops as in functions. Composed ops are similar to inlined functions in C.
composed init_cf(P new_cf, I retpc_label) {
  alloc_cf:
    I cf_size = 256;
    I flags = 0;
    new_cf = m0::gc_alloc cf_size, flags;
  init_cf_copy:
    new_cf[INTERP] = cf[INTERP];
    new_cf[CHUNK] = cf[CHUNK];
    new_cf[CONSTS] = cf[CONSTS];
    new_cf[MDS] = cf[MDS];
    new_cf[BCS] = cf[BCS];
    new_cf[PCF] = cf[CF];
    new_cf[CF] = new_cf;
  init_cf_zero:
    new_cf[EH] = 0;
    new_cf[RETPC] = 0;
    new_cf[SPILLCF] = 0;
    RETPC = retpc_label;
  init_cf_pc:
    new_cf[PC] = post_set;
    CF = new_cf;
  post_set:
}
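The "similar to inlined functions" idea can be sketched as plain substitution: at each use, the compiler copies the composed op's body and swaps parameter names for the actual arguments. The Python below is a toy version under that assumption. The expand helper, the tuple encoding of ops and the set_slot op name are all hypothetical, and it ignores labels and local variables, which a real expansion would have to rename per use.

```python
def expand(composed, call_args):
    """Inline a composed op.

    composed is (parameter names, body), where body is a list of op tuples.
    Returns a new op list with parameters replaced by the actual arguments.
    """
    params, body = composed
    if len(params) != len(call_args):
        raise ValueError("argument count mismatch")
    binding = dict(zip(params, call_args))
    return [tuple(binding.get(part, part) for part in op) for op in body]


# Hypothetical composed op that zeroes two slots of a call frame.
zero_slots = (
    ("frame",),
    [("set_slot", "frame", "EH", 0), ("set_slot", "frame", "RETPC", 0)],
)

# Using the composed op with register P3 as the frame argument.
ops = expand(zero_slots, ["P3"])
```

This also shows why a return statement makes no sense in a composed op: there is no call to return from, only ops spliced into the caller.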
I really like the "fat" strings, with size information baked in. I'd also like similar features for arrays/pointers instead of raw pointer arithmetic, either classical fat pointers, or a memory view like Go's slices.
I think inline assembly is essential, I like soh-cah-toa's proposal.
Would it be desirable to rename composed ops to procedures and chunks to functions? That's vaguely what they're doing, and functions could easily be built on top of procedures. I don't know M0 very well, so there may be problems with this idea there.
call_chunk "chunk_name"
is a good idea iff chunk_name is actually a string. (Like how methods are looked up by name in Parrot today.)
I already mentioned these things to you, cotto, but I'm wondering what others may think as well so this is really directed at the other Parrot developers instead.
First, the syntax for allocating variables/registers may need to be distinguished. For instance:
At point A, it appears that you are declaring a named variable - like in PIR - that refers to an integer register. However, at point B, it looks like you are declaring a variable of type "struct_thingy". I say "variable" instead of "register" because there is no "struct_thingy" register. Does the struct syntax define a new register type or is it merely an alternative syntax for referring to a group of registers?
Next is the syntax for constants. Take the following declaration:
The "const" statement creates a new entry in the symbol table. I'm wondering though what the 1 does in this statement. cotto says that it initializes the entry with that value. However, when you assign a value to anything else, like a register, you use the "=" operator. For the sake of consistency, I think it may be better to change the syntax to:
Next, the syntax for calling m0 opcodes is:
Would it be worth adding support for m0 blocks, much like Q:PIR {} in NQP? For instance,
Lastly, the syntax for calling a chunk.
This syntax kind of makes "chunk_name" look like a string rather than a function (chunk, whatever). This is something that's always annoyed me about the syntax for calling subroutines/methods in PIR. Why not keep it consistent with the way every other language uses for calling functions?
Or, if you wanted to be different, I think the alternate syntax that Perl 6 uses isn't too bad:
Give me your feedback.