@Havvy
Last active December 17, 2015 18:39
Language specification.
Key features:
* Objects are just maps with overloaded operators.
* Invokables with closures, but only one calling form (Parens).
* No distinction between statements and expressions.
* Single binding source. (Invokables) -- this may change with dynamic bindings.
Syntactic Sugar
* Identifiers can have dashes.
* Named Primitives (e.g. re`\d{4}-\d{2}-\d{2}`, 4in, named-invokable [] {})
* Tail return (Return the NoneObject by ending with a semicolon)
* Primitives autobox seamlessly to the point it seems that everything is an object.
* Slicing (myList[1:5:2])
Unknown Stuff
* Dynamic variables. I.e. $this. (Would be useful for classes. $static($final('CONST', 42)))
* Try-Catch-Finally. There is `throw`,
= Magic =
== Scope ==
This standard defines the Magic programming language.
== Conformance ==
A conforming implementation of Magic must provide and support all the types,
values, objects, functions, and program syntax and semantics described in this
specification.
A conforming implementation of this Standard shall interpret characters in
conformance with the Unicode Standard, Version 5.1.0 or later and ISO/IEC
10646. If the adopted ISO/IEC 10646-1 subset is not otherwise specified, it is
presumed to be the Unicode set, collection 10646.
A conforming implementation of Magic is permitted to provide additional types, values, objects, and functions beyond those described in this specification.
A conforming implementation of Magic is permitted to support program syntax not described in this specification.
== Introduction ==
This document is about the programming language Magic.
Magic's aim is to be a language where we can define scripting languages such
as Perl, JavaScript, Python, and Ruby.
This document was created by Havvy.
For purposes of laziness, this document is based in part on the
[http://es5.github.io/ ECMAScript5 Specification]. Whole parts are taken
from that specification.
== Notational Convention ==
=== Syntactic and Lexical Grammars ===
==== Context-Free Grammars ====
A ''context-free grammar'' consists of a number of ''productions''. Each
production has an abstract symbol called a ''nonterminal'' as its
''left-hand side'', and a sequence of zero or more nonterminal and terminal
symbols as its ''right-hand side''. For each grammar, the terminal symbols are
drawn from a specified alphabet.
Starting from a sentence consisting of a single distinguished nonterminal,
called the ''goal symbol'', a given context-free grammar specifies a
''language'', namely, the (perhaps infinite) set of possible sequences of
terminal symbols that can result from repeatedly replacing any nonterminal in
the sequence with a right-hand side of a production for which the nonterminal
is the left-hand side.
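The definition above can be made concrete with a small sketch. The Python below encodes a toy grammar (hypothetical, and unrelated to Magic's actual grammar) and enumerates the sentences of its language by repeatedly replacing nonterminals with right-hand sides, up to a bounded nesting depth. The names <code>GRAMMAR</code> and <code>expand</code> are made up for illustration.

```python
from itertools import product

# Toy grammar, hypothetical and unrelated to Magic's real grammar.
# Each nonterminal maps to its productions; each right-hand side is a
# sequence of nonterminal and terminal symbols.
GRAMMAR = {
    "Sum": [("Digit",), ("Digit", "+", "Sum")],
    "Digit": [("0",), ("1",)],
}


def expand(symbol, depth):
    """Return the terminal strings derivable from `symbol` using at most
    `depth` nested replacements."""
    if symbol not in GRAMMAR:
        # Anything that is not a left-hand side is a terminal symbol.
        return {symbol}
    if depth == 0:
        return set()
    sentences = set()
    for right_hand_side in GRAMMAR[symbol]:
        # Expand every symbol of the right-hand side, then combine the results.
        choices = [expand(part, depth - 1) for part in right_hand_side]
        for combination in product(*choices):
            sentences.add("".join(combination))
    return sentences


# "Sum" plays the role of the goal symbol.
print(sorted(expand("Sum", 3)))
```

Without the depth bound the set would be infinite, which is the "perhaps infinite" case mentioned above.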
==== The Lexical Grammar ====
A ''lexical grammar'' for Magic is given [[#???|below]]. This grammar has as
its terminal symbols characters (Unicode code units) that conform to the rules
for ''SourceCharacter'' defined [[#???|below]]. It defines a set of
productions, starting from the goal symbol InputElementDiv or
InputElementRegExp, that describe how sequences of such characters are
translated into a sequence of input elements.
Input elements other than white space and comments form the terminal symbols
for the syntactic grammar for Magic and are called Magic ''tokens''. These tokens are the reserved words, identifiers, literals, and punctuators of the Magic language. White space and single-line comments are discarded and do not appear in the stream of input elements for the syntactic grammar. Magic has no multi-line comments, so the ECMAScript rule that replaces a MultiLineComment containing a line terminator with a single line terminator does not apply.
Productions of the lexical and RegExp grammars are distinguished by having two colons `'''::'''` as separating punctuation. The lexical and RegExp grammars share some productions.
==== The Numeric String Grammar ====
Another grammar is used for translating Strings into numeric values. This grammar is similar to the part of the lexical grammar having to do with numeric literals and has as its terminal symbols ''SourceCharacter''. This grammar appears [[#???|below]].
Productions of the numeric string grammar are distinguished by having three colons `''':::'''` as punctuation.
==== The Syntactic Grammar ====
The syntactic grammar for Magic is given in clauses ???. This grammar has Magic tokens defined by the lexical grammar as its terminal symbols ([[#???|below]]). It defines a set of productions, starting from the goal symbol [[#???|Program]], that describe how sequences of tokens can form syntactically correct Magic programs.
When a stream of characters is to be parsed as a Magic program, it is first converted to a stream of input elements by repeated application of the lexical grammar; this stream of input elements is then parsed by a single application of the syntactic grammar. The program is syntactically in error if the tokens in the stream of input elements cannot be parsed as a single instance of the goal nonterminal Program, with no tokens left over.
Productions of the syntactic grammar are distinguished by having just one colon `''':'''` as punctuation.
The syntactic grammar as presented in clauses 11, 12, 13 and 14 is actually not a complete account of which token sequences are accepted as correct Magic programs. Furthermore, certain token sequences that are described by the grammar are not considered acceptable if a terminator character appears in certain “awkward” places.
==== Grammar Notation ====
See section 5.1.6 of the ECMAScript 5 specification.
===== Differences =====
# References to other productions are surrounded by backquotes (`).
# This specification writes _opt after a production reference instead of using a subscripted ''opt''.
# This specification applies no special formatting where the ECMAScript specification uses bold or green text.
=== Algorithm Conventions ===
See section 5.2 of the ECMAScript 5 specification.
== Source Text ==
See section 6 of the ECMAScript 5 specification.
Note that Magic uses UTF-8 instead of UTF-16.
== Lexical Conventions ==
=== Unicode Format-Control Characters ===
=== Line Terminals ===
Line terminals are used for strings, comments, and whitespace.
LineTerminal ::
<LF>
<CR>
=== Comments ===
There are single line comments, starting with '//'. While this could be an
operator, I can't think of a better token.
Comment ::
// `SingleLineCommentChars`_opt `LineTerminal`
SingleLineCommentChars ::
`SingleLineCommentChar` `SingleLineCommentChars`_opt
SingleLineCommentChar ::
`SourceCharacter` but not `LineTerminal`
Multiline comments do not exist. Use multiline strings for that.
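As a rough illustration of the comment rule, the Python sketch below removes everything from <code>//</code> up to (but not including) the next LF or CR. It is a simplification under stated assumptions: it leaves the line terminal in place, and it does not recognise string literals, so a <code>//</code> inside a string would be stripped too. The name <code>strip_comments</code> is hypothetical.

```python
import re

# "//" followed by any run of characters that are not a line terminal
# (LF or CR, the two line terminals listed above).
COMMENT = re.compile(r"//[^\n\r]*")


def strip_comments(source: str) -> str:
    """Remove single-line comments, keeping the line terminal itself.

    Simplification: string literals are not recognised, so "//" inside a
    string is treated as the start of a comment.
    """
    return COMMENT.sub("", source)


print(repr(strip_comments("a := 1 // note\nb := 2")))
```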
=== White Space ===
Compared to ECMAScript, Magic eschews quite a lot of whitespace characters.
WhiteSpace ::
Comment
LineTerminal
<SP>
<Tab>
<BOM>
=== Tokens ===
Token ::
`IdentifierName`
`Punctuator`
`NumericLiteral`
`StringLiteral`
=== Punctuators ===
Punctuator :: one of
{ } ( ) [ ] . ; : #
=== Identifiers ===
<copy paste from ES5 spec>
Identifier Names are tokens that are interpreted according to the grammar given in the “Identifiers” section of chapter 5 of the Unicode standard, with some small modifications. An Identifier is an IdentifierName that is not a ReservedWord (see 7.6.1). The Unicode identifier grammar is based on both normative and informative character categories specified by the Unicode Standard. The characters in the specified categories in version 3.0 of the Unicode standard must be treated as in those categories by all conforming ECMAScript implementations.
This standard specifies specific character additions: The dollar sign ($) and the underscore (_) are permitted anywhere in an IdentifierName.
Unicode escape sequences are also permitted in an IdentifierName, where they contribute a single character to the IdentifierName, as computed by the CV of the UnicodeEscapeSequence (see 7.8.4). The \ preceding the UnicodeEscapeSequence does not contribute a character to the IdentifierName. A UnicodeEscapeSequence cannot be used to put a character into an IdentifierName that would otherwise be illegal. In other words, if a \ UnicodeEscapeSequence sequence were replaced by its UnicodeEscapeSequence's CV, the result must still be a valid IdentifierName that has the exact same sequence of characters as the original IdentifierName. All interpretations of identifiers within this specification are based upon their actual characters regardless of whether or not an escape sequence was used to contribute any particular characters.
Two IdentifierName that are canonically equivalent according to the Unicode standard are not equal unless they are represented by the exact same sequence of code units (in other words, conforming ECMAScript implementations are only required to do bitwise comparison on IdentifierName values). The intent is that the incoming source text has been converted to normalised form C before it reaches the compiler.
ECMAScript implementations may recognize identifier characters defined in later editions of the Unicode Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0.
Syntax
Identifier ::
`IdentifierName` but not `ReservedWord`
IdentifierName ::
`IdentifierStart`
`IdentifierName` `IdentifierPart`
IdentifierStart ::
`UnicodeLetter`
$
_
\ `UnicodeEscapeSequence`
IdentifierPart ::
`IdentifierStart`
`UnicodeCombiningMark`
`UnicodeDigit`
`UnicodeConnectorPunctuation`
`<ZWNJ>`
`<ZWJ>`
UnicodeLetter ::
any character in the Unicode categories “Uppercase letter (Lu)”, “Lowercase letter (Ll)”, “Titlecase letter (Lt)”, “Modifier letter (Lm)”, “Other letter (Lo)”, or “Letter number (Nl)”.
UnicodeCombiningMark ::
any character in the Unicode categories “Non-spacing mark (Mn)” or “Combining spacing mark (Mc)”
UnicodeDigit ::
any character in the Unicode category “Decimal number (Nd)”
UnicodeConnectorPunctuation ::
any character in the Unicode category “Connector punctuation (Pc)”
UnicodeEscapeSequence ::
see [[#???|below]]
==== Reserved Words ====
A reserved word is an ''IdentifierName'' that cannot be used as an ''Identifier''.
ReservedWord :: one of
if else loop match in delete
==== Symbol Literals ====
Symbol ::
: `IdentifierName`
=== Numeric Literals ===
Number ::
`digit` `Identifier`_opt
=== String Literals ===
String ::
`SingleQuoteString`
`DoubleQuoteString`
`NamedString`
SingleQuoteString ::
' `SingleQuoteContents` '
DoubleQuoteString ::
" `DoubleQuoteContents` "
NamedString ::
`Identifier`_opt ` `NamedStringContents` ` `NamedStringFlags`_opt
SingleQuoteContents ::
`SourceCharacter` but not single quote ' or `LineTerminal`
DoubleQuoteContents ::
`SourceCharacter` but not double quote " or `LineTerminal`
NamedStringContents ::
It's complicated???
== Types ==
Algorithms within this specification manipulate values each of which has an associated type. The possible value types are exactly those defined in this clause. Types are further subclassified into Magic language types and specification types.
A Magic language type corresponds to values that are directly manipulated by
a Magic programmer using the Magic language. The Magic language types are
[[#symbols|symbols]], [[#ints|ints]], [[#floats|floats]], [[#invokables|invokables]], [[#strings|strings]], [[#wrapped primitives|wrapped
primitives]], [[#maps|maps]], and [[#objects|objects]].
Magic language types are further subdivided into two types. ''Primitive
types'' are [[#symbols|symbols]], [[#ints|ints]], [[#floats|floats]],
[[#invokables|invokables]], and [[#strings|strings]]. ''Complex types'' are
[[#wrapped primitives|wrapped primitives]], [[#maps|maps]] and
[[#objects|objects]].
Within this specification, the notation “Type(x)” is used as shorthand for “the type of x” where “type” refers to the Magic language and specification types defined in this clause.
=== Symbols ===
The Symbol type is a one-to-one mapping from the set of all identifiers. The Symbol type is generally used to represent enumerable data in a running Magic program. For example, booleans are represented using the :true and :false symbols.
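One way to picture the one-to-one mapping is an intern table: asking for the same identifier twice yields the very same symbol. The Python sketch below is only an illustration; the class <code>Symbol</code>, the table <code>_SYMBOLS</code>, and the function <code>symbol</code> are hypothetical names, not part of Magic.

```python
class Symbol:
    """A symbol carries nothing but its identifier."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return ":" + self.name


# Intern table: exactly one Symbol instance per identifier.
_SYMBOLS = {}


def symbol(name):
    """Return the unique Symbol for `name`, creating it on first use."""
    if name not in _SYMBOLS:
        _SYMBOLS[name] = Symbol(name)
    return _SYMBOLS[name]


true_symbol = symbol("true")
false_symbol = symbol("false")
print(true_symbol, false_symbol)
```

Because each identifier maps to one instance, comparing symbols reduces to an identity check.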
=== Integral Numbers ===
A basic 64 bit integer.
TODO
=== Floating Point Numbers ===
A basic 64 bit floating point value.
TODO
=== Strings ===
The String type is the set of all finite ordered sequences of zero or more 8-bit unsigned integer values (“elements”). The String type is generally used to represent textual data in a running Magic program, in which case each element in the String is treated as a UTF-8 code unit value. Each element is regarded as occupying a position within the sequence. These positions are indexed with nonnegative integers. The first element (if any) is at index 0, the next element (if any) at index 1, and so on. The length of a String is the number of elements (i.e., 8-bit values) within it. The empty String has length zero and therefore contains no elements.
Where Magic operations interpret String values, each element is interpreted as a single UTF-8 code unit. However, Magic does not place any restrictions or requirements on the sequence of code units in a String value, so they may be ill-formed when interpreted as UTF-8 code unit sequences. Operations that do not interpret String contents treat them as sequences of undifferentiated 8-bit unsigned integers. No operations ensure that Strings are in a normalized form. Only operations that are explicitly specified to be language or locale sensitive produce language-sensitive results.
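The distinction between elements and characters can be seen with Python's <code>bytes</code> type, which is also a sequence of 8-bit unsigned integers. This is a sketch of the model described above, not of any Magic implementation; <code>magic_length</code> and <code>is_well_formed_utf8</code> are hypothetical helper names.

```python
def magic_length(elements: bytes) -> int:
    """Length as described above: the number of 8-bit elements."""
    return len(elements)


def is_well_formed_utf8(elements: bytes) -> bool:
    """Report whether the elements decode as UTF-8. A String value is
    allowed to fail this check."""
    try:
        elements.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "naive" with a diaeresis on the i: 5 code points, 6 elements,
# because U+00EF takes two bytes in UTF-8.
word = "na\u00efve".encode("utf-8")

# Cutting after the third element splits the two-byte character,
# leaving an ill-formed but still valid String value.
truncated = word[:3]

print(magic_length(word), is_well_formed_utf8(word))
print(magic_length(truncated), is_well_formed_utf8(truncated))
```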
NOTE The rationale behind this design was to keep the implementation of Strings as simple and high-performing as possible. If Magic source code is in Normalised Form C, string literals are guaranteed to also be normalised, as long as they do not contain any Unicode escape sequences.
Some operations interpret String contents as UTF-8 encoded Unicode code points.
In that case the interpretation is:
TODO
=== Invokables ===
Basic function. Not to be confused with the core-lib Fn.
==== Syntax ====
<code>bindings -> { body }</code>
<code>self-binding [parameter, list = default, ...rest | local, bindings = default] -> {
body
}</code>
The final expression is the return value.
{{aside|As a side effect of the [[#Semicolon Rule|semicolon rule]], ending an
invokable with a semicolon will cause it to return the None object.}}
==== Bindings ====
The invokable is the only way to introduce new statically bound variables. The
bindings list (which looks like a corelib::List syntactically) is divided into
two sublists by the pipe character (|).
The left list is the parameter list. It behaves as parameter lists do in
ECMAScript 6. This includes default arguments, destructuring, and rest
parameters (the ellipsis).
The right list is the local bindings list. The local bindings list allows for
default values, but destructuring and rest parameters do not make sense
here.
If there are no local bindings, the pipe may (and should) be omitted.
Bindings are available until the end of the invokable. Bindings can also be
closed over by other invokables.
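A loose analogy in Python may help, with the caveat that Python is not Magic and the correspondence is only approximate. The parameters of <code>make_counter</code> play the role of the left sublist (with a default and a rest parameter), the variable <code>count</code> plays the role of a local binding from the right sublist, and the inner function shows a binding being closed over. All names here are hypothetical.

```python
def make_counter(start=0, *rest):
    # Parameters: `start` has a default value, `rest` collects extras.
    # `count` stands in for a local binding declared right of the pipe.
    count = start + sum(rest)

    def bump(step=1):
        # `count` is closed over: it outlives the call to make_counter.
        nonlocal count
        count = count + step
        return count

    return bump


counter = make_counter(10, 1, 2)
first = counter()
second = counter(5)
print(first, second)
```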
=== Maps ===
A ''map'' is a bag of key-value pairs where each key is unique. The keys and
values are any Magic language values.
=== Objects ===
An object is a tuple of two maps, an arbitrary ''contents map'' and an
''operations map'' with the following keys, containing invokables:
get [key, reason] NB: reason is either :call or :value
slice [start, end, mod]
set [key, value]
delete [key]
has [key]
in [key]
invoke [...args]
enumerate []
truthy []
compare [comparee]
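The two-map structure can be sketched in Python as a pair of dictionaries. This is an illustration under assumptions: the class <code>MagicObject</code> and the helper <code>operations_call</code> are hypothetical, only a few of the operations listed above are filled in, and passing the contents map as the first argument of each operation is one possible convention rather than something this section specifies.

```python
class MagicObject:
    """An object is a pair of maps: contents and operations."""

    def __init__(self, contents, operations):
        self.contents = contents
        self.operations = operations


def operations_call(obj, name, args):
    """Look up an operation by name and invoke it.

    Assumption: the contents map is passed as the first argument.
    """
    return obj.operations[name](obj.contents, *args)


# A minimal operations map that simply forwards to the contents map.
plain_operations = {
    "get": lambda contents, key, reason="value": contents.get(key),
    "set": lambda contents, key, value: contents.__setitem__(key, value),
    "has": lambda contents, key: key in contents,
    "truthy": lambda contents: True,
}

point = MagicObject({"x": 1}, plain_operations)
operations_call(point, "set", ["y", 2])
print(operations_call(point, "get", ["y"]))
```

Swapping in a different operations map changes how the same contents respond to property access, which is the sense in which objects are maps with overloaded operators.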
== Autoboxing ==
Primitive values are automatically boxed into objects as needed.
=== String ===
<code>
StringMethods := Prototype(ListMethods)
StringOps := {
  get : [key, reason] -> {
    if
  },
  slice : [start = 0, end = $contents., mod = 1 | copy = {}] -> {
    $contents.keys()[start : end].each([value] -> {
      copy[value] = $contents[value];
    })
  },
  set : [key, value] -> {
    $contents.contents[key] = value
  },
  delete : [key] -> {
    delete key in $contents.contents
  },
  has : [key] -> {
    key in $contents.contents
  },
  in : [key] -> {
    key in $contents.contents or key in $contents.prototype
  },
  invoke : [new-contents] -> {
    Prototype($this, new-contents)
  },
  keys : [] -> {
    $contents.contents.keys().union($contents.prototype.keys())
  },
  truthy : [] -> {
    true
  },
  compare : [comparee] -> {
    throw TypeError("Prototypical objects are not comparable.")
  }
}
String := [string] -> {
  return Object()
}
</code>
== Blocks and Expressions ==
Expressions are prefixed by a semicolon. The initial expression in a block may
omit this semicolon; whether to do so is the developer's prerogative.
Block :
`BlockType` { `Expression` }
Expression :
`Block`
`NoneExpression`
`StatementExpression`
`InvocationExpression`
`OperationExpression`
`UnaryExpression`
`BinaryExpression`
=== NoneExpression ===
NoneExpression :
[Empty]
Always evaluates to the None object.
=== StatementExpression ===
StatementExpression :
`Expression` ; `Expression`
Used for side effects and intermediate calculation. Evaluates to the final
expression. The left side of a statement expression is always evaluated before
the right.
NB: Even though this is called a StatementExpression, there is no distinction
between a statement and an expression. For example, the following expression
is legal:
<code>a := (b.flag = :true; b.method()); a.x</code>
=== Invocation Expression ===
InvocationExpression :
`Identifier` `Call`
Call :
( )
( `FirstArgument` `RestArguments`_opt )
FirstArgument :
`Expression`
RestArguments :
, `Expression` `RestArguments`_opt
TODO (Semantics)
=== Operation Expression ===
OperationExpression :
`PropertyGetExpression`
`PropertyCallExpression`
`PropertySliceExpression`
==== Property Expression ====
PropertyExpression :
`Identifier` [ `Expression` ]
`Identifier` . `Identifier`
The first identifier is the ''from value''.
The expression is the ''what value''. For the dot case, the what value is the
string of the literal value of the identifier. For example, with Obj.prop, the
what value is "prop".
==== Property Get Expression ====
PropertyGetExpression :
`PropertyExpression`
The evaluation of this expression follows the following algorithm:
# If Type(from) is symbol, int, float, or invokable, then from becomes Box(from).
# If Type(from) is string, and what is not "length", then from becomes Box(from).
# If Type(from) is map, execute and return the following subalgorithm.
## If from has a property with a key equal to the what value, return that property's value.
## Otherwise return the None object.
# If Type(from) is Object, return the value of OperationsCall(from, 'get', [what]).
# Otherwise, terminate the program.
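The steps above can be sketched in Python as follows. Several things here are assumptions rather than part of the specification: <code>Box</code> is not defined in this section, so the sketch takes a <code>box</code> function as a parameter; <code>ProgramTerminated</code> stands in for "terminate the program"; <code>NONE</code> stands in for the None object; and the contents map is passed to the <code>get</code> operation as its first argument. The algorithm does not say what happens for "length" on a string, so the sketch follows the steps literally and falls through to step 5.

```python
class Symbol:
    def __init__(self, name):
        self.name = name


class MagicObject:
    def __init__(self, contents, operations):
        self.contents = contents
        self.operations = operations


class ProgramTerminated(Exception):
    """Stand-in for step 5, "terminate the program"."""


NONE = object()  # stand-in for the None object


def property_get(from_value, what, box):
    # Step 1: symbols, ints, floats, and invokables are boxed.
    if isinstance(from_value, (Symbol, int, float)) or callable(from_value):
        from_value = box(from_value)
    # Step 2: strings are boxed unless the what value is "length".
    elif isinstance(from_value, str) and what != "length":
        from_value = box(from_value)
    # Step 3: maps answer directly, defaulting to the None object.
    if isinstance(from_value, dict):
        return from_value.get(what, NONE)
    # Step 4: objects delegate to their 'get' operation.
    if isinstance(from_value, MagicObject):
        return from_value.operations["get"](from_value.contents, what)
    # Step 5: anything else terminates the program.
    raise ProgramTerminated("cannot get %r" % (what,))


def box(value):
    """Hypothetical Box: wraps a primitive in an object with one property."""
    return MagicObject(
        {"value": value},
        {"get": lambda contents, key: contents.get(key, NONE)},
    )


print(property_get({"a": 1}, "a", box))
print(property_get(7, "value", box))
```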
==== Property Call Expression ====
PropertyCallExpression :
`PropertyExpression` `Call`
TODO (Semantics)
==== Property Slice Expression ====
PropertySliceExpression :
`Identifier` [ `SliceInnards` ]
SliceInnards :
`Expression` : `SliceRest`
SliceRest :
`Expression`
`Expression` : `Expression`
TODO (Better names and Semantics)
=== Binary Expression ===
BinaryExpression :
`RebindingExpression`
`PropertySetExpression`
`BinaryFunctionExpression`
==== Rebinding Expression ====
RebindingExpression :
`Identifier` := `Expression`
TODO
==== Property Set Expression ====
<code>O[P] = v'</code>
TODO
=== Unary Expression ===
==== Typeof Expression ====
TypeofExpression :
typeof `Expression`
The typeof expression looks at the type of the object, and returns the symbol
of the name of the type in all lowercase. The possible values are:
* :symbol
* :int
* :float
* :string
* :invokable
* :map
* :object
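The mapping from values to type names can be sketched in Python as below. The Python stand-ins for Magic values are assumptions made for the sake of the example: a hypothetical <code>Symbol</code> class for symbols, <code>dict</code> for maps, any callable for invokables, and a hypothetical <code>MagicObject</code> class for objects. The result is returned as a string such as <code>":int"</code> rather than as a real symbol.

```python
class Symbol:
    def __init__(self, name):
        self.name = name


class MagicObject:
    def __init__(self, contents, operations):
        self.contents = contents
        self.operations = operations


def magic_typeof(value):
    """Return the lowercase type name, written as a symbol literal."""
    if isinstance(value, Symbol):
        return ":symbol"
    if isinstance(value, MagicObject):
        return ":object"
    if isinstance(value, int):
        return ":int"
    if isinstance(value, float):
        return ":float"
    if isinstance(value, str):
        return ":string"
    if isinstance(value, dict):
        return ":map"
    if callable(value):
        return ":invokable"
    raise TypeError("no Magic type for %r" % (value,))


print(magic_typeof(4), magic_typeof(4.0), magic_typeof({}))
```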
==== Delete Expression ====
DeleteExpression :
delete `PropertyExpression`
=== If Expression ===
<code>if (predicate) { consequent }</code>
<code>if (predicate) { consequent } else { ... }</code>
<code>if (predicate) { consequent } else `If Expression`</code>
==== Truthiness ====
All symbols other than :false are truthy.
All numbers other than NaN, +0, and -0 are truthy.
All strings other than the empty string are truthy.
All invokables are truthy.
All maps are truthy.
Objects are truthy if the result of calling the truthy invokable is truthy.
<code>
truthy := truthy [value | type = typeof value] -> {
  if (type == :symbol) {
    value != :false
  } else if (type == :int or type == :float) {
    value != 0
  } else if (type == :string) {
    value != ""
  } else if (type == :object) {
    truthy(Operations(value).truthy.call($contents = Contents(value)))
  } else {
    :true
  }
}
</code>
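For comparison, the truthiness rules listed above can be written as a runnable Python sketch. As with any cross-language illustration, the stand-ins are assumptions: a hypothetical <code>Symbol</code> class, Python numbers and strings for Magic primitives, and a hypothetical <code>MagicObject</code> whose <code>truthy</code> operation receives the contents map as its argument.

```python
import math


class Symbol:
    def __init__(self, name):
        self.name = name


class MagicObject:
    def __init__(self, contents, operations):
        self.contents = contents
        self.operations = operations


def truthy(value):
    """Apply the truthiness rules listed above."""
    if isinstance(value, Symbol):
        # Every symbol except :false is truthy.
        return value.name != "false"
    if isinstance(value, float):
        # NaN, +0 and -0 are falsy (-0.0 == 0 holds in Python).
        return not (value == 0 or math.isnan(value))
    if isinstance(value, int):
        return value != 0
    if isinstance(value, str):
        return value != ""
    if isinstance(value, MagicObject):
        # Objects decide for themselves; the answer is itself tested.
        return truthy(value.operations["truthy"](value.contents))
    # Invokables and maps are always truthy.
    return True


stubborn = MagicObject({}, {"truthy": lambda contents: Symbol("false")})
print(truthy(Symbol("maybe")), truthy(float("nan")), truthy(stubborn))
```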
<code>
[|Prototype] -> {
  Prototype := [operations] -> {
    [proto, contents] -> {
      Object(operations, {
        prototype: proto,
        contents: contents
      })
    }
  }({
    get : [contents, key, reason] -> {
      if (key in contents.contents) {
        contents.contents[key]
      } else {
        prototype.contents[key]
      }
    },
    slice : [contents, start = String.min, end = String.max, mod = 1 | copy = {}] -> {
      contents.keys()[start : end].each([value] -> {
        copy[value] = contents[value];
      })
    },
    set : [contents, key, value] -> {
      contents.contents[key] = value
    },
    delete : [contents, key] -> {
      delete key in contents.contents
    },
    has : [contents, key] -> {
      key in contents.contents
    },
    in : [contents, key] -> {
      key in contents.contents or key in contents.prototype
    },
    invoke : [contents, ...args] -> {
      throw TypeError("Prototypical objects are not callable.")
    },
    keys : [contents] -> {
      contents.contents.keys().union(contents.prototype.keys())
    },
    truthy : [contents] -> {
      true
    },
    compare : [contents, comparee] -> {
      throw TypeError("Prototypical objects are not comparable.")
    }
  })
}
</code>