kalexmills/README.md

## README.md

      
    Raw
  

              README.md
            
          
    Gennit

Simple DSL for stochastic generative grammars, written in Scala.
Getting Started

Use one of the below two commands to start the REPL.
$ bloop run root
$ sbt run

You can do two things in the REPL. Add / update rules of the grammar, and evaluate expressions.
Adding rules

Rules in Gennit are written using a mostly self-explanatory BNF-like syntax.
The below rule generates the language consisting of one or more 'a' literals. Tokens surrounded
by angle-braces are called "tags". Rules must terminate in semi-colons.
>>> [A] -> a[A] | a;

Context-sensitive rules work as well. The below rule creates a [B] from two as.
>>> a a -> [B];

Unlike in string-based languages, contexts in Gennit are multisets (bags) of tags and literals.
When evaluating whether a rule applies to a given context, order is unimportant. If a matching
number of tokens are present in a context, the rule can be applied.
When multiple rules can be applied, Gennit selects the rule to apply uniformly at random.
Evaluating contexts

To evaluate a context, type a sequence of tokens. Any rules previously entered will be applied,
until the system arrives at a state where no further rule can be applied (as usual, the
programmer is responsible for avoiding infinite loops). The resulting context is printed to
the output.
>>> [A]
a

When a rule is applied, one of the choices on its left-hand side is selected stochastically.
By default, the choices are selected uniformly at random.
>>> [A] -> a | b | c;

>>> [A]
a
>>> [A]
c

Weighted rules

The right-hand side (RHS) of a rule can be weighted via a fractional value. When evaluating
these rules, the probability of selecting a given choice on the RHS is equal to its weight
divided by the sum of all weights appearing in the rule.
>>> [A] -> 1.5 a | 5 b | 1.5 c;

>>> [A]
b
>>> [A]
b

In the above rule, a and c are selected with probability 3/16, while b is selected with
probability 5/8.
If any one of the choices on the RHS of a rule is weighted, all of its choices must be weighted,
to avoid a parse error.
Extending rules

Only one rule with an equivalent left-hand side (LHS) can exist at any given time. If a new
rule is inserted whose LHS matches the LHS of an existing rule, its RHS is extended to include
the results from the previous rule.
>>> [A] -> a;

>>> [A]
a
>>> [A] -> b;

>>> [A]
b
>>> [A]
a

When a weighted rule is extended with an unweighted rule, the tokens on the RHS of the
unweighted rule are treated as though they have an implicit weight of 1.0.
>>> [A] -> 2 a | 1 b;

>>> [A] -> a;

After both rules have been added, the probability of producing a from [A] is 3/4, while
the probability producing b is 1/4.
Perma Tags

Perma Tags are never consumed when used. A permatag is written as [a!].
>>> [a!] [b] -> c


Once a Perma Tag is produced, it remains a part of the state forever. This implies that any rule
which contains only Perma Tags on its RHS will always be valid. Since this would create an infinite
loop, such rules are explicitly disallowed.
>>> [a!] -> b
ERROR: rules containing only permatags on their lefthand side produce infinite loops and are not allowed.

Perma Tags can be used to condition rules on the existence of a tag, as in the below.
>>> [a!] [b] -> c

>>> [b]
[b]
>>> [a!][b]
[a!] c

Technically, any rule involving permatags can be written without them. For instance the above rule could
be written as [a][b] -> [a]c and the same value would be produced. But this would allow [a] to be
accidentally be consumed by some other rule. Perma Tags come with a guarantee that, once generated, they
can never be consumed, accidentally or otherwise.
Tuples

Rules can use and generate Tuples. A Tuple is a list of one or more sets of tokens separated by commas
and surrounded by parentheses. Each set of tokens within a tuple is expanded separately, using whichever
rules are in scope.
>>> [a] -> a;

>>> [b] -> b;

>>> [x] -> ([a], [b] c);

>>> [x]
(a,b c)

Tuples can be nested, but take care to avoid infinite loops.
>>> [x] -> ([a], [x]) | [a] | [b]; 

>>> [x]
(a,(a,b c))
>>> [x]
(a,(a,(a,b c)))

Tuples can also appear on the LHS of a rule.
>>> (a,b c) -> abc;

>>> [x]
(a,(a,abc))

There is no limit on how many sets of tokens may appear in a tuple.
Named Tuples

To help distinguish to which tuples a rule should be applied, tuples may be tagged with a name.
The expression foo(a,b) forms a named tuple, where the name is foo and the tuple being named
is (a,b). Names must be
>>> [x] -> foo([arg], [arg]);

>>> [arg] -> bar | baz;

>>> [x]
foo(bar,baz)
>>> foo(bar,bar) -> FOOBAR;

>>> [x]
foo(bar,baz)
>>> [x]
FOOBAR

Count Tags

Count Tags can be used to create finite resource pools which can be consumed. A Count Tag is written
as [a:] where # is an integer.
>>> [a:3]
[a:3]

When a Count Tag appears on the left-hand side (LHS) of a rule, it is only applied if the context
has a matching Count Tag with a numeric value greater-than or equal to that required by the rule.
>>> [a:3] -> b;

>>> [a:2]
[a:2]
>>> [a:3]
b
>>> [a:4]
b [a:1]

As can be seen from the last invocation in the above example, any left-over value is preserved
in the context.
Technically speaking, CountTags do not allow you to achieve anything you could not otherwise
achieve via multiple tags.
Two-way Rules

Two-way rules allow for the terse creation of graphs for random walks over finite state machines.
Double are added using <->, to produce rule bundles. A two-way rule reading
[y] <-> [a] | [b] adds the four rules [y] -> [a], [y] -> [b], [b] -> [y],
[a] -> [y].
Random walks only work if there are terminal nodes in the graph. Gennit does not yet optimize
random walks. If your graph is very large, it may take a long time to reach
a terminal state. It goes without saying that random walks are not guaranteed to terminate
on all rule sets.
Restrictions on Names

Names can appear in multiple contexts in Gennit. In literals, as names of tags and of tuples.
In each context, names must begin with an alphabetical character or a _, and contain only
alphanumeric characters underscores (_) and hyphens (-).
REPL Commands

The below commands are available in the REPL.


Command
Description


\file [filename]
load all rules from the provided file.


\trace
toggle tracing on/off.


\clear
remove all rules from the context.


\exit
exit the REPL.


Formal Properties

Nothing Gennit does prevents the user from entering a grammar which can be used to recognize
a Turing-complete language¹. This means the programmer is
responsible for ensuring that their grammar terminates with probability 1. While a few dumb mistakes
are checked for, solving the general problem of checking whether a set of rules can produce an
infinite loop is likely equivalent to solving the Halting Problem.
Because of this, Gennit is very likely powerful enough to produce anything that can be randomly generated.
The only reason I'm using "likely" in the above is that contexts are represented as an unordered bag of
tokens, rather than as a string of characters, as is typical in string rewriting systems.
But strictly speaking, Gennit doesn't produce anything on its own other than a multiset of tokens.
The programmer is then responsible for converting these tokens into an object of the desired form.
That usually requires the use of a separate programming language.
¹ This statement has yet to be formally proved.
Command	Description
`\file [filename]`	load all rules from the provided file.
`\trace`	toggle tracing on/off.
`\clear`	remove all rules from the context.
`\exit`	exit the REPL.