Skip to content

Instantly share code, notes, and snippets.

@kalexmills
Last active December 14, 2019 23:53
Show Gist options
  • Save kalexmills/701bde32c5a6fe3dfed4f02e17869a49 to your computer and use it in GitHub Desktop.
Save kalexmills/701bde32c5a6fe3dfed4f02e17869a49 to your computer and use it in GitHub Desktop.
Gennit README

Gennit

Simple DSL for stochastic generative grammars, written in Scala.

Getting Started

Use one of the below two commands to start the REPL.

$ bloop run root
$ sbt run

You can do two things in the REPL. Add / update rules of the grammar, and evaluate expressions.

Adding rules

Rules in Gennit are written using a mostly self-explanatory BNF-like syntax.

The below rule generates the language consisting of one or more 'a' literals. Tokens surrounded by angle-braces are called "tags". Rules must terminate in semi-colons.

>>> [A] -> a[A] | a;

Context-sensitive rules work as well. The below rule creates a [B] from two as.

>>> a a -> [B];

Unlike in string-based languages, contexts in Gennit are multisets (bags) of tags and literals. When evaluating whether a rule applies to a given context, order is unimportant. If a matching number of tokens are present in a context, the rule can be applied.

When multiple rules can be applied, Gennit selects the rule to apply uniformly at random.

Evaluating contexts

To evaluate a context, type a sequence of tokens. Any rules previously entered will be applied, until the system arrives at a state where no further rule can be applied (as usual, the programmer is responsible for avoiding infinite loops). The resulting context is printed to the output.

>>> [A]
a

When a rule is applied, one of the choices on its left-hand side is selected stochastically. By default, the choices are selected uniformly at random.

>>> [A] -> a | b | c;

>>> [A]
a
>>> [A]
c

Weighted rules

The right-hand side (RHS) of a rule can be weighted via a fractional value. When evaluating these rules, the probability of selecting a given choice on the RHS is equal to its weight divided by the sum of all weights appearing in the rule.

>>> [A] -> 1.5 a | 5 b | 1.5 c;

>>> [A]
b
>>> [A]
b

In the above rule, a and c are selected with probability 3/16, while b is selected with probability 5/8.

If any one of the choices on the RHS of a rule is weighted, all of its choices must be weighted, to avoid a parse error.

Extending rules

Only one rule with an equivalent left-hand side (LHS) can exist at any given time. If a new rule is inserted whose LHS matches the LHS of an existing rule, its RHS is extended to include the results from the previous rule.

>>> [A] -> a;

>>> [A]
a
>>> [A] -> b;

>>> [A]
b
>>> [A]
a

When a weighted rule is extended with an unweighted rule, the tokens on the RHS of the unweighted rule are treated as though they have an implicit weight of 1.0.

>>> [A] -> 2 a | 1 b;

>>> [A] -> a;

After both rules have been added, the probability of producing a from [A] is 3/4, while the probability producing b is 1/4.

Perma Tags

Perma Tags are never consumed when used. A permatag is written as [a!].

>>> [a!] [b] -> c

Once a Perma Tag is produced, it remains a part of the state forever. This implies that any rule which contains only Perma Tags on its RHS will always be valid. Since this would create an infinite loop, such rules are explicitly disallowed.

>>> [a!] -> b
ERROR: rules containing only permatags on their lefthand side produce infinite loops and are not allowed.

Perma Tags can be used to condition rules on the existence of a tag, as in the below.

>>> [a!] [b] -> c

>>> [b]
[b]
>>> [a!][b]
[a!] c

Technically, any rule involving permatags can be written without them. For instance the above rule could be written as [a][b] -> [a]c and the same value would be produced. But this would allow [a] to be accidentally be consumed by some other rule. Perma Tags come with a guarantee that, once generated, they can never be consumed, accidentally or otherwise.

Tuples

Rules can use and generate Tuples. A Tuple is a list of one or more sets of tokens separated by commas and surrounded by parentheses. Each set of tokens within a tuple is expanded separately, using whichever rules are in scope.

>>> [a] -> a;

>>> [b] -> b;

>>> [x] -> ([a], [b] c);

>>> [x]
(a,b c)

Tuples can be nested, but take care to avoid infinite loops.

>>> [x] -> ([a], [x]) | [a] | [b]; 

>>> [x]
(a,(a,b c))
>>> [x]
(a,(a,(a,b c)))

Tuples can also appear on the LHS of a rule.

>>> (a,b c) -> abc;

>>> [x]
(a,(a,abc))

There is no limit on how many sets of tokens may appear in a tuple.

Named Tuples

To help distinguish to which tuples a rule should be applied, tuples may be tagged with a name. The expression foo(a,b) forms a named tuple, where the name is foo and the tuple being named is (a,b). Names must be

>>> [x] -> foo([arg], [arg]);

>>> [arg] -> bar | baz;

>>> [x]
foo(bar,baz)
>>> foo(bar,bar) -> FOOBAR;

>>> [x]
foo(bar,baz)
>>> [x]
FOOBAR

Count Tags

Count Tags can be used to create finite resource pools which can be consumed. A Count Tag is written as [a:] where # is an integer.

>>> [a:3]
[a:3]

When a Count Tag appears on the left-hand side (LHS) of a rule, it is only applied if the context has a matching Count Tag with a numeric value greater-than or equal to that required by the rule.

>>> [a:3] -> b;

>>> [a:2]
[a:2]
>>> [a:3]
b
>>> [a:4]
b [a:1]

As can be seen from the last invocation in the above example, any left-over value is preserved in the context.

Technically speaking, CountTags do not allow you to achieve anything you could not otherwise achieve via multiple tags.

Two-way Rules

Two-way rules allow for the terse creation of graphs for random walks over finite state machines. Double are added using <->, to produce rule bundles. A two-way rule reading [y] <-> [a] | [b] adds the four rules [y] -> [a], [y] -> [b], [b] -> [y], [a] -> [y].

Random walks only work if there are terminal nodes in the graph. Gennit does not yet optimize random walks. If your graph is very large, it may take a long time to reach a terminal state. It goes without saying that random walks are not guaranteed to terminate on all rule sets.

Restrictions on Names

Names can appear in multiple contexts in Gennit. In literals, as names of tags and of tuples. In each context, names must begin with an alphabetical character or a _, and contain only alphanumeric characters underscores (_) and hyphens (-).

REPL Commands

The below commands are available in the REPL.

Command Description
\file [filename] load all rules from the provided file.
\trace toggle tracing on/off.
\clear remove all rules from the context.
\exit exit the REPL.

Formal Properties

Nothing Gennit does prevents the user from entering a grammar which can be used to recognize a Turing-complete language1. This means the programmer is responsible for ensuring that their grammar terminates with probability 1. While a few dumb mistakes are checked for, solving the general problem of checking whether a set of rules can produce an infinite loop is likely equivalent to solving the Halting Problem.

Because of this, Gennit is very likely powerful enough to produce anything that can be randomly generated. The only reason I'm using "likely" in the above is that contexts are represented as an unordered bag of tokens, rather than as a string of characters, as is typical in string rewriting systems.

But strictly speaking, Gennit doesn't produce anything on its own other than a multiset of tokens. The programmer is then responsible for converting these tokens into an object of the desired form. That usually requires the use of a separate programming language.

1 This statement has yet to be formally proved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment