@porky11
Last active February 12, 2019 13:02
Natural Language Programming

Introduction

It's interesting to think about making programming languages more like natural languages. The problem with most existing approaches is that the language designers try to stick to the grammatical structures of natural languages but don't implement these structures in code directly. Instead, they implement a fairly simple program, which is basically a traditional programming language with a mostly more verbose, natural-language-like syntax.

A more reasonable approach would be to define a programming language and only add the benefits of natural language. The concepts from natural languages should be mapped to the programming language in a simple way. English will be used as the reference language where needed.

So let's just start with the language design.

Language

Basic language

The basic language will just have some standard programming concepts. The exact syntax won't be discussed yet.

The program is basically a chain of function calls, which may mutate state. Some function calls may be nested and programs may branch and loop.

Functions can take and return values. Users can also define their own functions.

Values have types, and it's also possible to define new types.

Most things, like functions, values and types, can be represented by names. Like in most programming languages, English words will be used for the names.

Replacing parentheses

In natural languages, it's often possible to create complex nested grammatical structures without the need for parentheses. Instead, some words implicitly open or close parentheses. The exact nesting may be ambiguous, but such cases shouldn't be difficult to avoid.

A simple way to achieve this is to add different kinds of words to the language: some that implicitly open parentheses and some that implicitly close them. The words that open parentheses are relative words; they can represent functions.

So there are two important kinds of words:

  • absolute => implicitly close parentheses
  • relative => implicitly open parentheses
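A minimal Python sketch of how such a parser could work, assuming one reading of the rule that is consistent with the examples later in this document: a relative word opens a group, and an absolute word is added to the innermost open group and then closes it. The lexicon and the names Call and parse are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon: the word kind of each name.
LEXICON = {
    "go": "relative", "from": "relative", "to": "relative",
    "A": "absolute", "B": "absolute", "Peter": "absolute",
}


@dataclass
class Call:
    """A relative word together with the arguments collected so far."""
    head: str
    args: list = field(default_factory=list)

    def __str__(self) -> str:
        return f"{self.head}({' '.join(map(str, self.args))})"


def parse(words: list[str]) -> list:
    """Rebuild the nesting that the word kinds imply."""
    top: list = []           # finished top-level expressions
    stack: list[Call] = []   # relative words whose parenthesis is still open

    def emit(expr) -> None:
        # Attach a finished expression to the innermost open call (or the top level).
        (stack[-1].args if stack else top).append(expr)

    for word in words:
        if LEXICON[word] == "relative":
            stack.append(Call(word))   # implicitly opens "("
        else:
            emit(word)                 # the absolute word becomes an argument ...
            if stack:
                emit(stack.pop())      # ... and implicitly closes ")"
    if stack:
        raise SyntaxError(f"unclosed relative word: {stack[-1].head}")
    return top


print(parse("go from A to B Peter".split())[0])  # go(from(A) to(B) Peter)
```

Here A closes from, B closes to, and Peter closes go, so go ends up with three arguments without any written parentheses.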

There may be more kinds of words with other effects; some are described later.

Function argument order

Functions may map to many things in natural languages, but they map best to actions, especially to orders. In most natural languages, the order of most arguments is arbitrary. Instead, the role of each object is determined by its word kind (he, quickly, high), its case (he, him) and some prepositions (from, to).

Instead of several different ways to determine an object's role, there should be just one. Because the objects represented by words already have types, it seems most useful to determine their roles by type; types are most similar to word kinds. Keywords would be a more useful way to handle prepositions, and cases could easily be handled by keywords, too. But keywords can also be interpreted as generic functions that generate a new generic type. This way, it's also possible to pass multiple objects of similar types to a function in arbitrary order.
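To illustrate binding arguments by type, here is a Python sketch. The types Place and Person, the keyword wrappers From and To, and the functions bind_by_type and go are all hypothetical. The checks run at run time here, whereas the language described above would presumably resolve them at compile time.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Place:
    name: str


@dataclass
class Person:
    name: str


# Keywords modeled as generic wrapper types: From(x) and To(x) have
# distinct types even when both wrap a Place.
@dataclass
class From(Generic[T]):
    value: T


@dataclass
class To(Generic[T]):
    value: T


def bind_by_type(args, *wanted: type) -> dict:
    """Assign each argument to the parameter slot whose type it matches, in any order."""
    bound = {}
    for arg in args:
        matches = [t for t in wanted if isinstance(arg, t)]
        if len(matches) != 1:
            raise TypeError(f"cannot place argument {arg!r}")
        if matches[0] in bound:
            raise TypeError(f"two arguments of type {matches[0].__name__}")
        bound[matches[0]] = arg
    missing = [t.__name__ for t in wanted if t not in bound]
    if missing:
        raise TypeError(f"missing arguments: {missing}")
    return bound


def go(*args) -> str:
    # The role of each argument comes from its type, not its position.
    a = bind_by_type(args, Person, From, To)
    return f"{a[Person].name} goes from {a[From].value.name} to {a[To].value.name}"
```

Both `go(Person("Peter"), From(Place("A")), To(Place("B")))` and `go(To(Place("B")), Person("Peter"), From(Place("A")))` describe the same call.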

Avoiding temporary variables

Another thing that's often unnecessary in natural languages is temporary names for objects, like temporary variables in programming languages. Instead, there are other ways to refer to recently used objects.

The most obvious way is the use of pronouns. There may be different kinds of pronouns, but all of them refer to the most recently used object of some kind. In English, a pronoun normally refers to a subject of a specific gender, but if the last sentence didn't have a subject of the matching gender, it may also refer to a different kind of object. There are no exact rules for matching pronouns.

This language should not contain multiple pronouns based on arbitrary attributes like gender. But it's also not easy to find a good general pronoun. Just referring to the last object mentioned seems rather useless and inflexible, and referring to the last subject is not easy either, since sentences are not required to contain subjects, if the concept of subjects exists at all.

But natural languages also have other techniques for referencing recently used objects. One is to mention an object with specific properties. English uses "the <properties>" (for example, "the red car"), which seems useful for this language, too.

Instead of properties, which are difficult to determine at compile time, it seems more useful to use a type. This can be a specific type, but also an abstract type. So "the <type>" will be the syntax for referencing the most recently used object of a specific type. It always takes exactly one argument, so it may be reasonable not to implement "the" as a relative word but as something else; that way, "the ..." can simply be replaced by a name without having to worry about implicit parentheses.

When talking about multiple objects of the same type, instead of naming them explicitly, it's also possible to wrap them in different wrapper types, for example "the big person" and "the small person", assuming big and small are relative words that construct generic types.
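The following Python sketch models "the <type>" as a lookup over recently mentioned objects, searching from newest to oldest. Context, fits, Wrapper, Big and Small are hypothetical names. Wrapped objects are matched against a type chain, so `the(Big, Person)` reads as "the big person", and an abstract base type such as Wrapper matches any of its subtypes. A real implementation would resolve this at compile time rather than by a run-time search.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Person:
    name: str


@dataclass
class Wrapper:
    """Abstract base for generic wrapper types such as Big and Small."""
    value: Any


class Big(Wrapper):
    pass


class Small(Wrapper):
    pass


def fits(obj, types) -> bool:
    """Check obj against a type chain: (Big, Person) means 'a Big wrapping a Person'."""
    head, *rest = types
    if not isinstance(obj, head):
        return False
    if not rest:
        return True
    return isinstance(obj, Wrapper) and fits(obj.value, rest)


class Context:
    """Remembers mentioned objects so they can be referenced as 'the <type>'."""

    def __init__(self) -> None:
        self.recent: list = []

    def mention(self, obj):
        self.recent.append(obj)
        return obj

    def the(self, *types: type):
        # Newest first; isinstance also lets abstract base types match.
        for obj in reversed(self.recent):
            if fits(obj, types):
                return obj
        raise LookupError("no recently used " + " ".join(t.__name__ for t in types))
```

After mentioning `Person("Peter")`, `Big(Person("Anna"))` and `Small(Person("Tim"))`, `ctx.the(Person)` still finds Peter, because the wrapped persons have different types.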

Suffixes

The meaning of some words in natural languages can easily be modified using prefixes and suffixes. For simplicity, only suffixes will be used in the programming language.

A suffix is a new word kind, which can be written after any other word to change its meaning. Suffixes are easier to chain than prefixes: you already know the word that will be modified, and every modifier changes the meaning of the word just read. This should make code easier to understand for both human readers and parsers/compilers. This new word kind will be called modifier, since it modifies the meaning of words.

A suffix can be used with different kinds of words, at least with absolute and relative words, and can even change the word kind (convert between them).

For example, there may be an un suffix, which converts an absolute word A to not(A) and a relative word R to not(R(...)).
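A sketch of the un suffix in Python, assuming for illustration that absolute words are modeled as booleans and relative words as functions returning booleans. The names un and is_happy are hypothetical.

```python
from typing import Callable, Union

# An absolute word is a value; a relative word is a function awaiting arguments.
Word = Union[bool, Callable[..., bool]]


def un(word: Word) -> Word:
    """Hypothetical 'un' suffix: A becomes not(A), R becomes not(R(...))."""
    if callable(word):
        # Relative word: stays relative, but negates whatever it returns.
        return lambda *args: not word(*args)
    # Absolute word: negated directly.
    return not word


def is_happy(mood: int) -> bool:
    return mood > 0


is_unhappy = un(is_happy)
```

Because un returns a word of the same kind, suffixes chain naturally: `un(un(is_happy))` behaves like `is_happy` again.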

Getting rid of named struct fields

Structs also don't need named fields. Instead, a struct can just contain a set of types. To make a struct contain multiple values of the same type, a generic wrapper type is required. To get the member of a given type from a struct, you could write the modifier of after the type, which converts the type into a relative getter function (the int of object, the int of size of array).

Functions also don't need to specify argument names, argument types or return values explicitly. Instead, a function can be seen as taking just one struct called this, which represents the arguments. So, as with structs, no two arguments may have the same type. The struct can be a generic type; when it is supplied to a function, the compiler checks that it contains all required member types. The exact specialization of the function is then created when it is called with a specific type.
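A Python sketch of nameless structs, assuming a struct is a mapping from each member's type to its value. Struct, of and Size are hypothetical names; `of(int)(s)` plays the role of "the int of s", and Size is a wrapper type that lets an array-like struct carry a second int-like member.

```python
class Struct:
    """A struct without field names: it holds at most one value per type."""

    def __init__(self, *members) -> None:
        self.members = {}
        for m in members:
            if type(m) in self.members:
                raise TypeError(f"duplicate member type {type(m).__name__}")
            self.members[type(m)] = m


def of(typ: type):
    """'<type> of' turns a type into a relative getter function."""
    def getter(struct: Struct):
        for t, value in struct.members.items():
            if issubclass(t, typ):
                return value
        raise LookupError(f"no member of type {typ.__name__}")
    return getter


class Size(Struct):
    """Wrapper type: a Size is itself a struct holding one int."""


array = Struct(Size(3), [10, 20, 30])
length = of(int)(of(Size)(array))  # "the int of size of array"
```

Reading right to left, `of(Size)(array)` picks the Size member, and `of(int)` then reads the int inside it.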
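A run-time approximation in Python: the compile-time check described above is replaced by a check on each call. The names make_this, requires and greet are hypothetical, and required types are matched exactly (subtypes are not considered).

```python
from functools import wraps


def make_this(*members) -> dict:
    """Bundle arguments into a nameless struct keyed by type (one value per type)."""
    this = {}
    for m in members:
        if type(m) in this:
            raise TypeError(f"two arguments of type {type(m).__name__}")
        this[type(m)] = m
    return this


def requires(*types: type):
    """Check that 'this' contains every required member type before calling."""
    def decorate(fn):
        @wraps(fn)
        def call(*members):
            this = make_this(*members)
            missing = [t.__name__ for t in types if t not in this]
            if missing:
                raise TypeError(f"{fn.__name__} is missing {missing}")
            return fn(this)
        return call
    return decorate


@requires(str, int)
def greet(this: dict) -> str:
    # Members are looked up by type instead of by argument name.
    return f"{this[str]} is {this[int]} years old"
```

Argument order does not matter, and extra members are accepted as long as the required types are present, mirroring a generic this struct.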

Examples

The structure of the examples is the following:

  • English => Programming Language => With parentheses

  • Explanations

Simple sentences

Peter quickly goes from A to B => go from A to B so quick ly Peter => go(from(A) to(B) so(ly(quick)) Peter)

  • Absolute words here: Peter, A, B, quick + ly
  • Relative words here: go, from, to, so, quick
  • Modifiers here: ly

Only one form of each object is used. Adverbs are described using adjectives: here, so is combined with the adverb quick ly to specify the quality of the action. Types should preferably take only a single argument that is not instantiated by a keyword.

A similar example could look like this:

The quick Peter goes from A to B => go from A to B quick Peter o => go(from(A) to(B) quick(Peter))

In this case, Peter only closes the parenthesis opened by quick, so the explicit closing word o is needed to close go. Alternatively, there could be words (for example modifiers) specifying that a relative word takes exactly one argument (the same word kind as the).
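Both examples can be reproduced with a small Python parser, assuming a relative word opens a group and an absolute word is added to the innermost open group and then closes it, plus two additions: a modifier attaches to the word before it and may change its kind (ly turns a relative word into an absolute one), and o closes the innermost open parenthesis without adding an argument. The lexicon and all function names are hypothetical and cover only these examples.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon covering only the two examples.
LEXICON = {"go": "relative", "from": "relative", "to": "relative",
           "so": "relative", "quick": "relative",
           "A": "absolute", "B": "absolute", "Peter": "absolute"}
# Each modifier maps the kind of the word it follows to a new kind.
MODIFIERS = {"ly": lambda kind: "absolute"}
CLOSER = "o"  # explicit closing word


@dataclass
class Call:
    head: str
    args: list = field(default_factory=list)

    def __str__(self) -> str:
        return f"{self.head}({' '.join(map(str, self.args))})"


def group(words):
    """Attach each modifier (suffix) to the word before it; return (text, kind) pairs."""
    groups = []
    for w in words:
        if w in MODIFIERS:
            if not groups or groups[-1][1] == "closer":
                raise SyntaxError(f"modifier {w!r} has no word to modify")
            text, kind = groups[-1]
            groups[-1] = (f"{w}({text})", MODIFIERS[w](kind))
        elif w == CLOSER:
            groups.append((w, "closer"))
        else:
            groups.append((w, LEXICON[w]))
    return groups


def parse(words):
    top, stack = [], []

    def emit(expr):
        (stack[-1].args if stack else top).append(expr)

    def close():
        if not stack:
            raise SyntaxError("nothing to close")
        emit(stack.pop())

    for text, kind in group(words):
        if kind == "relative":
            stack.append(Call(text))  # opens an implicit parenthesis
        elif kind == "absolute":
            emit(text)                # becomes an argument ...
            if stack:
                close()               # ... and closes the innermost parenthesis
        else:
            close()                   # explicit closer: closes without an argument
    if stack:
        raise SyntaxError(f"unclosed: {stack[-1].head}")
    return top
```

Without the final o, the second example raises a SyntaxError, because go is never closed.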

Defining a function

Use cases

The main goal is a language that is as speakable as it is programmable. But even if this goal cannot be reached, there are a few special use cases where non-programmers need to write or talk to a program. For example, such a language would allow a player to tell characters in a strategy game what to do, and to compile these orders efficiently.

Conclusion

Some of these concepts may be confusing in a programming language, especially the implicit parentheses, but if done in a smart way, they could improve languages. In particular, implicitly named variables and fields based on types seem easy to use in real programming.
