@primaryobjects
Created May 8, 2019 20:16
Programming by Example proof-of-concept implementation with a machine learning model based on input/output features.

Programming By Example

The following program is a basic proof-of-concept implementation of Programming by Example, the program synthesis technique behind Flash Fill in Microsoft Excel.

The Data Set

The data set "features.csv" consists of features extracted from input/output examples, such as a user would provide before program synthesis begins. For example, to produce a program that extracts the first character of every input string, the user might supply input strings of 1, 2, or 5 characters, each paired with a single-character output. From these features we can guess that the most likely solution is the program "firstCharacter". Similarly, if the inputs and the output are numeric, the most likely solution might be "addition".
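As a rough illustration of how one feature row could be derived from a single input/output example, here is a minimal sketch. The helper names value_type and extract_features are hypothetical and not part of the original program, and the sketch assumes that inputLength is the character length of the first input.

```r
# Hypothetical helper: classify a value as "numeric" or "character".
value_type <- function(x) if (is.numeric(x)) "numeric" else "character"

# Hypothetical helper: summarize one input/output example as a feature row
# with the same columns as features.csv (minus the "program" label).
# Assumption: inputLength is the character length of the first input.
extract_features <- function(inputs, output) {
  data.frame(
    inputType    = value_type(inputs[[1]]),
    numInputs    = length(inputs),
    inputLength  = nchar(as.character(inputs[[1]])),
    outputType   = value_type(output),
    numOutputs   = length(output),
    outputLength = nchar(as.character(output)),
    stringsAsFactors = FALSE
  )
}

# One 5-character input string with a 1-character output.
print(extract_features(list("hello"), "h"))
# Two numeric inputs with one numeric output.
print(extract_features(list(3, 4), 7))
```

A real system would need far richer features than these counts and lengths, as the next section notes.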

Large Database of Programs

In a true Programming by Example system, the data set would consist of a massive number of stored programs and their corresponding features. For this proof-of-concept, however, we include only a very limited set. This example also omits the creation of the feature set, which would involve logical reasoning over the input/output examples before applying neural-guided heuristics to select a solution. That is, knowing whether to extract the first character, the second character, or some other substring requires much more feature transformation than this simple example provides.
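To make the idea of stored programs concrete, the three labels used in this data set could be backed by a small lookup of R functions, as in the sketch below. The programs list and the run_program helper are hypothetical illustrations and not part of the original implementation, which only predicts the label.

```r
# Hypothetical program store: each label from the "program" column
# maps to an R function that takes a list of inputs.
programs <- list(
  firstCharacter = function(inputs) substr(inputs[[1]], 1, 1),
  concat         = function(inputs) paste(unlist(inputs), collapse = ""),
  addition       = function(inputs) sum(unlist(inputs))
)

# Look up a program by its label and run it on the given inputs.
run_program <- function(name, inputs) {
  name <- as.character(name)  # accept factor labels, e.g. from predict()
  if (!name %in% names(programs)) stop("Unknown program: ", name)
  programs[[name]](inputs)
}

print(run_program("firstCharacter", list("hello")))
print(run_program("concat", list("ab", "cd")))
print(run_program("addition", list(3, 4)))
```

With such a store in place, a predicted label could be executed directly on the user's inputs and checked against the expected output.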

inputType,numInputs,inputLength,outputType,numOutputs,outputLength,program
character,1,1,character,1,1,firstCharacter
character,1,2,character,1,1,firstCharacter
character,1,5,character,1,1,firstCharacter
character,2,1,character,1,2,concat
character,2,2,character,1,4,concat
character,2,5,character,1,10,concat
numeric,2,1,numeric,1,1,addition
numeric,3,1,numeric,1,1,addition
character,1,1,character,1,1,firstCharacter
numeric,2,1,numeric,1,1,addition
character,1,15,character,1,1,firstCharacter
character,4,3,character,1,12,concat
numeric,8,8,numeric,1,1,addition
# weights: 24 (14 variable)
initial value 8.788898
iter 10 value 0.054007
final value 0.000072
converged
[1] "All predictions correct!"
results addition concat firstCharacter
addition 2 0 0
concat 0 1 0
firstCharacter 0 0 2
> results
[1] firstCharacter addition firstCharacter concat addition
# Very basic Programming by Example implementation with a machine learning model based on input/output features.
library(nnet)
# Load a data-set of features based on input/output characteristics.
df <- read.csv('features.csv')
# Hold out rows 9 onward as the test set; rows 1-8 are used for training.
train <- df[1:8,]
test <- df[9:nrow(df),]
# Multinomial logistic regression.
fit <- multinom(program ~ ., data = train)
# Predict the solution program for each input/output set.
results <- predict(fit, newdata=test)
# Confirm results.
print(ifelse(all(results == test$program), 'All predictions correct!', 'Some predictions failed.'))
print(table(results, test$program))
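The same model can also be asked about a single new example. The following is a minimal, self-contained sketch that inlines the eight training rows rather than reading features.csv; the new_example row is a hypothetical input invented for illustration, and trace = FALSE is used only to silence the fitting log.

```r
library(nnet)

# Training rows copied from features.csv (rows 1-8), inlined so the sketch is self-contained.
train <- read.csv(text = "inputType,numInputs,inputLength,outputType,numOutputs,outputLength,program
character,1,1,character,1,1,firstCharacter
character,1,2,character,1,1,firstCharacter
character,1,5,character,1,1,firstCharacter
character,2,1,character,1,2,concat
character,2,2,character,1,4,concat
character,2,5,character,1,10,concat
numeric,2,1,numeric,1,1,addition
numeric,3,1,numeric,1,1,addition", stringsAsFactors = TRUE)

# Multinomial logistic regression, as in the program above.
fit <- multinom(program ~ ., data = train, trace = FALSE)

# Hypothetical unseen example: one 7-character input string, one 1-character output.
new_example <- data.frame(inputType = "character", numInputs = 1, inputLength = 7,
                          outputType = "character", numOutputs = 1, outputLength = 1)

pred  <- predict(fit, newdata = new_example)                  # most likely program label
probs <- predict(fit, newdata = new_example, type = "probs")  # probability per program
print(pred)
print(round(probs, 3))
```

Inspecting the class probabilities, rather than only the top label, gives a rough sense of how confident the model is in its choice of program.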