Skip to content

Instantly share code, notes, and snippets.

@SharpCoder
Last active August 29, 2015 14:23
Show Gist options
  • Save SharpCoder/4548adaa267d489eac4f to your computer and use it in GitHub Desktop.
Save SharpCoder/4548adaa267d489eac4f to your computer and use it in GitHub Desktop.
Sentence Generation as a Function of Classification

Sentence Generation as a Function of Classification

This is a collection of thoughts I have regarding a potential engine for generating content. Bot-generated stories is a fascination of mine, and I am developing a potential implementation for working in this problem space. The idea involves usage of a neural network to classify training data. Once classified, sentences can be generated as a subset of intent which will drive the output.

Intent

This is a core concept for the whole system. Intent will be a pre-defined action item. These are essentially the classification outputs. An example of intent may include the following:

  • Describe a thing
  • Move about the world
  • Take an action
  • Use an item
  • Engage in dialog

These intents are all hard-coded, but how they are accomplished becomes a problem of classification. Example:

The {dog} has {four legs}. => {constant} {subject} {constant} {property}.

By breaking down a sentence into tokens, we can define rules for the action of description. This is useful for replacing tokens and creating new sentences, however, it lacks diversity.

Objects

The bot can interact with various objects. These objects will be represented as a collection of properties and can be parsed through input text. The dog has 4 legs. The squid has 8 tentatcles. The house has a red door. These are all properties of objects, and the bot will maintain a list of said properties.

Learning

There are a few goals with learning. First, the overall sentence should be classified into one of the action categories. In that way, the bot will learn new sentences that can be used to diversify a specific action. Next, the bot will need to be able to tokenize the words in a sentence. This may or may not use a neural network. The classification and tokenization processes should be learned through developer-assisted training.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment