% Chktex bug (gist by @NickRSearcy, created May 19, 2016)
\documentclass[floatsintext,man]{apa6}
\usepackage{hyperref}
\usepackage{cleveref}
\usepackage[hyperref=true]{acro}
\usepackage{xcolor}
\usepackage{color}
\usepackage{soul}
\usepackage{amssymb,amsmath}
\usepackage{graphicx}
\usepackage[natbib=true]{biblatex}
\usepackage{threeparttable}
\addbibresource{/Users/nrs/Google Drive/Lit/cocosci.bib}
\definecolor{dm}{RGB}{50,0,50}
\hypersetup{
colorlinks=true,
citecolor=magenta,
linkcolor=dm
}
\DeclareAcronym{rl}{
short=RL,
long=representation language
}
\DeclareAcronym{bl}{
short=BL,
long=simple Boolean logic
}
\DeclareAcronym{fol}{
short=FOL,
long=first-order logic
}
\frenchspacing
\title{Learning expressive concepts: a preliminary examination}
\shorttitle{Learning expressive concepts}
\author{Nick Searcy}
\affiliation{The University of Louisville}
\abstract{\hl{write an abstract}}
\begin{document}
\maketitle
\tableofcontents
\begin{quote}
``Locke, in the seventeenth century, postulated (and rejected) an impossible language in which each individual thing, each stone, each bird, each branch, would have its own name; Funes once projected an analogous language, but discarded it because it seemed too general to him, too ambiguous. In fact, Funes remembered not only every leaf of every tree of every wood, but also every one of the times he had perceived or imagined it.'' \citep{borges1964}
\end{quote}
\citet{borges1964} described the fictional `Funes the memorious', who developed the ability to remember absolutely everything after an accident. This decidedly fictional ability nonetheless illustrates the role of concepts in people's daily lives.
\hl{explain more}
\section{Introduction}
The primary goal here is to answer ``What is expressivity?'' That is the idea used to carve up the literature in the remaining sections. I plan to separate the term ``expressivity'' from the overloaded term ``complexity'' and discuss complexity, as needed, in the following sections. (Fodor 1975; Feldman, 2000; Goodman et al., 2008; Kemp, 2012; Piantadosi et al., 2015)
Concepts are the units of thought. When agents retain representations of past experience, they do so in the form of concepts. The interaction of these representations is what supports reasoning and decision making; in the case of social creatures these representations can be shared; and learning is the process of building these representations.
The structure of these representations (or equivalently in some cases, the nature of the process that generates them) then impacts how concepts are used at many levels.
Researchers have been working to understand the representation of concepts for a very long time \citep{frege1879}.
One early milestone was \citet{newellss1957}'s work on the Logic Theory Machine, one of the earliest artificial intelligence programs.
Because they underlie so much of human cognition, an understanding of concepts---such as information about the structure of concepts and about how they interact---should provide insight into how that cognition works and why.
And we have reason to believe that this approach is working.
The earliest investigations into the nature of concepts \citep{hull1920,brunerga1956,shepardhj1961} were primarily concerned with the \emph{effects} of the structure of concepts rather than the underlying structure itself.
They found that for different artificial concepts (i.e. those constructed in the lab), measures of learning differ.
Many researchers speculated that these measurements might provide clues as to the underlying structure of concepts and this process has been the focus of later studies of concept learning.
The approach of using a proposed representational system to predict learning outcomes has been a successful method for explaining the differences in learning outcomes caused by changes in the structure of artificial concepts \citep{feldman2000,goodwinj2011,goodmantfg2008}.
Knowledge of the underlying representation allows us to predict much more than the difficulty of learning a concept.
\citet{eavess2014} found that when learning a concept, the ordering of the examples affects how easily and accurately the concept is learned and that the underlying representation predicts ordering effects.
\hl{more examples?}
Proposals for this underlying structure include prototypes and exemplars, different rule-based systems, and connectionist systems; and each of these amounts to a class of proposals within which are many specific cases.
While many arguments are presented as advocating for one class of proposals against another (e.g. for rules instead of exemplars) many of the specific proposals include important details that can be lost when comparing the larger classes of proposals.
In other words, many of the details required to fully implement a representational learning model are hardly, if at all, informed by the class of representational system used.
For example, \citet{nosofskypm1994} argued that much of the useful information about which kinds of representations are used is lost when a model is fit to aggregated data instead of individual data; contrasting these approaches is what allowed a rule-based model to explain the prototype effects present in aggregate data.
However, the importance of this approach---aggregating fits to individual data rather than fitting to aggregated data---seems to have been lost even as models using rules have returned to prominence \citep[there are, of course, exceptions, including especially][]{goodmantfg2008}.
The landscape of proposals for concept structure is both large and intricate.
In addition to the more common questions of which class of proposals better accounts for human concept representation, many other details of concept representation are worth considering.
I focus for the rest of the review on one of these which I consider of particular importance: expressivity.
\begin{table}
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}{ | l | p{8cm} |}
\hline
Representation (general) & What is the basic structure of the representation? (rules, exemplars, prototypes, possibly even associations) \\
Representation (specific) & Given each representation type, there are many different ways to instantiate the representation. What kind of features are used? What operations? \\
\textbf{Expressivity} & How expressive is the representation? Can all computable functions be expressed? If not, what cannot be expressed? \\
Complexity & Given the representation, what determines the complexity (i.e. difficulty of processing and using) of the individual concepts in the representation? \\
Composition & How are existing concepts used in the service of learning new concepts? \\
Minimization & When learning/using a concept, does one find the absolute minimum version, an approximate minimum, or any at all? \\
Determinism & Many of the questions above can be answered either deterministically (i.e. given the same input, the response should always be the same) or not (e.g. probabilistically) \\
Aggregation & Related to the question of determinism is the idea that some important evidence of different representations might be lost when responses from many are aggregated. \\
Mixture & Related to the minimization and composition questions is the question of whether individuals hold a single definitive version of each concept or some composite or mixture. \\
\hline
\end{tabular}
}
\caption{Key questions in the literature}
\begin{tablenotes}[para,flushleft]
{\small
A (certainly incomplete) list of the significant questions to be addressed by the study of representation and concept learning. These questions are all at least partially orthogonal, meaning that answering one might provide some information for answering others but should not be expected to completely answer any others. Additionally, the relationship between these questions and their answers is largely unstudied. The focus of the review at hand is expressivity.
}
\end{tablenotes}
\end{table}
\subsection{Expressivity}
Expressivity refers to the ability of a \ac{rl} to produce a concept representation that matches some specified concept.
Even when the notion has not been mentioned by name, expressivity has been an important part of concept learning to date.
Across related fields, representational work has generally begun with less expressive representations for practical reasons and there has been a push to build models that are progressively more expressive.
In general the representations used are such that the less expressive ones are proper subsets of more expressive ones, though this does not need to be the case.
Expressivity can thus be thought of in terms of different levels of expression, with Boolean logic (or perhaps, even a sub-Boolean logic) sitting at bottom and progressively more expressive representations above: first-order logic, second-order logic, and Turing-complete representations like lambda calculus.
Results regarding expressivity have been important for concept learning and related fields: \citet{minskyp1969} showed that a single-layer neural network is \emph{incapable} of learning the concept `either A is true or B is true but not both' (exclusive-or, XOR), leading to a decades-long reevaluation of neural networks; \hl{find a FOL concept}; \citet{carey2009} has argued that children learn to use number by bootstrapping the recursive relationship between small numbers, which can be subitized, and the list of numbers.
These are examples of concepts that can be represented by \acp{rl} with different expressive capacities: exclusive-or can be expressed by predicate logic, \hl{FOL concept} can be expressed by first-order logic but not predicate logic, natural number can be expressed by second-order logic but not first-order logic, and \hl{turing complete concept} can be expressed in any Turing-complete language but not second-order logic.
In what follows I begin answering the following questions: How expressive are current models of concept learning?
How expressive \emph{should} models of concept learning be?
How expressive is the human language of thought?
I do this first by surveying both modeling and empirical work and determining the level of expression in models and experiments.
I then use this to make three arguments. First, the human language of thought is almost certainly Turing-complete (or at least indistinguishable from this) and thus, for our purposes, maximally expressive. Second, models of representation are, in general, better off using the least expressive representation that includes the concepts under study. Third, special attention should be paid to models and assumptions which cannot hold for more expressive representations.
A Boolean logic representation language is any language that has Boolean features and connectives. Not all Boolean \acp{rl} are logically complete---i.e. able to express the full set of concepts possible under Boolean logic.
Adding certain connectives ensures that a Boolean \ac{rl} is logically complete.
Expanding binary features to n-ary features then allows any Boolean \ac{rl} to represent n-ary concepts.
The expressivity of such \acp{rl} can then be expanded again by allowing quantification over objects, yielding a first-order logic that may express an even larger class of concepts.
The expressivity of second-order logic allows quantifiers over quantified expressions (often called `schemas' in logic).
The end point for our purposes is a \ac{rl} that can express any computable function, such as lambda calculus or higher-order logic.
While considering these classes of \acp{rl}, the main point I wish to make is that, although there is no technical reason to suppose that the expressivity of different \acp{rl} must be nested, various practical realities mean that those I deal with here will be.
How expressive, then, is the representation humans use to learn concepts?
One possibility is that it is Turing-complete, meaning that it is able to express any computable function and bears an equivalence with all other Turing-complete representations for the class of computable functions.
There are several reasons it would be beneficial for any agent that learns about the world by storing representations to do so with a Turing-complete language.
\subsection{Preliminaries}
While many results and even more arguments in the concept learning literature are about expressivity, this quality is often addressed implicitly. Here, I will briefly cover the preliminary notation that will allow us to cover these details more explicitly.
First, I will use the term representation, alone, to describe a system that includes some defined process to make the decisions necessary for the use of concepts, such as acquisition, categorization, generalization, etc. This term is useful for talking about representations including those that \emph{are not} representation languages. A representation language, on the other hand, is a representation that can be defined in some formal language. A common means of defining a formal language is a context-free grammar, as I will demonstrate below; however, many models of concept learning that are in fact representation languages (e.g. prototype models) have generally managed to avoid being presented as formal languages. Because formal languages provide a useful tool for discussing the expressivity of a representation, my approach will be to present such models as formal languages.
A context-free grammar (CFG) is a system for generating the formulas, or `sentences', that compose a language. A CFG consists of `variables', symbols that do not appear in finished sentences, one of which is the start symbol; `terminals', the symbols that do appear in sentences; and production rules that link variables to terminals. Variables are often capitalized and terminals lowercase to distinguish between them. An example of a small CFG is,
\begin{align} \label{eqn:classical}
S_0 &\rightarrow S_0 \wedge F \mid F \\
F &\rightarrow f_0 \mid f_1 \mid \dots \mid f_n ~. \nonumber
\end{align}
This grammar corresponds to a common interpretation of the `classical model' of concepts: that a concept is represented by the properties (or features) necessary for its application. Logically, this can use the Boolean AND operation ($\wedge$) which returns True if both inputs are True and False otherwise. If both $f_1$ and $f_2$ are necessary for the concept $c$, we can represent the concept with $c = f_1 \wedge f_2$. To see how the grammar produces this, start with $S_0$, then use the first production $S_0 \rightarrow S_0 \wedge F$ (the vertical lines separate the possible productions for a given variable). The second $S_0$ can then be removed using the second production $S_0 \rightarrow F$ to give us $F \wedge F$. Finally, we can pick productions from the (abbreviated) bottom row to give $f_1 \wedge f_2$. The initial ``$c=$'' is omitted because that is the same for every representation language used here.
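Written as a chain of single-production rewrites (using $\Rightarrow$ for the application of one production), the derivation just described is
\begin{align*}
S_0 \Rightarrow S_0 \wedge F \Rightarrow F \wedge F \Rightarrow f_1 \wedge F \Rightarrow f_1 \wedge f_2 ~.
\end{align*}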
The representation languages discussed here include a few symbols and operations which may not be familiar to the reader. The Boolean labels are \{True, False\}, and Boolean logic is a representation language in which all variables and operations may only output one of the Boolean labels. First-order logic is like Boolean logic with a few additions. Quantifiers are operations that deal with number in a set of objects. The existential quantifier---``$\exists$'' in a formula like $\exists o \in \mathcal{O} ~ p(o)$, read ``there exists an $o$ in $\mathcal{O}$ such that $p(o)$''---applies an expression to every item in a set and returns True if the expression is True for at least one item. The universal quantifier operates similarly by applying an expression to every item in a set but instead requires the expression to be True for \emph{all} items.
\begin{table}
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}{ | p{4cm} | p{4cm} | p{4cm} |}
\hline
Level & Description & Examples of concepts \\ \hline
Boolean Logic and predicate calculus & Each feature and operator use only the Boolean labels & AND, OR, XOR, NAND \\
First-order logic & As Boolean logic with the addition of quantifiers over elements & quantities and comparisons\\
Second-order logic & As First-order logic with the addition of quantifiers over relations, functions, and sets & Natural number line and real number line \\
Higher-order logics, \hl{lambda}-calculus, and others & Universal models of computation that can describe any computable function & Arbitrary computer programs and concepts \\
\hline
\end{tabular}
}
\caption{Levels of expression}
\begin{tablenotes}[para,flushleft]
{\small
This table lists levels of expression as they correspond to different formal systems.
}
\end{tablenotes}
\end{table}
% \subsection{Which representation(s)?}
% A foundational question in the study of concept learning is about what representations are used to learn concepts.
% There are in fact (infinitely) many candidate representations, and researchers have found it useful to group the proposals into a few broad classes: exemplar models store observed examples of a class in a multi-dimensional space and categorize novel (and non-novel) examples according to their position in this space relative to stored examples; prototype models summarize observed examples into one or a few prototypes and categorize examples based on the position relative to these prototypes; rule models learn (or select) a rule from a representation language according to how well
% The primary goal here is to answer ``What is expressivity?'' That’s the idea that is used to carve up the literature in the remaining sections. I plan to separate the term used to talk about ``expressivity'' from the overloaded term ``complexity'' and discuss complexity, as needed, in the following sections. (Fodor 1975;
% Feldman, 2000; Goodman et al., 2008, Kemp, 2012; Piantadosi et al., 2015)
\section{Modeling}
% The models discussed here can be divided into three categories: those with explicit rule-based models already in the form of a grammar, those with rule-based or rule-like models that are easily translated into grammars, and those for which translation into rule-based grammars is not presently practical. My approach in this section is to consider a number of concept learning models, informally discuss which of the three above categories they belong to, and discuss which aspects of each model have an impact on expressivity and why.
% The use of grammars or similar tools to discuss representations is valuable when discussing expressivity. A grammar concisely describes the set of possible sentences (in our case concepts) that are in a language (in our case a model). Thus, to show that a concept model can express a concept in a certain class, one may use the corresponding grammar to show that such a concept is in the set of possible concepts for the model. It is also possible to show that a grammar \emph{cannot} produce a sentence of a certain type, though I will do so informally, without rigorous mathematical proofs.
When surveying the modeling literature I will begin, much as \citet{smithm1981} did, with the classical model of concepts, followed by probabilistic/prototype models and exemplar models. This provides a convenient narrative for introducing the connectionist and rule-based models that have found success since \citeauthor{smithm1981} wrote their guide. Throughout, I will proceed through these models with an eye toward expressivity.
There is a trajectory of classical $\rightarrow$ prototype/probabilistic $\rightarrow$ exemplar $\rightarrow$ connectionist $\rightarrow$ rule-based.
\subsection{Classical model}
Several reviews of concept learning models begin with the ``classical model'' and for good reason. This is perhaps the simplest possible model of concept learning, and this simplicity provides an instructive contrast to later models. The classical model proposes that each concept is made up of a list of individually necessary and jointly sufficient features. For example, the concept ``bachelor'' might be defined using a classical model consisting of the features ``unmarried'' and ``man''. Any example that satisfies both of those features would then be categorized as a ``bachelor'' and any example that fails either or both would not be. Using a logical formula, this looks like $c_{bachelor} = f_{unmarried} \wedge f_{man}$. The representational system inherent in the classical model is described in \cref{eqn:classical}, but it is easily summarized as a series of conjunctions.
While the classical model might have proved useful for philosophical analysis, it does not seem that many (or possibly even any) psychologists actually used the classical model as an explanation for concept learning or any related phenomenon. The particular deficiencies of the classical model can provide insight into the motivation behind the bells and whistles of later models, though. Many of these deficiencies have been discussed previously: it does not seem that any real concepts actually have a definition in the classical sense, and there are many observed effects of concept learning and use that are not predicted by the classical model (e.g. typicality and other gradation effects CITES, order effects CITES, \hl{others?}).
TODO make an illustration of this.
Among other problems, the classical model also has an extreme expressivity problem that has rarely been discussed. The classical model only allows one means of combining features to represent a concept: conjunction. An illustration should suffice to explain the problems this creates. Imagine a list of features: $\{f_{color},f_{shape}\}$. We might assume that a classical model can combine such features with their possible values to create Boolean features, such as $(f_{color}=v_{blue})$. Consider, then, the number of concepts in this simple world. If we restrict the universe to objects with these two features, and restrict each feature to three possible values, then there are $3^2 = 9$ different examples and $2^9=512$ different concepts in this universe. How many of those concepts can be represented using a classical model? Just $4^2=16$, because there are two features, each of which can take on one of three values or be omitted. This comes to coverage of about $3\%$ of the possible concepts. The classical model has a concept for ``red objects'', ``blue and square objects'', and ``circle objects'' but no concept for ``objects that are not blue'' nor for ``either red or not square but not both''.
And the proportion of the total concepts that can be represented quickly decreases as either the number of features or the number of values for each feature increases. To put it plainly, in any realistic context with a large number of complex features, the share of possible concepts that can be represented by the classical model diminishes to nothing.
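To make the limit explicit (the general formula is mine, but it follows directly from the counting above): with $n$ features of $v$ values each, there are $v^n$ examples, $2^{v^n}$ concepts, and at most $(v+1)^n$ classically representable concepts, so the coverage is
\begin{equation*}
\frac{(v+1)^n}{2^{v^n}} = \frac{16}{512} \approx 3\% \quad \text{for } n=2,~ v=3 ~,
\end{equation*}
and the denominator's double-exponential growth drives this ratio to zero as either $n$ or $v$ increases.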
It's worth briefly considering possible remedies for the classical model's lack of expressivity (both of these remedies will be discussed in more detail later). One possibility is including the negation of features. So a model might add a third rule to the classical model's grammar,
\begin{align} \label{eqn:classical_neg}
S_0 &\rightarrow S_0 \wedge N \mid N \\
N &\rightarrow \overline{F} \mid F \nonumber \\
F &\rightarrow f_0 \mid f_1 \mid \dots \mid f_n ~. \nonumber
\end{align}
This allows the model to represent concepts like ``objects that are not blue''. With negation, the model can represent $9\%$ of the concepts in the universe of two features with three values described above. This isn't a very large improvement, and the proportion still declines to zero in the limit, but it sets up a promising possibility.
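The $9\%$ figure can be checked by the same counting: with negation, each feature slot can take one of $v$ values, one of $v$ negated values, or be omitted, so the model expresses at most
\begin{equation*}
(2v+1)^n = 7^2 = 49 \approx 9.6\% \text{ of } 512 \quad \text{for } n=2,~ v=3 ~.
\end{equation*}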
Another improvement is the inclusion of compositionality. Composition roughly corresponds to building novel concepts out of existing ones. It is an extraordinarily important part of concept representation CITE but it is omitted from many models for a variety of reasons, chief among them that allowing composition makes the space of possible concepts infinite and so it becomes impossible to enumerate. The following grammar adds a list of $m$ concepts to the set of things that can take the place of features,
\begin{align} \label{eqn:classical_neg_comp}
S_0 &\rightarrow S_0 \wedge N \mid N \\
N &\rightarrow \overline{F} \mid F \nonumber \\
F &\rightarrow C \mid f_0 \mid f_1 \mid \dots \mid f_n \nonumber \\
C &\rightarrow c_0 \mid c_1 \mid \dots \mid c_m ~. \nonumber
\end{align}
If compositionality is added to the classical model alone (without negation) it provides no benefit, as composing a series of conjunctions with another series of conjunctions results in nothing more than another series of conjunctions. However, when combined with negation we arrive at a representation that has full coverage of the Boolean concepts. To see why, consider the concept $c_1$, ``objects that are not blue and not square''. The classical model can learn this by inserting the negated features into a single conjunction. But the classical model cannot represent $c_2$, ``objects that are not (blue and square)''. The difference is subtle but, because the negation operates outside of the conjunction in $c_2$, any object that is ``not blue'' is in the concept regardless of its shape. The $c_2$ concept has the form of the NAND operation, an extremely important operation in digital logic design because of its ability to represent all logical functions and its convenient physical implementation in transistors \citep{roth2013}.
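The relationship between $c_1$ and $c_2$ is just De Morgan's law, which turns a negated conjunction into a disjunction of negations:
\begin{equation*}
\overline{f_{blue} \wedge f_{square}} = \overline{f_{blue}} \vee \overline{f_{square}} ~.
\end{equation*}
Since the grammar in \cref{eqn:classical_neg} offers no disjunction, $c_2$ is reachable only by composition: name the conjunction as a concept first, then negate the name.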
This point will be made in greater detail later but the problems with the classical model and the two part solution proposed illustrate a few things about the expressivity of models of representation.
Many models of representation have serious problems with expressivity and only a few make an explicit attempt to address expressivity.
The improvements that would enable expressive representation are often somewhat, but not entirely, orthogonal to other purposes of the representational model.
For example, the conjunctive nature of the classical model is surely a characteristic aspect of it, because one of the main arguments against the model is that almost no concepts have classical definitions.
\subsection{Non-logical models}
Models that do not explicitly use logical rules to represent concepts have been popular in concept learning research and related areas. While rules are not used \emph{explicitly} each such model does necessarily include structure that, at least in theory, could be accurately and completely summarized as a language of rules using a context-free grammar or something like it.
My treatment here of non-rule models will vary by case. In certain cases, it seems that theories proposed as alternatives to rule-based models are not meant as models of representation at all, but rather as theories that constrain such models. For instance, the prototype theory of Rosch (CITE) is often presented as a \emph{representational} theory that is an alternative to logical representations (CITE). However, Rosch has described the many findings of prototype effects as really referring to ``judgements of degree of prototypicality'' and clearly stated that ``[p]rototypes do not constitute a theory of representation''.
However, \citet{smithm1981} have developed a representational model meant to capture the representational notions that seem implicit in much of the work of Rosch and collaborators. This model is based on the `context model' of \citet{medins1978}. I'll first explain the machinery of the context model and then show how this model differs from the supposed representational interpretation of prototype theory.
The context model includes a component for categorizing examples as members of a target concept $C$ or not (the negative is treated as membership in a contrasting category $\overline{C}$ or set of categories). Given a target example $x$ that is to be categorized, the model draws objects $o_i$ from the set of all objects according to how similar each object is to the target. The example is assigned to whichever category, the target or a contrasting one, is the first to draw a certain number of examples. For the context model, the similarity between two objects, $s_c(o_i,o_j)$, is calculated by multiplying a similarity value for each feature (all objects are assumed to contain the same features).
The probability of drawing each object $o$ from memory given the target example $x$, $p(o|x)$, is
\begin{equation}
p(o|x) = \frac{s_c(o,x)}{\sum_{o_i \in U} s_c(o_i,x)} ~,
\end{equation}
where the similarity between two objects is the product of their per-feature similarities,
\begin{equation}
s_c(o_1,o_2) = \prod_{f \in F} s_f(f,o_1,o_2) ~,
\end{equation}
with
\begin{equation}
s_f(f,o_1,o_2) =
\begin{cases}
1 & \quad o_1[f] = o_2[f] \\
c_f & \quad \text{otherwise} ~.
\end{cases}
\end{equation}
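As a concrete illustration (the parameter value here is hypothetical), consider two objects that match on color but mismatch on shape, with a shape similarity parameter $c_{shape} = 0.5$:
\begin{equation*}
s_c(o_1,o_2) = s_f(f_{color},o_1,o_2) \times s_f(f_{shape},o_1,o_2) = 1 \times 0.5 = 0.5 ~.
\end{equation*}
Each mismatch scales the whole similarity down multiplicatively, which is what gives the context model its characteristic sensitivity to exact matches.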
\citet{posner1986} developed a prototype model. \citet{medins1988} discuss some of the problems with this.
\citet{reed1972} introduced the proximity model.
\citet{medins1978} developed the context model.
Briefly discuss non-rule representations and justify a focus on rule-based representations (Rosch and Mervis, 1975; Medin and Schaffer, 1978; Nosofsky, 1986; Kruschke, 1992; Love et al., 2004; McClelland and Rumelhart, 1986)
\subsection{Models with limited expressivity}
Discuss models with limited expressivity, i.e. Boolean concepts (Neisser and Weene, 1962; Haygood, 1963; Medin, Wattenmaker, and Michalski, 1987; Nosofsky et al., 1994; Feldman, 2006; Goodwin, 2006; Vigo, 2009; Lafond, Lacouture, and Cohen, 2009; Goodwin and Johnson-Laird, 2011; Gentner, 1983; Holyoak and Thagard, 1989)
One class of models utilizes the \emph{representation length approach}.
In this approach, a model consists of (i) a language, (ii) a method for computing the length of formulas in the language, and (iii) a method for estimating the length of the shortest consistent formula given a concept.
There are a variety of different ways to provide these components.
The language is often generated by a formal grammar.
The length function typically counts the number of operations in each formula and assigns a (not necessarily uniform) cost for each of them.
The minimization process is often exhaustive for most limited representation languages, i.e. one can generate a set of all consistent formulas that is guaranteed to include the minimal formula and then search this set.
But this approach is generally intractable so estimation techniques are used in many cases.
The Rule-Plus-Exception (RULEX) model of \citet{nosofskypm1994} was one of the earliest concept learning models that explicitly built complex concepts (i.e. concepts that include multiple operations) out of rules.
In RULEX, concepts are built from a disjunction of simple rules and one or more exceptions.
This is implemented as a decision tree, though it is precisely equivalent to the following grammar,
%
\begin{align*}
S_0 &\rightarrow \left( (R) \wedge (E_n) \right) \vee (E_p) \\
R &\rightarrow F \mid R \wedge F \\
E_n &\rightarrow R \mid E_n \wedge R \mid \epsilon \\
E_p &\rightarrow R \mid E_p \wedge R \mid \epsilon ~.
\end{align*}
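To see the grammar in action, here is one derivation under my reading of it (a two-feature rule $f_0 \wedge f_1$, no negative exceptions, and a single positive exception $f_2$):
\begin{align*}
S_0 \Rightarrow \left( (R) \wedge (E_n) \right) \vee (E_p) \Rightarrow^{*} \left( (f_0 \wedge f_1) \wedge (\epsilon) \right) \vee (f_2) ~,
\end{align*}
where $\Rightarrow^{*}$ abbreviates several production steps and $\epsilon$, the empty string, makes the negative-exception conjunct vanish.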
Throughout this paper, I translate the representation language of the rule-based systems into a formal grammar.
This demonstrates conclusively that, so long as my translation is correct, the representation system is a logical rule-based one and facilitates a comparison in terms of expressivity.
The RULEX model does not include an explicit measure of complexity.
Instead the addition of each rule is determined probabilistically such that more complex concepts (i.e. those that involve the addition of many rules) are simply less likely to occur.
This process is effectively similar to the probabilistic grammar approach of \citet{goodmantfg2008} which I discuss later.
\citet{feldman2000} designed a model that explains the difficulty of learning a large number of concepts in terms of two effects: Boolean complexity and parity.
For each concept, Boolean complexity is derived from the representation length approach for the language of simple Boolean formulas,
%
\begin{align*}
S_0 &\rightarrow S_0 \wedge S_0 \\
S_0 &\rightarrow S_0 \vee S_0 \\
S_0 &\rightarrow \overline{S_0} \\
S_0 &\rightarrow ( S_0 ) ~.
\end{align*}
%
The minimal formula was found using a heuristic method.
Parity is determined by the fraction of examples that are positive or ``in the concept''.
So concepts with \emph{up} parity have more positive examples than negative, like $f \equiv a \vee b$, and concepts with \emph{down} parity have more negative examples than positive, like $f \equiv a \wedge b$.
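Parity can be computed directly from a concept's truth table; the following sketch (an illustration of my own, with hypothetical names) classifies the two example concepts just given:

```python
def parity(concept):
    """Classify a concept as up or down parity by counting positives.

    concept maps each example to True (positive) or False (negative)."""
    positives = sum(1 for label in concept.values() if label)
    negatives = len(concept) - positives
    if positives > negatives:
        return "up"
    return "down" if positives < negatives else "even"

# f = a or b: three positive examples out of four, so up parity.
f_or = {(0, 0): False, (0, 1): True, (1, 0): True, (1, 1): True}
# f = a and b: one positive example out of four, so down parity.
f_and = {(0, 0): False, (0, 1): False, (1, 0): False, (1, 1): True}
print(parity(f_or), parity(f_and))  # up down
```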
\citet{goodwinj2011} designed a model based on mental models \citep{johnsonlaird1983}, an influential and more general theory of concepts.
Goodwin and Johnson-Laird use a representation-length approach with the following representation language,
%
\begin{align*}
S_0 &\rightarrow S_0 \otimes S_0 \mid S_1\\
S_1 &\rightarrow S_1 \wedge S_1 \\
S_1 &\rightarrow \overline{S_1} ~,
\end{align*}
%
where $f_0 \otimes f_1$ is the exclusive-or that is true when either $f_0$ or $f_1$ is true but not both.
The exclusive-or operation separates the ``mental models'', so that the concept $f \equiv (f_0 \wedge f_1) \otimes \overline{f_1}$ contains two models, $f_0 \wedge f_1$ and $\overline{f_1}$.
The number of models in the minimal formula for each concept (equivalently, the number of $\otimes$ operations plus one) is used as the length function.
The model proposes that the length of this formula predicts the difficulty of learning the concept (measured by the accuracy of recalling the concept after training). \hl{explain what this measure alone does not provide}
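This length function can be sketched as a count of the $\otimes$-separated operands; here formulas are encoded as nested tuples of my own devising, not Goodwin and Johnson-Laird's representation:

```python
# Count the "mental models" in a formula: each operand of a (possibly
# nested) exclusive-or contributes one model.
def count_models(formula):
    """formula: ('xor', left, right), ('and', ...), ('not', ...), or a str."""
    if isinstance(formula, tuple) and formula[0] == "xor":
        return count_models(formula[1]) + count_models(formula[2])
    return 1  # conjunctions, negations, and literals form a single model

# (f0 and f1) xor (not f1): two models, so length 2.
f = ("xor", ("and", "f0", "f1"), ("not", "f1"))
print(count_models(f))  # 2
```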
\begin{table}
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}{ | p{2.5cm} | p{3.5cm} | p{5cm} | p{2.5cm} | p{2.5cm} |}
\hline
Study & Grammar & Complexity & Minimization & Description
\\ \hline
\citet{nosofskypm1994} &
{$\begin{aligned}[t]
S_0 &\rightarrow \left( (R) \wedge (E_n) \right) \vee (E_p) \\
R &\rightarrow F \mid R \wedge F \\
E_n &\rightarrow R \mid E_n \wedge R \mid \epsilon \\
E_p &\rightarrow R \mid E_p \wedge R \mid \epsilon
\end{aligned}$} &
None explicit; each added rule makes the formula less probable. &
Stochastic rule search &
\\ \hline
\citet{feldman2000} &
{$\begin{aligned}[t]
S_0 &\rightarrow S_0 \wedge S_0 \\
S_0 &\rightarrow S_0 \vee S_0 \\
S_0 &\rightarrow F
\end{aligned}$} &
The number of features in a formula. &
Heuristic (with some known deviations from ideal \hl{CITE}) &
\\ \hline
\citet{goodmantfg2008} &
{$\begin{aligned}[t]
S_0 &\rightarrow S_0 \vee S_0 \mid (S_1)\\
S_1 &\rightarrow S_1 \wedge S_1 \\
S_1 &\rightarrow F
\end{aligned}$} &
$\left( \prod_{f \in F} \beta (C_f(s) + 1) \right) e^{ -b Q_l (s,E)}$ &
Monte Carlo simulation &
\\ \hline
\citet{goodwinj2011} &
{$\begin{aligned}[t]
S_0 &\rightarrow \otimes(S_1)\\
S_1 &\rightarrow S_1, S_1 \mid S_2 \\
S_2 &\rightarrow S_2 \wedge S_2 \mid F
\end{aligned}$} &
The number of models used (separated by $\otimes$) &
Brute force &
Based on Mental Models \\
\hline
\end{tabular}
}
\caption{Representation-length models for less expressive concepts. All grammars include the production $F \rightarrow f_0 \mid \overline{f_0} \mid f_1 \mid \overline{f_1} \mid \dots$, which allows the nonterminal $F$ to produce any literal.}
\end{table}
% \begin{table}
% \centering
% \resizebox{\columnwidth}{!}{%
% \begin{tabular}{ | p{2.5cm} | p{3.5cm} | p{5cm} | p{2.5cm} | p{2.5cm} |}
% \hline
% Study & Grammar & Complexity & Minimization & Description \\
% \hline
% \citet{kemp2012} &
% \begin{aligned}[t]
% S_0 &\rightarrow S_0 \vee S_0 \mid (S_1) \mid S_3 (S_0)\\
% S_1 &\rightarrow S_1 \wedge S_1 \mid S_2 \mid S_3 (S_1)\\
% S_2 &\rightarrow \forall x \mid \exists x \\
% S_3 &\rightarrow F \mid \overline{F} ~.
% \end{aligned}&
% $(1 - p)((1-q)l^+ + qb) + p((1-q)l^- + qb^-)$ where $p$ is the probability of encoding the negative extension, $q$ is the probability of encoding examples without minimizing formula, $l^+$ ($l^-$) is the number of features in the minimal description of the positive (negative) extension with relational features, e.g. $R(x,y)$, counting double, and $b^+$ ($b^-$) is the number of examples times the number of features for the positive (negative) extension. &
% Exact minimization was performed for &
% \\
% \hline
% \end{tabular}
% }
% \caption{Representation-length models for more expressive concepts.}
% \end{table}
\subsection{Models with greater expressivity}
The Rational Rules model of \citet{goodmantfg2008} uses a disjunctive normal form (DNF) grammar, paired with a novel measure of complexity and minimization strategy.
The grammar is
\begin{align*}
S_0 &\rightarrow S_0 \vee S_0 \mid (S_1)\\
S_1 &\rightarrow S_1 \wedge S_1 \\
S_1 &\rightarrow F \mid \overline{F} ~.
\end{align*}
The complexity is calculated as the posterior probability of the formula $s$ given the set of labelled examples $E$:
%
\begin{equation*}
P(s \mid E) \propto \left( \prod_{f \in F} \beta (C_f(s) + 1) \right)
e^{ -b Q_l(s,E) } ~.
\end{equation*}
For present purposes, I will focus on two qualitative features of this posterior probability.
The product, which serves the role of a prior, determines the probability of each formula in the absence of any data.
This prior has the property of incentivizing the reuse of compositional elements so that a formula that uses two distinct features with one used twice is more probable than one that uses three distinct features one time each, $P(f_0 \wedge f_1 \vee \overline{f_0}) > P(f_0 \wedge f_1 \vee \overline{f_2})$.
The exponential serves the function of a likelihood, determining how the observed examples bear on the probability of the formula.
The function $Q_l$ simply counts the number of examples misclassified by $s$, so that a formula becomes exponentially less likely the more examples it misclassifies.
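These two qualitative features can be sketched numerically. The following is my own simplification, not the full Rational Rules prior over grammar productions: it scores a formula by a multivariate Beta function over its feature-use counts, times the exponential misclassification penalty; all names are illustrative.

```python
from math import exp, gamma

def multivariate_beta(counts):
    """B(v) = prod_i Gamma(v_i) / Gamma(sum_i v_i)."""
    out = 1.0
    for c in counts:
        out *= gamma(c)
    return out / gamma(sum(counts))

def score(feature_counts, n_misclassified, b=1.0):
    """Unnormalized posterior: Beta prior over use counts times exp(-b Q).

    feature_counts: how often each literal appears in the formula s.
    n_misclassified: Q_l(s, E), the number of examples s gets wrong."""
    prior = multivariate_beta([c + 1 for c in feature_counts])
    return prior * exp(-b * n_misclassified)

# Reusing a feature (one literal used twice, another once) beats
# spreading use across three distinct literals, as described above.
reuse = score([2, 1, 0, 0], n_misclassified=0)
spread = score([1, 1, 1, 0], n_misclassified=0)
print(reuse > spread)  # True
# Each misclassified example shrinks the score exponentially.
print(score([2, 1, 0, 0], n_misclassified=2) < reuse)  # True
```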
Models with greater expressivity (Kemp 2012; Goodman et al., 2008; Piantadosi et al., 2015; Falkenhainer et al. 1989; Hummel and Holyoak 2003; Mathy 2010; Gentner, and Kurtz 2005; Kemp and Tenenbaum 2008; Anderson 1991; Kemp et al. 2010)
\section{Empirical work}
Experiments using strictly less expressive concepts (Hovland, 1952; Shepard et al., 1961; Bourne, 1970; Haygood and Bourne, 1965; Feldman, 2000; Goodman et al., 2008)

Experiments including more expressive concepts (Skorstad et al., 1988; Kotovsky and Gentner, 1996; Kemp, 2012; Piantadosi et al., 2015; Gentner and Kurtz, 2005; Eaves and Shafto, 2014; Kemp, Goodman, and Tenenbaum, 2008; Kemp, 2009; Kemp et al., 2010)
\section{Prospects and problems}
What are the gaps?
Do any models address the problem of tractability of complex concepts?
Is a single model or representation language consistent with the evidence?
If not, which are the best candidates and what makes them inadequate?
\newpage
\printacronyms
\printbibliography
\end{document}