# Similarity as Transformation | Hahn, Chater & Richardson (2003)
## Abstract
- Similarity is determined by the *transformation distance* between representations
- Similar entities have representations that are easily transformed into each other
- Dissimilar entities require many transformations
- Present three experiments on the influence of transformation distance on similarity
- Versus: featural or spatial accounts of similarity
- Introduce transformation-based accounts of similarity
## Intro
The notion of similarity is used in a vast range of explanations.
However, it is sometimes criticized for being too vague.
The success of theories based on explicit measures of similarity that account for empirical phenomena is a counter to this criticism.
E.g. categorization.
**Classical approaches** to similarity:
1) Spatial account (Shepard): similarity in terms of distance in a psychological space
2) Contrast model (Tversky): similarity as a function of common and distinctive features of the entities under comparison
**Limitation:**
"They are restricted in scope by the fact that they define similarity over very specific - and very simple - kinds of representation: points in space or feature sets."
But most theories about how we represent natural objects assume that we instead require *structured representations*, i.e. representations that involve objects and relations, the decomposition of an object into subparts and their respective relations, etc.
This type of representation does not fit the two classical approaches.
**Two modern approaches**
1) Structural alignment approach to similarity, based on research on analogy
2) This paper: *Representational Distortion* (R.D.): the similarity between two entities is a function of the complexity required to distort/transform one representation into the other.
Remark on analogy from page 28:
"An important aspect of the problem of analogical mapping is the problem of finding a connection between two complex domains which exposes the common structure between domains, and hence allows knowledge about one domain to be transformed into knowledge about the other"
R.D. builds on *Kolmogorov* complexity: "the complexity of a representation, *x*, is the length of the shortest computer program that can generate that representation." This quantity is written K(x).
"Kolmogorov complexity has a natural application as a measure of similarity between representations, A and B.
The simplest measure is the length of the shortest program which takes A as input, and produces B as output – that is, the length of the shortest program that "distorts" one representation into the other.
This quantity is called the conditional Kolmogorov complexity, and is written K(B | A)."
Border cases:
1) Same representation: no transformation needed
2) Completely different: delete the old one and build the new one from scratch; there is no shared information that can be exploited.
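**Illustration** (mine, not from the paper): Kolmogorov complexity is uncomputable, but the idea can be made concrete with an off-the-shelf compressor as a crude stand-in for K. The sketch below approximates K(B | A) as the extra compressed length B needs once A is available, and reproduces the two border cases above.

```python
import zlib

def c(s: bytes) -> int:
    """Compressed length as a crude stand-in for Kolmogorov complexity K(s)."""
    return len(zlib.compress(s, 9))

def cond_complexity(a: bytes, b: bytes) -> int:
    """Rough proxy for K(B | A): extra description length needed for B given A."""
    return max(c(a + b) - c(a), 0)

a = b"abcabcabcabcabcabc" * 20
identical = a                      # border case 1: same representation
unrelated = bytes(range(256)) * 2  # border case 2: shares no structure with `a`

print(cond_complexity(a, identical))  # near zero: no transformation needed
print(cond_complexity(a, unrelated))  # close to c(unrelated): build from scratch
```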
The advantage of Kolmogorov complexity as a measure of similarity is its generality: for tree structures there are "transformational operations on trees", "sentences can be transformed by linguistic operations", etc.
So it overcomes the representational limitation of the two classical approaches.
Importantly, RD does not simply stand in contrast to these accounts, but *subsumes* them.
Technical issue: asymmetric transformations, see footnote 2, p. 4.
"In Marr's terms (Marr, 1982) RD theory seeks to characterize the computational level problem involved in determining similarity."
It is thus not a specific psychological theory.
Cognitive science lacks a well-specified theory of mental representation, needed for a specific psychological specification of this framework. However, this holds for the two classical approaches as well.
RD analysis tries to offer a framework:
- in the spirit of rational analysis
- as a derivation of Shepard's Universal Law of Generalization
- providing an explanation for the utility of similarity in inference
**Q**: what's this law?
**A**: Some info (without reading the original paper) can be found on page 26: "related inter-item confusability to the 'distance' between the representation of those items (Chater & Vitányi, 2002)."
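**Sketch** (mine; hedged, not verified against the cited papers): Shepard's universal law says that generalization between two items falls off exponentially with their distance in psychological space; the transformational reading attributed to Chater & Vitányi roughly replaces that distance with conditional Kolmogorov complexity.

```latex
% Shepard's (1987) universal law of generalization: generalization strength
% decays exponentially with psychological distance d(a,b)
g(a, b) \approx e^{-k \, d(a, b)}

% Transformational reading (roughly, per Chater & Vitányi): take the distance
% to be the complexity of transforming one representation into the other
d(a, b) \;\approx\; K(b \mid a)
```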
Goal of this article: do some experiments for "assessing the empirical usefulness of the transformational approach".
## Overview of previous work (p 5-9)
In a strict sense, there is little research that brings together transformation and similarity.
- Discussion of word similarity in the psychology of language.
- Perception as uncovering sequences of transformations
- Also involving specific transformations, e.g. mental rotations.
- Analogical reasoning
Analogical reasoning:
"Most recently, computational models of analogical inference frequently postulate that analogies are made between pairs of items by representing these items so that a relatively simple set of transformations can map one item onto the other."
"requires adopting the cointroversial stance that similarity and analogy are sufficiently close as to be modelled in a single framework"
Kolmogorov complexity could indicate the strength of an analogy, and thus clarify *when* and *why* analogical reasoning is justified.
Discussion of papers that take a step towards relating transformational distance to similarity, presenting experimental data but without the direct goal of providing a general theory of similarity.
## Experimental investigation
Two criteria for the usefulness of transformation as a measure of similarity:
1) Does it allow better coverage than featural and spatial views? (Does it apply to a wider range of stimuli?)
2) Is the notion of transformation the right level of abstraction at which to seek systematic relations in the similarities between objects?
What is thus required? : "experimental data that directly link manipulations of transformation distance to perceived similarity"
These criteria concern the viability of the transformation-approach in general, not of specific transformational accounts.
Three experiments, with similar design and procedure, but very different stimuli and transformations.
## Experiment 1
**Stimuli**:
Sequences of filled and unfilled circles that are transformations of each other through mirroring, reversal, phase shift, insertion, or deletion (see the sketch below).
**Goal**:
Transform into target sequence
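**Illustration** (mine; the paper's exact encoding and operation definitions are not given in these notes, so these are guesses): assuming each sequence is a list of unfilled (0) and filled (1) circles, the five operations could look like this.

```python
# Hypothetical encoding of Experiment 1 stimuli: a sequence of unfilled (0)
# and filled (1) circles. The operation definitions are illustrative guesses,
# not the paper's exact implementations.

def mirror(seq):
    """One plausible reading: swap filled and unfilled circles."""
    return [1 - x for x in seq]

def reversal(seq):
    """Reverse the order of the sequence."""
    return seq[::-1]

def phase_shift(seq, k=1):
    """Rotate the sequence by k positions."""
    return seq[k:] + seq[:k]

def insertion(seq, pos, value):
    """Insert a circle at position pos."""
    return seq[:pos] + [value] + seq[pos:]

def deletion(seq, pos):
    """Delete the circle at position pos."""
    return seq[:pos] + seq[pos + 1:]

target = [1, 0, 0, 1, 1, 0]
# A comparison stimulus two transformations away from the target:
comparison = insertion(mirror(target), 3, 1)
print(comparison)  # [0, 1, 1, 1, 0, 0, 1]
```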
### Comparison with classical accounts
Transformation distance: how many transformations needed?
The featural and spatial accounts of similarity are more restrictive and can be subsumed under the transformational account. E.g., from "a transformational perspective, such featural models simply construe similarity between objects as the result of an extremely limited set of transformations, namely feature insertion, feature deletion, or feature substitution". Likewise for spatial models.
So evidence *against* featural models requires showing that this restricted set is poorer at explaining performance.
Evaluation in exp. 1 is thus done by comparing predictive accuracy.
"test whether the number of transformations between the two patterns is a better predictor of perceived similarity than is the number of mismatching individual component blobs."
**Q**: measuring transformational distance in terms of the number of transformations seems to assume at least that all transformations are equally hard. But combined operations seem to be exponentially more difficult to me. Since the research aims to understand *perceived* similarity, is this something they can and/or should take into account?
**A**: the authors address this issue at page 27:
> Crucially, different transformations may have different degrees of complexity – i.e. it is possible that transformations should be 'weighted' in some way, rather than implicitly being treated as equal. Should this be correct, then the use of number of transformations will decrease predictive accuracy relative to that which would be possible if the relevant complexities were known.
Also relevant, p. 27:
> Finally, number of transformations will be a good predictor wherever one is averaging across many specific transformations and their combinations, because on average, two transformations will still be more complex than one, even if individual transformations differ in complexity.
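**Illustration** (mine; the weights are made up, the paper does not provide any): the contrast between the unweighted count used in Experiment 1 and a weighted transformation distance.

```python
# Illustrative, assumed weights -- the paper does not give per-operation complexities.
WEIGHTS = {"mirror": 1.0, "reversal": 1.0, "phase_shift": 1.0,
           "insertion": 1.5, "deletion": 1.5}

def transformation_distance(ops, weights=None):
    """Unweighted: number of operations; weighted: sum of operation costs."""
    if weights is None:
        return len(ops)
    return sum(weights[op] for op in ops)

ops = ["mirror", "insertion"]
print(transformation_distance(ops))           # 2   (plain count)
print(transformation_distance(ops, WEIGHTS))  # 2.5 (weighted)
```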
"each individual stimulus item was actually created from the target
using a specific quantity of operations so that this study is based upon a priori assertions as
to the number of transformations required. Thus, we are committed to the number of
transformations for each image in advance of any data collection and any ambiguities
in the number of transformations will diminish our predictive accuracy."
### Results
The number of transformations is a good predictor of the similarity rating of the participants.
"The general relationship between number of transformations and mean similarity ratings is graphed in Fig. 2 which suggests, somewhat surprisingly (see, for example, Shepard, 1987), an approximately linear relationship."
**Q**: why surprisingly?
**Q**: But does this experiment run the risk that it finds what it puts in? The presented sequences *are* obtained by applying transformations.
Rephrased:
Hahn et al. compare a transformation-based and a featural/spatial model of similarity in three experiments. But while reading, I was wondering to what degree the way they set up their experiments biases their results: if you *create* experimental material by explicitly applying transformations, *and* if these transformations remain quite visible, are you then not biased towards finding that transformations are indeed a better indication of perceived similarity, compared to a (quite stripped-down) feature-based model?
Firstly, I'm curious whether other people had a similar intuition.
Secondly, do you think that this intuition forms a problem for their research, or that this intuition is missing the point?
**A**: the authors start from the working assumption that transformations on stimuli are a plausible account of how similarity is psychologically perceived. But they also state that investigating this is not yet the point of their paper.
The featural model assumed here is very simple, and a counterargument could be that the featural model is under-specified, or perhaps even a strawman.
## Experiment 2
"many of the "features" such as the orientation of an item in a pair where one has been rotated are salient only *because* of the relevant transformations."
"For our stimuli this means that central object "features" will be derivative on the transformations present: for example, orientation is unlikely to have cognitive salience in a comparison *until* orientation is manipulated through rotations."
## Experiment 3
> The materials of Experiments 1 and 2 were amenable to basic featural or spatial representations and comparison processes, even if these are, as argued, empirically and conceptually less satisfactory than the explanation provided by transformation distance. The stimulus materials of Experiment 3 sought to take the argument against featural and spatial representations one step further, by using materials for which such representation schemes appear especially inappropriate: namely, materials where similarity is determined primarily by relational information.
Main point:
> From a transformational perspective, the Lego brick objects are of interest for two reasons. First, they allow an initial examination of the role of transformations in the similarity assessment of real-world objects, albeit maximally simple ones. Second, these materials support a whole new range of transformations to complement those investigated in Experiments 1 and 2. Our assumption, here, was that the judged similarity between pairs of objects would be determined primarily by the physical manipulations required to turn a target object into the comparison object.
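**Illustration** (mine; the paper does not specify a formal representation for the Lego objects): with a relational encoding, one physical manipulation leaves the parts list untouched but changes the relations, which is exactly what a purely featural description misses.

```python
# Hypothetical relational encoding of two Lego objects (not from the paper):
# the bricks that are present, plus "x on y" relations between them.

target_parts = {"red brick", "blue brick", "yellow brick"}
comparison_parts = {"red brick", "blue brick", "yellow brick"}

target_relations = {("blue", "on", "red"), ("yellow", "on", "blue")}
comparison_relations = {("blue", "on", "red"), ("yellow", "on", "red")}

# A purely featural description (which bricks are present) sees no difference:
print(target_parts == comparison_parts)        # True

# A relational view registers the one relation a single move changes:
print(target_relations ^ comparison_relations)
# {('yellow', 'on', 'blue'), ('yellow', 'on', 'red')}
```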
## General discussion
The two criteria have been met:
1) similarity is shown to be influenced by transformations beyond the restricted featural/spatial set (transformation distance is strongly correlated with mean similarity ratings)
2) this is "shown to be influenced by a wide range of transformations which share nothing apart from the fact that they are all transformations", indicating that transformation is the right level of abstraction.
"a crucial factor in evaluating competing accounts must be not only whether an account can be made compatible with a particular pattern of data but whether it in any way *predicted* it"
The authors state that there "is nothing whatsoever in featural theories of similarity that would naturally have given rise to the predictions made on the basis of transformations in this experiment."
A featural account must therefore be independently motivated, not fitted post hoc to the data.
But since transformations are defined on relations between objects, and features on objects, transformations cannot be reduced to features, and a featural theory should compete with a transformational account on a completely different level.
Exp. 3 really targets the main representational weakness of featural accounts: they do not deal well with structured representations because they do not properly represent relational information.
Page 25-26, core message:
> From a transformational perspective, featural and spatial accounts of similarity are not wrong; they are simply too restricted. Changes to a feature, feature insertions and deletions as well as changes along a continuous valued dimension are all bona fide transformations; consequently, it is no surprise that both featural and spatial accounts have enjoyed great success in explaining human behaviour. It is merely that the set of cognitively relevant transformations extends beyond this limited set.