Thoughts on how goals are associated with, and eventually drive, behavior.
There is almost always a goal.
A goal needs to be able to *start* or *modify* behavior.
The spatial representation is a concatenation of senses and intended movement
At some level that combination needs to be represented uniquely. That is, there must be few enough possibilities to represent them all.
When a goal is achieved the behavior that led to it needs to be reinforced.
ABC World - Possible Transitions
Moves: L, R, _
t      t+1 (each outcome 33%)
AL  -> AL, A_, AR
A_  -> AL, A_, AR
AR  -> BL, B_, BR
BL  -> AL, A_, AR
B_  -> BL, B_, BR
BR  -> CL, C_, CR
CL  -> BL, B_, BR
C_  -> CL, C_, CR
CR  -> CL, C_, CR
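A minimal Python sketch of this world as a lookup table (the names and the uniform 33% sampling are assumptions drawn from the table above, not from any particular codebase):

```python
import random

# Each state pairs a location (A, B, C) with a move (L, R, _).
# Per the table above, every state at t leads to one of three
# equally likely (~33%) states at t+1.
TRANSITIONS = {
    "AL": ["AL", "A_", "AR"],
    "A_": ["AL", "A_", "AR"],
    "AR": ["BL", "B_", "BR"],
    "BL": ["AL", "A_", "AR"],
    "B_": ["BL", "B_", "BR"],
    "BR": ["CL", "C_", "CR"],
    "CL": ["BL", "B_", "BR"],
    "C_": ["CL", "C_", "CR"],
    "CR": ["CL", "C_", "CR"],
}

def step(state):
    """Sample a next state uniformly from the three possibilities."""
    return random.choice(TRANSITIONS[state])
```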
Location: B
Move: _
New Goal: A
Option 1
Goal influences predicted states.
1. Goal A is active over t, t+1, t+2
2. 33% of the time B_ transitions to BL
3. If BL is selected at t+1, then at t+2 you get A* (any A state) and your goal is achieved.
4. What do you reinforce at this time?
5. Do you want to make the B_ -> BL transition more likely?
5.1 This would make learning an alternate goal, C, more difficult.
6. Do you want to make the BL -> A* transition more likely?
6.1 That transition is already guaranteed by the world's dynamics, so there is nothing to strengthen.
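A hypothetical sketch of what Option 1's reinforcement would look like if it directly adjusted transition probabilities (the learning rate and the renormalization scheme are assumptions); note how it bakes in exactly the bias that 5.1 warns about:

```python
# Transition probabilities out of B_, initially uniform.
probs = {"BL": 1 / 3, "B_": 1 / 3, "BR": 1 / 3}

def reinforce(chosen, lr=0.1):
    """Bump the transition that led toward the goal, then renormalize.
    Drawback (5.1): B_ is now biased toward BL for *every* future goal,
    including a later goal C that would need B_ -> BR instead."""
    probs[chosen] += lr
    total = sum(probs.values())
    for k in probs:
        probs[k] /= total

reinforce("BL")  # goal A achieved via B_ -> BL -> A*
```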
Option 2
Goal connects to active states.
1. Goal A is active over t, t+1, t+2
2. 33% of the time B_ transitions to BL
3. BL is a unique set of columns
4. When BL -> A* occurs, reinforcement selectively increments BL's connections to A.
5. The next time we are picking which columns become active and goal A is active, BL's columns are boosted.
6. The stronger the connection between BL and A, the more likely A being active will cause BL to be active.
7. If you then have active A, and would normally get B_ from bottom-up input, A would cause the L set of columns to become active, producing BL.
8. Thus B_ + active goal A = BL, which leads to A*.
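A hypothetical sketch of Option 2's mechanism; the scalar connection strengths and the winner-take-all selection rule are assumptions. The point is that reinforcement modifies goal-to-state links rather than the world model itself:

```python
from collections import defaultdict

# goal_links[goal][state]: connection strength from a goal to a
# state representation (step 4's selective increment).
goal_links = defaultdict(lambda: defaultdict(float))

def reinforce(goal, prior_state, increment=0.5):
    """When BL -> A* occurs while goal A is active, strengthen A -> BL."""
    goal_links[goal][prior_state] += increment

def select_state(bottom_up_state, active_goal):
    """Steps 5-8: the goal biases which columns become active. With a
    strong enough A -> BL link, bottom-up B_ plus goal A yields BL."""
    scores = defaultdict(float)
    scores[bottom_up_state] += 1.0
    for state, strength in goal_links[active_goal].items():
        scores[state] += strength
    return max(scores, key=scores.get)

reinforce("A", "BL")
reinforce("A", "BL")
reinforce("A", "BL")            # three successes push the link past 1.0
print(select_state("B_", "A"))  # -> "BL", which leads to A*
```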
Layer 5 -> Behavior Generator Learning
1. Assume the sets of cells whose activation causes Left and Right moves are fixed:
0000000000111111111 - Left
1111111111000000000 - Right
2. Layer 5 cells start out connecting randomly
3. Over time, if they participate in an AL, BL, or CL representation, they will become more strongly connected to cells in the Left move set.
4. Over time, if they participate in an AR, BR, or CR representation, they will become more strongly tied to Right cells.
5. The cells which are strongly tied to sensory inputs (the A/B/C components of representations) will be LESS strongly associated with either the Left or Right set of move cells, since they co-occur with both.
6. The cells which are strongly tied to move inputs (LR components of representations) will be almost exclusively associated with the Left or Right set of cells.
7. Once connections are numerous enough, the activation of layer 5 cells can begin to drive and/or override the activation of behavior generator cells.
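A sketch of this learning process under an assumed Hebbian-style update (cell counts match the 19-bit masks in step 1; the learning rule and rates are assumptions):

```python
import random

N_GENERATOR_CELLS = 19
LEFT_CELLS = set(range(10, 19))   # 0000000000111111111
RIGHT_CELLS = set(range(0, 10))   # 1111111111000000000

class Layer5Cell:
    def __init__(self):
        # Step 2: start with small random connections to every
        # behavior-generator cell.
        self.weights = [random.uniform(0.0, 0.1)
                        for _ in range(N_GENERATOR_CELLS)]

    def learn(self, move_cells, lr=0.05):
        # Steps 3-4: when this cell is part of a representation that
        # co-occurs with a move, strengthen connections to that move's
        # generator cells.
        for i in move_cells:
            self.weights[i] += lr

# A cell in the L component of AL/BL/CL only ever co-occurs with Left,
# so its Left connections come to dominate (step 6). A cell in a sensory
# (A/B/C) component sees both moves, so neither set dominates (step 5).
l_component_cell = Layer5Cell()
sensory_cell = Layer5Cell()
for _ in range(100):
    l_component_cell.learn(LEFT_CELLS)
    sensory_cell.learn(LEFT_CELLS if random.random() < 0.5 else RIGHT_CELLS)
```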
Behavior Chaining
A behavior that leads to a reward itself becomes rewarding. How should this happen, though?
Basics - http://en.wikipedia.org/wiki/Chaining
Practical example in dogs - http://www.clickertraining.com/node/1764
In classic reinforcement learning examples there is a "scalar intermediate reward" which allows behaviors leading to a goal to be reinforced. But this is very deus ex machina.
http://en.wikipedia.org/wiki/Reinforcement_learning
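For reference, a sketch of the mechanism those classic examples use: a TD(0)-style value update, in which the scalar reward propagates backward so that states on the path to the goal become valuable themselves, producing the chaining effect. The states reuse the ABC world above; alpha and gamma are the usual assumed hyperparameters.

```python
STATES = ["AL", "A_", "AR", "BL", "B_", "BR", "CL", "C_", "CR"]
values = {s: 0.0 for s in STATES}

def td_update(state, next_state, reward, alpha=0.1, gamma=0.9):
    # V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
    values[state] += alpha * (reward + gamma * values[next_state] - values[state])

# Reaching A* from BL pays a reward; over repeated episodes the value
# flows back into B_, so the behavior leading to the reward itself
# "becomes rewarding".
td_update("BL", "A_", reward=1.0)   # BL gains value
td_update("B_", "BL", reward=0.0)   # B_ inherits some of BL's value
```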