TITLE: LEARNING HIGH-LEVEL PLANS FROM QUESTIONS [YALA-MIND-DUMP]
=======================================================================
##YALA-PREFACE: | |
##--------------------------------------------------------------------- | |
##I wrote this whole thing in Emacs sans spell check (though I'm sure there is a plugin for that somewhere) so I apologize in advance.
ABSTRACT: | |
----------------------------------------------------------------------- | |
Complex problems often require varied domain knowledge to solve. Given this knowledge a priori, high-level planning agents are robust. However, it can be incredibly difficult to collect all the potentially useful information a priori. In this paper, we present an approach to making agents more robust to a lack of information through question asking. We extend an existing planning agent to acquire information live, as part of its planning process. We then leverage environmental reward (whether or not our problem was solved) to learn what to ask, when to ask it, how to plan, and how to interpret text.
INTRO: | |
----------------------------------------------------------------------- | |
In high-level planning, we leverage domain knowledge and general-purpose planning machinery to solve problems. However, acquiring this domain knowledge a priori can be difficult and limits how general the agent can be.
[TODO] Introduce idea of acquiring info from human or other robot collaborators.
[TODO] Introduce idea of making any agent more robust by adding question asking | |
[TODO] Introduce work specifically | |
RELATED WORK: | |
----------------------------------------------------------------------- | |
Previous work on question asking can be split into two directions:
Resolution of Ambiguity | |
Dealing with failure | |
Resolution of Ambiguity | |
*********************** | |
Tellex et al. explored question asking as a way to resolve grounding ambiguity, in the context of instruction-following robots. A robot might be given the instruction to move a steel plate to the truck. Suppose there are two trucks in the robot's environment; the robot must now deal with ambiguity. Tellex tackles this problem by asking questions.
More specifically, she explores two models for identifying when a question is necessary (entropy at the grounding level, and entropy at the planning level), and asks questions via simple templates.
The robot then receives a natural language response and is better able to complete the task.
Dealing with failure | |
********************** | |
It is not uncommon for an agent to be unable to complete a task. What a human might do in this case is ask for help. In Tellex et al. (MIT), researchers explore how a robot could generate an effective request for help, to best empower humans to make a difference.
More specifically, she applies an inverse mapping from her grounding graph for instruction following to generate a question.
[TODO] Reread papers and describe work in more detail | |
[TODO] Not sure amt of detail necessary. | |
[LINKS]: | |
Asking for Help Using Inverse Semantics
http://people.csail.mit.edu/stefie10/publications/tellex14.pdf | |
Clarifying Commands with Information-Theoretic Human-Robot Dialog
http://people.csail.mit.edu/stefie10/publications/deits13.pdf | |
We are tackling a different problem, namely, guiding the planning process itself.
BACKGROUND: | |
----------------------------------------------------------------------- | |
In this work, we directly extend the text-aware high-level planner of Branavan et al., 2012. To understand our work, it is important to understand the following:
High-level planning: | |
In high-level planning we decompose large problems into smaller ones. An example of this in our domain is splitting the problem of acquiring an iron door into the problems of acquiring a forge and some iron.
Precondition-condition Information: | |
In previous results, it was shown that having a priori information about how problems relate can significantly improve planning performance. This is best shown through example. The following is an example of a sentence in natural language, the precondition-condition it encodes, and how that could aid the agent.
%% Natural text | |
"A pickaxe, which can be made from wood, can be used to mine stone." | |
%% Precondition condition | |
// read as wood is a precondition to making a pickaxe | |
wood -> pickaxe | |
// read as pickaxe is a precondition to acquiring stone
pickaxe -> stone | |
%% Planning outcome | |
% Without Precondition Condition | |
//The agent only learns from environment, and has little direct guidance in selecting subproblems | |
(Glass) (Pickaxe) (Stone) [FAIL] | |
% With Precondition Condition | |
//The agent is now guided by its knowledge base (set of preconditions) to construct the plan
(Wood) (Pickaxe) (Stone) | |
The extraction of precondition-condition relationships is learned from feedback from the environment.
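To make the guidance concrete, here is a minimal Python sketch (names are hypothetical; the actual system is a learned planner inside a C++ codebase, not this hand-written routine) of how a knowledge base of precondition edges can order subgoals:
%% Python sketch
# Precondition -> condition edges, as extracted from text.
preconditions = {
    "pickaxe": ["wood"],   # wood is a precondition to making a pickaxe
    "stone": ["pickaxe"],  # a pickaxe is a precondition to acquiring stone
}

def plan(goal, have=()):
    """Depth-first expansion of preconditions into an ordered subgoal
    list (assumes the precondition graph is acyclic)."""
    order = []
    def visit(g):
        if g in have or g in order:
            return
        for pre in preconditions.get(g, []):
            visit(pre)
        order.append(g)
    visit(goal)
    return order

print(plan("stone"))  # ['wood', 'pickaxe', 'stone'] -- the guided plan above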
MODEL: | |
----------------------------------------------------------------------- | |
Our goal in this work was to learn to acquire the right information, at the right time, for each problem.
We can discuss our model in two parts: question asking and question answering.
Asking the Questions | |
*********************** | |
Question asking can be split into two tasks:
Modeling Questions | |
#################### | |
The Minecraft domain is fairly restricted: almost every problem comes in the form "have NUMBER of OBJECT".
Given this template-based problem structure, we created simple question templates.
Namely, | |
"Tell me about the object OBJECT" | |
and | |
"Tell me about the action ACTION" | |
The use of question templates was motivated both by Tellex et al.'s work on resolving groundings and by the template structure of the problems themselves.
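As a concrete sketch (function and variable names here are hypothetical), the full question space for a problem can be enumerated directly from these two templates:
%% Python sketch
TEMPLATES = [
    "Tell me about the object {object}",
    "Tell me about the action {action}",
]

def questions_for_problem(objects, actions):
    """Enumerate every question the agent could ask about a problem."""
    qs = [TEMPLATES[0].format(object=o) for o in objects]
    qs += [TEMPLATES[1].format(action=a) for a in actions]
    return qs

print(questions_for_problem(["pickaxe", "stone"], ["mine"]))
# ['Tell me about the object pickaxe', 'Tell me about the object stone',
#  'Tell me about the action mine']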
Learning to ask them | |
#################### | |
To learn when and what to ask, we explored several different approaches.
1. Embedding questions within the subgoal policy
In this approach, we expand the state space of the subgoal policy and treat questions as special types of subgoals (see the sketch after this list). The only difference between questions and subgoals is that questions must be "solved" by asking the IR system, while subgoals are sent to a low-level planner.
Questions and subgoals share the same feature space, and the policy now has to learn over a 40% larger state space.
This comes with a couple of caveats. Whereas subgoals are all sampled and then all solved, questions must be solved immediately, in order to update the knowledge base and impact the choice of subgoal within the same problem.
2. Creating and learning a separate question model over initial and target
In this approach, we build a separate log-linear model to pick which questions to ask, with its own state space and feature space.
This model conditions on the initial state and the target problem, and generates a short sequence of questions. The agent then learns from these questions before starting to plan.
3. Creating and learning a separate question model over initial, target, and subgoal sequence
In this approach, we take (2) and also condition on the attempted (sampled) subgoal sequence.
After answering the questions, we then resample the subgoal sequence. | |
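Here is a minimal Python sketch of approach (1), with all names hypothetical and the scoring stubbed out; in the real system the scores come from the learned log-linear feature weights:
%% Python sketch
import random

knowledge_base = set()  # acquired precondition -> condition edges

def score(state, action):
    return 1.0  # stand-in for the learned feature score

def sample_action(state, subgoals, questions):
    # Questions and subgoals live in one shared action space.
    actions = [("subgoal", g) for g in subgoals] + \
              [("question", q) for q in questions]
    weights = [score(state, a) for a in actions]
    return random.choices(actions, weights=weights)[0]

def answer(question):
    # Stand-in for the IR system described in the next subsection.
    return {("wood", "pickaxe")}

def step(state, subgoals, questions):
    kind, a = sample_action(state, subgoals, questions)
    if kind == "question":
        # Solved immediately, so the new knowledge can influence
        # the very next subgoal choice within the same problem.
        knowledge_base.update(answer(a))
    # A subgoal would instead be handed to the low-level planner.
    return kind, a

print(step("init", ["pickaxe", "stone"], ["Tell me about the object pickaxe"]))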
Answering the Questions | |
*********************** | |
As an end goal, we would like to build collaborative agents that ask questions and get answers from their human friends.
To simulate these helpful humans, we built a simple Information Retrieval system that matches the object of each question to sentences in the Minecraft wiki.
---------------DETAILED DESCRIPTION---------------------- | |
More specifically, we stemmed and transformed each question object into a natural language name. An example of this is the simple transformation from iron-door to iron door.
We then built an index from these words to sentences in the corpus. When asked a question, we select some matching sentences to return at random.
This has several favorable properties.
First, it keeps our results comparable to Branavan's, as we acquire at most the information he gave his agent a priori.
Second, it makes answers free to generate, allowing us to run hundreds of iterations and learn our model via reinforcement learning.
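A minimal sketch of this simulated helper (hypothetical names, stemming omitted for brevity) might look like:
%% Python sketch
import random
import re
from collections import defaultdict

corpus = [
    "A pickaxe, which can be made from wood, can be used to mine stone.",
    "An iron door can be opened with a button or a lever.",
]

# Index every word to the sentences that contain it.
index = defaultdict(list)
for sentence in corpus:
    for word in re.findall(r"[a-z]+", sentence.lower()):
        index[word].append(sentence)

def answer(question_object, k=5):
    """Return up to k corpus sentences matching any word of the object name."""
    name = question_object.replace("-", " ")  # iron-door -> iron door
    matches = {s for w in name.split() for s in index.get(w, [])}
    return random.sample(sorted(matches), min(k, len(matches)))

print(answer("iron-door"))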
---------------END DETAILED DESCRIPTION------------------
To recap, our core contribution is a framework to ask and answer questions as part of the planning process. | |
CHALLENGES
----------------------------------------------------------------------- | |
We faced three main challenges in this project. We list and describe them in order of importance.
Technical Debt and Existing Complexity | |
************************************** | |
Extending a 70k-line C++ codebase is no small task. This has caused very slow development cycles.
Feature Engineering | |
************************************** | |
The existing work relies on a large space of features engineered to work very well with subgoals. Extending that space, or adding in a separate space of questions, does not mesh well with the existing feature engineering.
[TODO] Actually, I'm still not so sure about this hypothesis. See the learning challenge below; I think that might be the bigger issue.
Learning | |
************************************** | |
Given that any one specific line of text is likely to be available to our agent only for short periods of time, during only some iterations, we face several large challenges.
Learning to interpret text | |
#################################### | |
It is incredibly difficult for the agent to learn how to extract the precondition-condition relationships. The previous work leveraged the success and failure of each subgoal pair to tune its understanding of the text. With our system, this is less possible.
Learning to weight the acquired precondition-conditions
##################################### | |
It is also difficult to learn to weight specific precondition-condition features as important if they are almost never present. Given that the weights are initialized to zero, if we acquire some information, we will likely not use it and fail the first iteration, tune the weights, but might not ask that question again.
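A toy numerical illustration of this (not the actual update rule): a feature only receives updates on iterations where it is active, so a rarely asked question gives its features almost no chance to move off zero.
%% Python sketch
import random

random.seed(0)
w_rare = 0.0  # weight on a hypothetical "acquired precondition X" feature
lr = 0.1
for episode in range(1000):
    feature_active = random.random() < 0.001  # question rarely asked/answered
    reward = 1.0 if feature_active else -0.1  # the information actually helps
    if feature_active:
        w_rare += lr * reward  # gradient flows only when the feature fires
print(w_rare)  # stays near zero; the feature fired only a handful of times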
EXPERIMENT: | |
----------------------------------------------------------------------- | |
Random 5 questions
We tried asking 5 random questions at the beginning of each problem. This was done to explore whether or not asking the right question is actually a problem in our domain.
We tried this both returning all matching responses and returning only 5 responses.
Simple Heuristics | |
[STILL UNDER DEVELOPMENT] | |
As a next step, we explored asking about the end goal at the beginning of each problem. This was done to establish a baseline for how well simple methods perform.
Approach 1: One Policy for Questions and Subgoals | |
We implemented this approach and ran it. Each question received 5 sentences in response.
Approach 2: Separate Policy for Questions over just the problem
[STILL UNDER DEVELOPMENT] | |
Approach 3: Separate Policy for Questions over the problem and subgoal sequence
[STILL UNDER DEVELOPMENT] | |
Approach 1 with saved Theta C | |
[STILL UNDER DEVELOPMENT] | |
Approach 2 with saved Theta C | |
[STILL UNDER DEVELOPMENT] | |
Approach 3 with saved Theta C | |
[STILL UNDER DEVELOPMENT] | |
Why use saved Theta C? | |
We wish to see if we can learn to use good questions if we remove the challenge of interpretation.
RESULTS: | |
----------------------------------------------------------------------- | |
Our core goal was creating more robust agents by empowering them to ask questions. In that regard, we have already succeeded.
The agent without a priori information is able to solve 69% of the problems. Our agent with question asking is able to solve 73%.
In proving how robust we can make our planner with question asking, we have only taken a step. We are still actively tackling engineering challenges to allow our various models to ask more, and better, questions.
CURRENT WORK: | |
----------------------------------------------------------------------- | |
We are currently actively developing everything tagged as [STILL UNDER DEVELOPMENT].
FUTURE WORK: | |
----------------------------------------------------------------------- | |
There is a lot of opportunity to explore richer architectures.
We can explore richer learning architectures, with deep reinforcement learning (Karthik Paper) or any technique that avoids manual feature engineering.
Future work could also extend other, more general problem solvers. Planners specifically are very constrained by their logical nature.
CONCLUSION: | |
----------------------------------------------------------------------- | |
We have shown that planners can be made more robust to a lack of information if we extend them with question asking.
It is still unproven whether we can reach full a-priori performance using this approach.
AUTHORS NOTE: | |
----------------------------------------------------------------------- | |
 _________________________________________
/ Hey guys, here is my mind dump for      \
| paper stuff. I hope this is helpful.    |
| Let me know any way I can help. I wrote |
| this all on a plane sans internet. I'll |
| add links to stuff when I get back on   |
\ the grid. HAPPY HOLIDAYS.               /
 -----------------------------------------
    \                             .       .
     \                           / `.   .' "
      \                  .---.  <    > <    >  .---.
       \                 |    \  \ - ~ ~ - /  /    |
             _____          ..-~             ~-..-~
            |     |   \~~~\.'                    `./~~~/
           ---------   \__/                        \__/
          .'  O    \     /               /       \  "
         (_____,    `._.'               |         }  \/~~~/
          `----.          /       }     |        /    \__/
                `-.      |       /      |       /      `. ,~~|
                    ~-.__|      /_ - ~ ^|      /- _      `..-'
                         |     /        |     /     ~-.     `-. _  _  _
                         |_____|        |_____|         ~ - . _ _ _ _ _>