End-to-end optimization of goal-driven and visually grounded dialogue systems
- The paper introduces an architecture for end-to-end Reinforcement Learning (RL) optimization of task-oriented dialogue systems and applies it to a multimodal task: grounding the dialogue in a visual context.
Encoder-Decoder Models vs. RL Models
- Encoder-decoder models do not account for the planning problem inherent in dialogue (choosing a sequence of utterances that drives the conversation toward a goal) and do not integrate seamlessly with external contexts or knowledge bases.
- RL models can handle the planning problem, but they require online learning and a predefined task structure.
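The planning side of the comparison above can be illustrated with a minimal policy-gradient sketch. This is not the paper's architecture; it is a toy REINFORCE example (plain Python, no deep-learning libraries) in which a "dialogue policy" must learn which of three hypothetical candidate questions actually resolves the task and earns a reward:

```python
import math
import random

random.seed(0)

def softmax(logits):
    """Convert logits into a probability distribution over actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy setup (hypothetical): 3 candidate questions; only question 2
# resolves the task, so only it yields a terminal reward of 1.
REWARDS = [0.0, 0.0, 1.0]
logits = [0.0, 0.0, 0.0]   # parameters of the softmax policy
lr = 0.5                   # learning rate
baseline = 0.0             # running-average reward baseline

for episode in range(1000):
    probs = softmax(logits)

    # Sample a question from the current policy.
    u, action, acc = random.random(), 0, 0.0
    for i, p in enumerate(probs):
        acc += p
        if u <= acc:
            action = i
            break

    reward = REWARDS[action]
    baseline += 0.01 * (reward - baseline)  # variance-reduction baseline

    # REINFORCE update: grad of log pi(action) w.r.t. the logits
    # is (one_hot(action) - probs).
    for i in range(len(logits)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr * (reward - baseline) * grad

final_probs = softmax(logits)
print(final_probs)  # probability mass should concentrate on question 2
```

The point of the sketch is the contrast drawn in the notes: the policy is optimized from a delayed, sparse reward (planning), rather than from per-token supervision as in encoder-decoder training, but it needs online interaction and a fixed action set (the predefined task structure).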