Teaching Machines to Read and Comprehend

Introduction

  • Build a supervised reading comprehension dataset using a news corpus.
  • Compare the performance of neural models with state-of-the-art natural language processing pipelines on the reading comprehension task.
  • [Link to the paper](https://arxiv.org/abs/1506.03340)

Reading Comprehension

  • Estimate the conditional probability p(a|c, q), where c is a context document, q is a query related to the document, and a is the answer to that query.
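
The neural models described below all score candidate answers through a joint embedding of the context and the query. A sketch of that scoring form, in the notation above (g(c, q) denotes the joint context-query embedding, W(a) the answer-specific weights, and V the candidate answer set; the exact parameterisation of g varies per model):

```latex
p(a \mid c, q) \propto \exp\big( W(a)\, g(c, q) \big), \qquad a \in V
```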

Dataset Generation

  • Use online newspapers (CNN and Daily Mail) and their matching summaries.
  • Parse the summaries and bullet points into Cloze-style questions.
  • Generate a corpus of document-query-answer triplets by replacing one entity at a time with a placeholder (see the sketch after this list).
  • Anonymise and randomise the data using coreference systems, abstract entity markers, and random permutation of the entity markers.
  • The processed dataset is better focused on evaluating reading comprehension, as models cannot exploit co-occurrence statistics.
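
A minimal, illustrative sketch of the triplet generation and anonymisation steps. This is not the paper's actual pipeline: there, the entity list comes from a coreference/NER system, while here it is passed in by hand and all names are hypothetical.

```python
import random

def make_cloze_triplets(document, bullet_points, entities):
    """Turn a document and its summary bullet points into
    (context, query, answer) triplets with anonymised entity markers."""
    # Randomly permute the entity markers so that marker identity
    # carries no information across documents.
    ids = random.sample(range(len(entities)), len(entities))
    markers = {e: "@entity%d" % i for e, i in zip(entities, ids)}

    def anonymise(text):
        for entity, marker in markers.items():
            text = text.replace(entity, marker)
        return text

    context = anonymise(document)
    triplets = []
    for point in bullet_points:
        # Replace one entity at a time with a placeholder to form a
        # Cloze-style query; the replaced entity's marker is the answer.
        for entity in entities:
            if entity in point:
                query = anonymise(point).replace(markers[entity], "@placeholder")
                triplets.append((context, query, markers[entity]))
    return triplets

# Example with a hypothetical article:
print(make_cloze_triplets(
    document="Acme Corp named Jane Doe as its new chief executive on Monday.",
    bullet_points=["Jane Doe becomes chief executive of Acme Corp"],
    entities=["Jane Doe", "Acme Corp"],
))
```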

Models

Baseline Models

  • Majority Baseline
    • Picks the most frequently observed entity in the context document.
  • Exclusive Majority
    • Picks the most frequently observed entity in the context document which is not observed in the query.
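
Both baselines are simple enough to state in a few lines. A sketch, assuming the document and query have already been tokenised into entity markers:

```python
from collections import Counter

def majority_baseline(doc_entities):
    """Answer with the entity observed most often in the context document."""
    return Counter(doc_entities).most_common(1)[0][0]

def exclusive_majority_baseline(doc_entities, query_entities):
    """Answer with the most frequent context entity that does not appear
    in the query (by construction the answer never appears in the query)."""
    excluded = set(query_entities)
    counts = Counter(e for e in doc_entities if e not in excluded)
    return counts.most_common(1)[0][0]
```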

Symbolic Matching Models

  • Frame-Semantic Parsing

    • Parse sentences to find predicates and answer questions like "who did what to whom".
    • Extract entity-predicate triples (e1, V, e2) from the query q and the context document d.
    • Resolve queries using rules like exact match, matching entity, etc.
  • Word Distance Benchmark

    • Align the placeholder of the Cloze-form question with each possible entity in the context document, and calculate the distance between the question and the context around the aligned entity.
    • Sum the distance of every word in q to its nearest aligned word in d (see the sketch after this list).
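
A minimal sketch of one plausible reading of this benchmark; the paper's exact alignment rules and distance cap are not reproduced here, and the cap below is a made-up constant.

```python
def alignment_score(doc, query, entity_pos, cap=8):
    """Score one candidate alignment of @placeholder with the entity token
    at `entity_pos`: sum, over query words, the token distance from the
    aligned entity to the nearest document occurrence of that word.
    Lower is better."""
    positions = {}
    for i, tok in enumerate(doc):
        positions.setdefault(tok, []).append(i)
    score = 0
    for word in query:
        if word == "@placeholder":
            continue
        occurrences = positions.get(word)
        if occurrences is None:
            score += cap  # unmatched query words get the maximum penalty
        else:
            score += min(cap, min(abs(i - entity_pos) for i in occurrences))
    return score

def word_distance_answer(doc, query, candidate_entities):
    """Pick the candidate entity whose best-scoring document occurrence
    minimises the alignment score."""
    best, best_score = None, float("inf")
    for entity in candidate_entities:
        for pos, tok in enumerate(doc):
            if tok == entity:
                s = alignment_score(doc, query, pos)
                if s < best_score:
                    best, best_score = entity, s
    return best
```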

Neural Network Models

  • Deep LSTM Reader

    • Test the ability of Deep LSTM encoders to handle significantly longer sequences.
    • Feed the document-query pair to the network as one long sequence, one word at a time.
    • Use a Deep LSTM cell with skip connections from the input to the hidden layers and from the hidden layers to the output.
  • Attentive Reader

    • Employ an attention model to overcome the bottleneck of a fixed-width hidden vector.
    • Encode the document and the query using separate bidirectional single-layer LSTMs.
    • The query encoding is obtained by concatenating the final forward and backward outputs.
    • The document encoding is obtained as a weighted sum of the per-token output vectors (each the concatenation of the forward and backward outputs).
    • The weights can be interpreted as the degree to which the network attends to a particular token in the document.
    • The model is completed by defining a non-linear combination of the document and query embeddings (see the sketch after this list).
  • Impatient Reader

    • As an extension of the Attentive Reader, the model can re-read the document as each query token is read.
    • The model accumulates information from the document as each query token is seen, and finally outputs a joint document-query representation as a non-linear combination of the document and query embeddings.
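
A minimal PyTorch sketch of the Attentive Reader's shape. The dimensions, layer names, and the output head over the vocabulary are illustrative assumptions, not the paper's exact parameterisation.

```python
import torch
import torch.nn as nn

class AttentiveReader(nn.Module):
    """Encode document and query with separate bidirectional LSTMs, attend
    over document tokens conditioned on the query encoding, then combine
    the attended document and the query non-linearly to score answers."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.doc_lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.qry_lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        enc_dim = 2 * hidden_dim  # forward + backward outputs concatenated
        self.att_doc = nn.Linear(enc_dim, enc_dim, bias=False)
        self.att_qry = nn.Linear(enc_dim, enc_dim, bias=False)
        self.att_v = nn.Linear(enc_dim, 1, bias=False)
        self.combine_doc = nn.Linear(enc_dim, enc_dim, bias=False)
        self.combine_qry = nn.Linear(enc_dim, enc_dim, bias=False)
        self.out = nn.Linear(enc_dim, vocab_size)

    def forward(self, doc, qry):
        # y: per-token document encodings.
        y, _ = self.doc_lstm(self.embed(doc))      # (B, T, 2H)
        # u: query encoding = final forward output + final backward output.
        q_out, _ = self.qry_lstm(self.embed(qry))  # (B, S, 2H)
        H = q_out.size(2) // 2
        u = torch.cat([q_out[:, -1, :H], q_out[:, 0, H:]], dim=1)  # (B, 2H)

        # Attention weights over document tokens, conditioned on the query.
        m = torch.tanh(self.att_doc(y) + self.att_qry(u).unsqueeze(1))
        s = torch.softmax(self.att_v(m).squeeze(-1), dim=1)        # (B, T)
        r = torch.bmm(s.unsqueeze(1), y).squeeze(1)                # (B, 2H)

        # Joint embedding g(c, q) and answer scores over the vocabulary.
        g = torch.tanh(self.combine_doc(r) + self.combine_qry(u))
        return self.out(g)

# Usage on a toy batch:
model = AttentiveReader(vocab_size=50)
doc = torch.randint(0, 50, (2, 30))  # 2 documents, 30 tokens each
qry = torch.randint(0, 50, (2, 8))   # 2 queries, 8 tokens each
scores = model(doc, qry)             # (2, 50) answer scores
```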

Result

  • The Attentive and Impatient Readers outperform all other models, highlighting the benefits of attention modelling.
  • The Frame-Semantic pipeline does not scale to cases where several steps are needed to answer a query.
  • Moreover, it provides poor coverage, as many relations do not adhere to the default predicate-argument structure.
  • The Word Distance approach outperformed the Frame-Semantic approach, as there is significant lexical overlap between the query and the document.
  • The paper also includes heat maps over the context documents to visualise the attention mechanism.