Teaching Machines to Read and Comprehend

Introduction

  • Build a supervised reading comprehension dataset using a news corpus.
  • Compare the performance of neural models with state-of-the-art natural language processing pipelines on the reading comprehension task.
  • [Link to the paper](https://arxiv.org/abs/1506.03340)

Reading Comprehension

  • Estimate the conditional probability p(a|c, q), where c is a context document, q is a query related to the document, and a is the answer to that query.
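
The neural models described below all score candidate answers through a joint embedding of the context and the query. A sketch of that scoring form, in the notation above (g(c, q) denotes the joint context-query embedding, W(a) the answer-specific weights, and V the candidate answer set; the exact parameterisation of g varies per model):

```latex
p(a \mid c, q) \propto \exp\big( W(a)\, g(c, q) \big), \qquad a \in V
```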

Dataset Generation

  • Use online newspapers (CNN and Daily Mail) and their matching summaries.
  • Parse the summaries and bullet points into Cloze-style questions.
  • Generate a corpus of document-query-answer triplets by replacing one entity at a time with a placeholder (see the sketch after this list).
  • Anonymise and randomise the data using coreference systems, abstract entity markers, and random permutation of the entity markers.
  • The processed dataset is better focused on evaluating reading comprehension, as models cannot exploit co-occurrence statistics.
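
A minimal, illustrative sketch of the triplet generation and anonymisation steps. This is not the paper's actual pipeline: there, the entity list comes from a coreference/NER system, while here it is passed in by hand and all names are hypothetical.

```python
import random

def make_cloze_triplets(document, bullet_points, entities):
    """Turn a document and its summary bullet points into
    (context, query, answer) triplets with anonymised entity markers."""
    # Randomly permute the entity markers so that marker identity
    # carries no information across documents.
    ids = random.sample(range(len(entities)), len(entities))
    markers = {e: "@entity%d" % i for e, i in zip(entities, ids)}

    def anonymise(text):
        for entity, marker in markers.items():
            text = text.replace(entity, marker)
        return text

    context = anonymise(document)
    triplets = []
    for point in bullet_points:
        # Replace one entity at a time with a placeholder to form a
        # Cloze-style query; the replaced entity's marker is the answer.
        for entity in entities:
            if entity in point:
                query = anonymise(point).replace(markers[entity], "@placeholder")
                triplets.append((context, query, markers[entity]))
    return triplets

# Example with a hypothetical article:
print(make_cloze_triplets(
    document="Acme Corp named Jane Doe as its new chief executive on Monday.",
    bullet_points=["Jane Doe becomes chief executive of Acme Corp"],
    entities=["Jane Doe", "Acme Corp"],
))
```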

Models

Baseline Models

  • Majority Baseline
    • Picks the most frequently observed entity in the context document.
  • Exclusive Majority
    • Picks the most frequently observed entity in the context document which is not observed in the query.
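
Both baselines are simple enough to state in a few lines. A sketch, assuming the document and query have already been tokenised into entity markers:

```python
from collections import Counter

def majority_baseline(doc_entities):
    """Answer with the entity observed most often in the context document."""
    return Counter(doc_entities).most_common(1)[0][0]

def exclusive_majority_baseline(doc_entities, query_entities):
    """Answer with the most frequent context entity that does not appear
    in the query (by construction the answer never appears in the query)."""
    excluded = set(query_entities)
    counts = Counter(e for e in doc_entities if e not in excluded)
    return counts.most_common(1)[0][0]
```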

Symbolic Matching Models

  • Frame-Semantic Parsing

    • Parse sentences to find predicates and answer questions like "who did what to whom".
    • Extract entity-predicate triples (e1, V, e2) from the query q and the context document d.
    • Resolve queries using rules like exact match, matching entity, etc.
  • Word Distance Benchmark

    • Align the placeholder of the Cloze-form question with each possible entity in the context document, and calculate the distance between the question and the context around the aligned entity.
    • Sum the distance of every word in q to its nearest aligned word in d (see the sketch after this list).
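
A minimal sketch of one plausible reading of this benchmark; the paper's exact alignment rules and distance cap are not reproduced here, and the cap below is a made-up constant.

```python
def alignment_score(doc, query, entity_pos, cap=8):
    """Score one candidate alignment of @placeholder with the entity token
    at `entity_pos`: sum, over query words, the token distance from the
    aligned entity to the nearest document occurrence of that word.
    Lower is better."""
    positions = {}
    for i, tok in enumerate(doc):
        positions.setdefault(tok, []).append(i)
    score = 0
    for word in query:
        if word == "@placeholder":
            continue
        occurrences = positions.get(word)
        if occurrences is None:
            score += cap  # unmatched query words get the maximum penalty
        else:
            score += min(cap, min(abs(i - entity_pos) for i in occurrences))
    return score

def word_distance_answer(doc, query, candidate_entities):
    """Pick the candidate entity whose best-scoring document occurrence
    minimises the alignment score."""
    best, best_score = None, float("inf")
    for entity in candidate_entities:
        for pos, tok in enumerate(doc):
            if tok == entity:
                s = alignment_score(doc, query, pos)
                if s < best_score:
                    best, best_score = entity, s
    return best
```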

Neural Network Models

  • Deep LSTM Reader

    • Test the ability of Deep LSTM encoders to handle significantly longer sequences.
    • Feed the document-query pair to the network as one long sequence, one word at a time.
    • Use a Deep LSTM cell with skip connections from the input to the hidden layers and from the hidden layers to the output.
  • Attentive Reader

    • Employ an attention model to overcome the bottleneck of a fixed-width hidden vector.
    • Encode the document and the query using separate bidirectional single-layer LSTMs.
    • The query encoding is obtained by concatenating the final forward and backward outputs.
    • The document encoding is obtained as a weighted sum of the per-token output vectors (each the concatenation of the forward and backward outputs).
    • The weights can be interpreted as the degree to which the network attends to a particular token in the document.
    • The model is completed by defining a non-linear combination of the document and query embeddings (see the sketch after this list).
  • Impatient Reader

    • As an extension of the Attentive Reader, the model can re-read the document as each query token is read.
    • The model accumulates information from the document as each query token is seen, and finally outputs a joint document-query representation as a non-linear combination of the document and query embeddings.
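
A minimal PyTorch sketch of the Attentive Reader's shape. The dimensions, layer names, and the output head over the vocabulary are illustrative assumptions, not the paper's exact parameterisation.

```python
import torch
import torch.nn as nn

class AttentiveReader(nn.Module):
    """Encode document and query with separate bidirectional LSTMs, attend
    over document tokens conditioned on the query encoding, then combine
    the attended document and the query non-linearly to score answers."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.doc_lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.qry_lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        enc_dim = 2 * hidden_dim  # forward + backward outputs concatenated
        self.att_doc = nn.Linear(enc_dim, enc_dim, bias=False)
        self.att_qry = nn.Linear(enc_dim, enc_dim, bias=False)
        self.att_v = nn.Linear(enc_dim, 1, bias=False)
        self.combine_doc = nn.Linear(enc_dim, enc_dim, bias=False)
        self.combine_qry = nn.Linear(enc_dim, enc_dim, bias=False)
        self.out = nn.Linear(enc_dim, vocab_size)

    def forward(self, doc, qry):
        # y: per-token document encodings.
        y, _ = self.doc_lstm(self.embed(doc))      # (B, T, 2H)
        # u: query encoding = final forward output + final backward output.
        q_out, _ = self.qry_lstm(self.embed(qry))  # (B, S, 2H)
        H = q_out.size(2) // 2
        u = torch.cat([q_out[:, -1, :H], q_out[:, 0, H:]], dim=1)  # (B, 2H)

        # Attention weights over document tokens, conditioned on the query.
        m = torch.tanh(self.att_doc(y) + self.att_qry(u).unsqueeze(1))
        s = torch.softmax(self.att_v(m).squeeze(-1), dim=1)        # (B, T)
        r = torch.bmm(s.unsqueeze(1), y).squeeze(1)                # (B, 2H)

        # Joint embedding g(c, q) and answer scores over the vocabulary.
        g = torch.tanh(self.combine_doc(r) + self.combine_qry(u))
        return self.out(g)

# Usage on a toy batch:
model = AttentiveReader(vocab_size=50)
doc = torch.randint(0, 50, (2, 30))  # 2 documents, 30 tokens each
qry = torch.randint(0, 50, (2, 8))   # 2 queries, 8 tokens each
scores = model(doc, qry)             # (2, 50) answer scores
```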

Result

  • The Attentive and Impatient Readers outperform all other models, highlighting the benefits of attention modelling.
  • The Frame-Semantic pipeline does not scale to cases where several steps are needed to answer a query.
  • Moreover, it provides poor coverage, as many relations do not adhere to the default predicate-argument structure.
  • The Word Distance approach outperformed the Frame-Semantic approach, as there is significant lexical overlap between the query and the document.
  • The paper also includes heat maps over the context documents to visualise the attention mechanism.