
Key-Value Memory Networks for Directly Reading Documents

Introduction

  • Knowledge Bases (KBs) are effective tools for Question Answering (QA) but are often too restrictive (due to fixed schema) and too sparse (due to limitations of Information Extraction (IE) systems).
  • The paper proposes Key-Value Memory Networks, a neural network architecture based on Memory Networks that can leverage both KBs and raw data for QA.
  • The paper also introduces MOVIEQA, a new QA dataset whose questions can be answered using a perfect KB, using Wikipedia pages, or using an imperfect KB obtained via IE techniques, thereby allowing a comparison between systems built on any of the three sources.
  • [Link to the paper](https://arxiv.org/abs/1606.03126).

Related Work

  • TRECQA and WIKIQA are two benchmarks where systems need to select the sentence containing the correct answer, given a question and a list of candidate sentences.
  • These datasets are small and make it difficult to compare systems that use different knowledge sources.
  • The best results on these benchmarks are reported by CNNs and RNNs with attention mechanisms.

Key-Value Memory Networks

  • An extension of the Memory Networks model.
  • Generalises the way context is stored in memory.
  • Comprises a memory made of slots in the form of vector pairs (k1, v1), ..., (kM, vM) that encode long-term and short-term context.

Reading the Memory

  • Key Hashing - the question x is used to preselect a subset of memories (kh1, vh1), ..., (khN, vhN) whose key shares at least one word with x, considering only words with frequency below 1000 (to avoid matching on stop words).
  • Key Addressing - each candidate memory is assigned a relevance probability:
    • p_hi = softmax(AφX(x) · AφK(khi))
    • φ denotes a feature map of dimension D, and A is a d×D matrix.
  • Value Reading - the values of the memories are read by taking their weighted sum using the addressing probabilities, and a vector o is returned:
    • o = Σi p_hi AφV(vhi)
  • The memory access process is conducted by a "controller" neural network using q = AφX(x) as the query.
  • The query is then updated:
    • q2 = R1(q + o), where R1 is a d×d matrix.
  • The addressing and reading steps are repeated with new matrices Ri to retrieve more pertinent information on each subsequent access.
  • After a fixed number of hops H, the resulting state of the controller is used to compute a final prediction over the candidate outputs:
    • a = argmax_i softmax(qH+1^T BφY(yi)), where yi are the possible candidate outputs and B is a d×D matrix.
  • The network is trained end-to-end using a cross-entropy loss, backpropagation and stochastic gradient descent.
  • End-to-End Memory Networks can be viewed as a special case of Key-Value Memory Networks by setting key and value to be the same for all the memories.
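
To make the read pipeline above concrete, here is a minimal numpy sketch of one forward pass (key hashing, H hops of addressing and reading, and the final candidate scoring). The parameter values, toy vocabulary and function names are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, H = 50, 16, 2      # feature (vocab) dim D, embedding dim d, number of hops H

A = 0.1 * rng.standard_normal((d, D))                      # d x D embedding matrix
B = 0.1 * rng.standard_normal((d, D))                      # d x D output embedding matrix
R = [0.1 * rng.standard_normal((d, d)) for _ in range(H)]  # per-hop d x d update matrices

def bow(word_ids):
    """Fixed bag-of-words feature map (φX = φY = BoW, as in the paper)."""
    x = np.zeros(D)
    for w in word_ids:
        x[w] += 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def key_hash(question, keys):
    """Preselect slots whose key shares at least one word with the question.
    (The paper additionally restricts to words with frequency < 1000; omitted here.)"""
    q = set(question)
    return [i for i, k in enumerate(keys) if q & set(k)]

def answer(question, keys, values, candidates):
    idx = key_hash(question, keys)
    K = np.stack([A @ bow(keys[i]) for i in idx])      # AφK(khi), one row per slot
    V = np.stack([A @ bow(values[i]) for i in idx])    # AφV(vhi)
    q = A @ bow(question)                              # q = AφX(x)
    for h in range(H):
        p = softmax(K @ q)                             # key addressing
        o = p @ V                                      # value reading: Σi p_hi AφV(vhi)
        q = R[h] @ (q + o)                             # controller update: q_{h+1} = R_h(q + o)
    scores = softmax(np.stack([B @ bow(y) for y in candidates]) @ q)
    return int(np.argmax(scores))

# Toy usage: word ids index a vocabulary of size D.
print(answer(question=[3, 7, 12],
             keys=[[3, 7, 20], [5, 9]], values=[[21], [9]],
             candidates=[[21], [9], [4]]))
```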

Variants of Key-Value Memories

  • φX and φY - the feature maps corresponding to the query and the answer - are fixed as bag-of-words representations.

KB Triple

  • Triples of the form "subject relation object" can be represented in Key-Value Memory Networks with the subject and relation as the key and the object as the value.
  • In standard Memory Networks, the whole triplet would have to be embedded in the same memory slot.
  • The reversed relations "object is_related_to subject" can also be stored.
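
A minimal sketch of this slot construction, using a hypothetical triple; the "_reversed" suffix is an illustrative stand-in for however the reversed relation is actually named:

```python
def triple_to_slots(subject, relation, obj):
    """One KB triple yields two memory slots:
    key = subject + relation, value = object, plus the reversed relation."""
    return [
        (subject + " " + relation, obj),
        (obj + " " + relation + "_reversed", subject),
    ]

slots = triple_to_slots("Blade Runner", "directed_by", "Ridley Scott")
# [("Blade Runner directed_by", "Ridley Scott"),
#  ("Ridley Scott directed_by_reversed", "Blade Runner")]
```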

Sentence Level

  • A document can be split into sentences, with each sentence encoded as a bag-of-words in both the key and the value of a memory slot (see the sketch after the Window + Title variant below).

Window Level

  • Split the document into windows of W words and represent each window as a bag-of-words.
  • The window becomes the key and the central word becomes the value.

Window + Centre Encoding

  • Instead of mixing the window centre with the rest of the words, double the size of the dictionary and encode the centre of the window and the value using the second dictionary.

Window + Title

  • Since the title of a document can contain useful information, the word window can be encoded as the key and the document title as the value.
  • The key could be augmented with features like "window" and "title" to distinguish between different cases.
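
The document-based variants above (sentence level through Window + Title) can be sketched as follows; the "centre:" prefix standing in for a doubled dictionary and the "_window_"/"_title_" marker features are illustrative assumptions:

```python
def sentence_slots(sentences):
    """Sentence level: key and value are the same bag of words."""
    return [(s.split(), s.split()) for s in sentences]

def window_slots(words, W=5):
    """Window level: key = the W-word window, value = its centre word."""
    half = W // 2
    return [(words[c - half:c + half + 1], [words[c]])
            for c in range(half, len(words) - half)]

def centre_encoded_slots(words, W=5):
    """Window + Centre Encoding: the centre word and the value come from a
    second dictionary (sketched by a 'centre:' prefix rather than doubling
    the vocabulary size)."""
    out = []
    for key, (centre,) in window_slots(words, W):
        key = [("centre:" + w) if w == centre else w for w in key]
        out.append((key, ["centre:" + centre]))
    return out

def window_title_slots(words, title, W=5):
    """Window + Title: key = window plus a marker feature, value = the title."""
    return [(key + ["_window_"], title.split() + ["_title_"])
            for key, _ in window_slots(words, W)]
```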

MOVIEQA Benchmark

Knowledge Representation

  • Doc - Raw documents (from Wikipedia) related to movies.
  • KB - Graph based KB made of entities and relations.
  • IE - Performing Information Extraction on Wikipedia to create a KB.
  • The QA pairs should be answerable using both the raw documents and the KB, so that the three approaches can be compared and the gap between them can be closed.
  • The dataset has more than 100,000 QA pairs, making it much larger than most existing datasets.

Experiments

MOVIEQA

Systems Compared

Observations

  • Key-Value Memory Networks outperform all other methods on all data sources.
  • Performance by knowledge source: KB > Doc > IE.
  • The best memory representation for directly reading documents uses "Window Level + Centre Encoding + Title".

KB vs Synthetic Document Analysis

  • Given KB triples, construct synthetic "Wikipedia" articles using templates, conjunctions and coreference, to determine the causes of the performance gap between using the KB and using documents.
  • The loss on one-template sentences is due to the difficulty of extracting the subject, relation and object from the artificial documents.
  • Using multiple templates does not deteriorate performance much, but conjunctions and coreference cause a dip in performance.
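
A minimal sketch of this kind of template-based article generation; the templates themselves are invented for illustration:

```python
import random

TEMPLATES = [
    "{s} was directed by {o}",
    "the film {s} is a work of director {o}",
]

def synthesize(triples, conjunctions=False):
    """Render KB triples as a synthetic 'Wikipedia' article."""
    # A real generator would choose templates per relation; the relation is
    # ignored here for brevity.
    facts = [random.choice(TEMPLATES).format(s=s, o=o) for (s, _r, o) in triples]
    if conjunctions and len(facts) > 1:
        # Join neighbouring facts with "and", mimicking the harder setting.
        facts = [" and ".join(facts[i:i + 2]) for i in range(0, len(facts), 2)]
    return ". ".join(facts) + "."
```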

WIKIQA

Comments

@ylytju commented on Mar 7, 2018:

At the sentence level, what plays the role of the key and what plays the role of the value? In that case, what is the difference between a Key-Value Memory Network and a traditional Memory Network?

@shagunsodhani (author) replied:

In cases where the key is the same as the value, there is no difference between key-value and traditional memory networks. In the sentence-level case, if the key and the value are both the same bag of words, the two models coincide; the key-value formulation simply also allows the key to differ from the value.
