Summary of "WikiReading : A Novel Large-scale Language Understanding Task over Wikipedia" paper

WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia

Introduction

  • Large-scale natural language understanding task: predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles.
  • Accompanied by a large dataset generated using Wikipedia
  • Link to the paper

Dataset

  • WikiReading dataset built using Wikidata and Wikipedia.
  • Wikidata consists of statements of the form (property, value) about different items
  • 80M statements, 16M items and 884 properties.
  • These statements are grouped by items to get (item, property, answer) tuples where the answer is a set of values.
  • Items are further replaced by their Wikipedia documents to generate 18.58M statements of the form (document, property, answer).
  • Task is to predict answer given document and property.
  • Properties are divided into 2 classes:
    • Categorical properties - properties with a small number of possible answers, e.g. gender.
    • Relational properties - properties with rare or essentially unique answers, e.g. date of birth.
  • This classification is done on the basis of the entropy of the answer distribution.
  • Properties with entropy less than 0.7 are classified as categorical (a sketch follows this list).
  • Answer distribution has a small number of very high-frequency answers (head) and a large number of answers with very small frequency (tail).
  • 30% of the answers do not appear in the training set and must be inferred from the document.
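
A minimal sketch of the entropy-based split between categorical and relational properties. The 0.7 threshold is from the paper; the log base and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy of a property's answer distribution (natural log; base is an assumption)."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def classify_property(answers, threshold=0.7):
    """Label a property 'categorical' if its answer entropy is below the threshold, else 'relational'."""
    return "categorical" if answer_entropy(answers) < threshold else "relational"

# Toy examples: a gender-like property (few distinct answers) vs. a birth-date-like one.
print(classify_property(["male"] * 90 + ["female"] * 10))             # categorical
print(classify_property([f"1950-01-{d:02d}" for d in range(1, 29)]))  # relational
```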

Models

Answer Classification

  • Consider WikiReading as a classification task and treat each answer as a class label.

Baseline

  • Linear model over Bag of Words (BoW) features.
  • Two BoW vectors are computed - one for the document and one for the property - and concatenated into a single feature vector.
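
A minimal sketch of such a baseline, assuming scikit-learn; the vectorizer settings and the choice of logistic regression as the linear model are illustrative, not necessarily the paper's exact setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from scipy.sparse import hstack

# Toy (document, property, answer) triples.
docs = ["albert einstein was a german physicist", "marie curie was born in warsaw"]
props = ["occupation", "place of birth"]
answers = ["physicist", "warsaw"]

doc_vec, prop_vec = CountVectorizer(), CountVectorizer()
X = hstack([doc_vec.fit_transform(docs), prop_vec.fit_transform(props)])  # concatenated BoW features
clf = LogisticRegression(max_iter=1000).fit(X, answers)  # each distinct answer string is a class
```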

Neural Networks Method

  • Encode property and document into a joint representation which is fed into a softmax layer.

  • Average Embeddings BoW

    • Average the word embeddings of the document and the property separately and concatenate them to get the joint representation (a sketch follows this list).
  • Paragraph Vectors

    • As a variant of the previous method, encode the document as a paragraph vector.
  • LSTM Reader

    • LSTM reads the property and document sequence, word-by-word, and uses the final state as joint representation.
  • Attentive Reader

    • Use attention mechanism to focus on relevant parts of the document for a given property.
  • Memory Networks

    • Maps a property p and a list of sentences x1, x2, ..., xn into a joint representation by attending over the sentences in the document.
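
A minimal PyTorch sketch of the averaged-embeddings variant; the vocabulary size, embedding dimension, and the unmasked mean over padding are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AveragedEmbeddingsReader(nn.Module):
    """Average word embeddings of document and property, concatenate, and classify over answers."""
    def __init__(self, vocab_size, embed_dim, num_answers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.out = nn.Linear(2 * embed_dim, num_answers)

    def forward(self, doc_ids, prop_ids):
        doc_repr = self.embed(doc_ids).mean(dim=1)        # (batch, embed_dim)
        prop_repr = self.embed(prop_ids).mean(dim=1)      # (batch, embed_dim)
        joint = torch.cat([doc_repr, prop_repr], dim=-1)  # joint representation
        return self.out(joint)  # logits over answer classes (softmax applied in the loss)

model = AveragedEmbeddingsReader(vocab_size=50000, embed_dim=300, num_answers=10000)
logits = model(torch.randint(1, 50000, (4, 200)), torch.randint(1, 50000, (4, 3)))
```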

Answer Extraction

  • For relational properties, it makes more sense to model the problem as information extraction than as classification.

  • RNNLabeler

    • Use an RNN to read the sequence of words and estimate if a given word is part of the answer.
  • Basic SeqToSeq (Sequence to Sequence)

    • Similar to LSTM Reader but augmented with a second RNN to decode answer as a sequence of words.
  • Placeholder SeqToSeq

    • Extends Basic SeqToSeq to handle OOV (Out of Vocabulary) words by adding placeholders to the vocabulary.
    • OOV words in the document and answer are replaced by placeholders so that input and output sentences are a mixture of words and placeholders only (a sketch follows this list).
  • Basic Character SeqToSeq

    • Property encoder RNN reads the property character by character and transforms it into a fixed-length vector.
    • This becomes the initial hidden state for the second layer of a 2-layer document encoder RNN.
    • Final state of this RNN is used by answer decoder RNN to generate answer as a character sequence.
  • Character SeqToSeq with pretraining

    • Train a character-level language model on input character sequences from the training set and use its weights to initialize the first layer of the encoder and decoder.
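
A minimal sketch of the placeholder substitution step; the placeholder naming, the shared document/answer mapping, and the UNK fallback are assumptions, and the paper's exact scheme may differ in details.

```python
def add_placeholders(doc_tokens, answer_tokens, vocab, num_placeholders=10):
    """Replace OOV words consistently in document and answer with a shared set of placeholders."""
    mapping = {}
    def map_token(tok):
        if tok in vocab:
            return tok
        if tok not in mapping and len(mapping) < num_placeholders:
            mapping[tok] = f"<PLH_{len(mapping)}>"
        return mapping.get(tok, "<UNK>")  # fall back to UNK if placeholders are exhausted
    return [map_token(t) for t in doc_tokens], [map_token(t) for t in answer_tokens], mapping

vocab = {"was", "born", "in", "the", "year"}
doc, ans, mapping = add_placeholders("leibniz was born in leipzig".split(), "leipzig".split(), vocab)
# doc -> ['<PLH_0>', 'was', 'born', 'in', '<PLH_1>'], ans -> ['<PLH_1>']
# At prediction time the mapping is inverted to recover the original OOV word.
```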

Experiments

  • Evaluation metric is F1 score (harmonic mean of precision and recall); a sketch follows this list.
  • All models perform well on categorical properties with neural models outperforming others.
  • In the case of relational properties, SeqToSeq models have a clear edge.
  • SeqToSeq models also show a great deal of balance between relational and categorical properties.
  • Language model pretraining enhances the performance of character SeqToSeq approach.
  • Results demonstrate that end-to-end SeqToSeq models are the most promising approach for WikiReading-like tasks.
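
A minimal sketch of a per-instance, set-based F1 between predicted and gold answer sets; the paper's exact matching and aggregation rules are not reproduced here.

```python
def f1(predicted, gold):
    """F1 between a predicted and a gold set of answer strings."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(f1({"physicist", "professor"}, {"physicist"}))  # 0.666...
```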