1. Abstract

  • Enhances the document encoder with an additional graph-structured encoder to maintain global context and local characteristics.
  • Both document encoder and graph encoder are used for abstract generation
  • Outperforms pretrained language models (e.g. BERT)
  • Generates better summaries
  • Automatic evaluation might not catch all the important errors

3. Knowledge Graph Construction

  • Use Stanford CoreNLP
    • OpenIE model
    • Performs coreference resolution
    • Does not do global entity linking
    • The extracted OpenIE triples are used to build the graph
  • Subjects and objects are nodes connected by edges
    • Coreferential mentions are collapsed into a single node (see the sketch below)
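
A minimal sketch of this construction step in Python, assuming the OpenIE triples and a coreference map have already been extracted (the input format here is invented for illustration, not CoreNLP's actual output schema):

```python
# Build a knowledge graph from OpenIE (subject, relation, object) triples,
# collapsing coreferential mentions into a single node.
from collections import defaultdict

def canonical(mention, coref_map):
    """Map a mention to its coreference cluster's representative string."""
    return coref_map.get(mention, mention)

def build_graph(triples, coref_map):
    nodes = set()                 # entity / predicate nodes
    edges = []                    # directed edges: subject -> predicate -> object
    mentions = defaultdict(set)   # node -> surface mentions collapsed into it

    for subj, pred, obj in triples:
        s, o = canonical(subj, coref_map), canonical(obj, coref_map)
        for node, surface in ((s, subj), (pred, pred), (o, obj)):
            nodes.add(node)
            mentions[node].add(surface)
        edges.append((s, pred))   # subject -> predicate
        edges.append((pred, o))   # predicate -> object
    return nodes, edges, mentions

# Example: "Ada Lovelace" and "She" belong to one coreference cluster.
triples = [("Ada Lovelace", "wrote", "the first program"),
           ("She", "worked with", "Charles Babbage")]
coref = {"She": "Ada Lovelace"}
nodes, edges, mentions = build_graph(triples, coref)
```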

4. Summarization Model

Model inputs:

  • Document (sequence of tokens)
  • Knowledge graph (nodes and edges built as in Section 3)

The document and the knowledge graph are encoded separately

4.1 Encoders

Document encoder: RoBERTa -> bi-LSTM
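
A rough sketch of that encoder stack in PyTorch, using Hugging Face's `RobertaModel`; the hidden size and other details are placeholder choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class DocumentEncoder(nn.Module):
    """RoBERTa token embeddings fed into a bidirectional LSTM."""
    def __init__(self, hidden_size=256):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        self.bilstm = nn.LSTM(input_size=768, hidden_size=hidden_size,
                              batch_first=True, bidirectional=True)

    def forward(self, input_ids, attention_mask):
        # Contextual token representations from RoBERTa.
        hidden = self.roberta(input_ids, attention_mask=attention_mask).last_hidden_state
        # Bi-LSTM over the token sequence gives the document memory for the decoder.
        outputs, _ = self.bilstm(hidden)   # (batch, seq_len, 2 * hidden_size)
        return outputs

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
batch = tokenizer(["An example document."], return_tensors="pt")
doc_states = DocumentEncoder()(batch["input_ids"], batch["attention_mask"])
```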

Graph encoder:

  • Nodes for subjects, predicates, and objects
  • Directed edges: subject -> predicate
  • Directed edges: predicate -> object
  • A node can cover multiple mentions of the same entity; its initial embedding is the mean of the mention embeddings
  • The graph is passed through a GAT (graph attention network); sketch below
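
A hedged sketch of node initialization plus one graph-attention layer, using `torch_geometric`'s `GATConv` as a stand-in (the paper's exact GAT variant, sizes, and number of layers are not reproduced here):

```python
import torch
from torch_geometric.nn import GATConv

# Hypothetical inputs: mention embeddings grouped per node, plus directed edges
# subject -> predicate and predicate -> object expressed as node indices.
mention_embs = [torch.randn(3, 512), torch.randn(1, 512), torch.randn(2, 512)]
edge_index = torch.tensor([[0, 1],          # source node of each edge
                           [1, 2]])         # target node of each edge

# Initialize each node with the mean of its mention embeddings.
x = torch.stack([m.mean(dim=0) for m in mention_embs])   # (num_nodes, 512)

# One graph-attention layer; stacking more layers / heads is a design choice.
gat = GATConv(in_channels=512, out_channels=128, heads=4, concat=True)
node_states = gat(x, edge_index)   # (num_nodes, 4 * 128)
```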

Capturing topic shift:

  • Encode each paragraph as its own subgraph (same graph encoder)
  • Connect the subgraphs with a bi-LSTM (sketch below):
    • Max-pool over all nodes in each subgraph's GAT output
    • Feed the max-pooled subgraph vectors into the LSTM in paragraph order
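
A minimal sketch of that connection, assuming we already have GAT outputs for each paragraph subgraph; the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical GAT outputs: one (num_nodes_i, dim) tensor per paragraph subgraph.
subgraph_outputs = [torch.randn(5, 512), torch.randn(8, 512), torch.randn(3, 512)]

# Max-pool over the nodes of each subgraph to get one vector per paragraph.
paragraph_vecs = torch.stack([h.max(dim=0).values for h in subgraph_outputs])

# A bi-LSTM over the paragraph sequence links the subgraphs and models topic shift.
bilstm = nn.LSTM(input_size=512, hidden_size=256, batch_first=True, bidirectional=True)
paragraph_states, _ = bilstm(paragraph_vecs.unsqueeze(0))   # (1, num_paragraphs, 512)
```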

4.2 Summary Decoder

A single-layer LSTM generates summary tokens, attending to both the graph and the document as it decodes

Attention is computed over graph nodes and document tokens at each decoding step (sketch after the SegGraph notes below)

SegGraph conveys topic shifts between paragraphs

  • Nodes are weighted so that more relevant nodes receive more attention
  • (i.e. nodes in the same or related paragraphs as the text currently being generated)
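
A rough sketch of one decoding step with separate attention over document states and graph nodes; the combination scheme and the paragraph-based node weighting are simplified guesses, not the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionDecoderStep(nn.Module):
    """One LSTM decoder step attending to both document states and graph nodes."""
    def __init__(self, emb=256, hid=512, doc_dim=512, node_dim=512, vocab=50265):
        super().__init__()
        self.cell = nn.LSTMCell(emb, hid)
        self.doc_attn = nn.Linear(hid, doc_dim)
        self.node_attn = nn.Linear(hid, node_dim)
        self.out = nn.Linear(hid + doc_dim + node_dim, vocab)

    def forward(self, y_emb, state, doc_states, node_states):
        # y_emb: (1, emb); doc_states: (seq_len, doc_dim); node_states: (num_nodes, node_dim)
        h, c = self.cell(y_emb, state)                             # h: (1, hid)
        # Attention over document tokens.
        doc_scores = doc_states @ self.doc_attn(h).squeeze(0)      # (seq_len,)
        doc_ctx = F.softmax(doc_scores, dim=0) @ doc_states        # (doc_dim,)
        # Attention over graph nodes; in SegGraph, nodes from the current
        # paragraph's subgraph would receive extra weight (omitted here).
        node_scores = node_states @ self.node_attn(h).squeeze(0)   # (num_nodes,)
        node_ctx = F.softmax(node_scores, dim=0) @ node_states     # (node_dim,)
        logits = self.out(torch.cat([h.squeeze(0), doc_ctx, node_ctx], dim=-1))
        return logits, (h, c)

step = DualAttentionDecoderStep()
h0 = (torch.zeros(1, 512), torch.zeros(1, 512))
logits, state = step(torch.randn(1, 256), h0, torch.randn(40, 512), torch.randn(12, 512))
```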

4.3 Training Objectives

Maximum likelihood loss function

Node salience labeling:

  • Train the network to prioritize nodes (subjects) that appear in reference summaries
  • Gold standard: a node's mask is 1 if it appears in the reference summary, 0 otherwise (loss sketch below)
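
A small sketch of how the two objectives might be combined, assuming per-node logits from the graph encoder and gold 0/1 salience masks; the weighting factor `alpha` is a made-up hyperparameter:

```python
import torch
import torch.nn.functional as F

def training_loss(token_logits, gold_tokens, node_logits, node_salience, alpha=1.0):
    """Maximum-likelihood summary loss plus node-salience labeling loss."""
    # Standard negative log-likelihood over the reference summary tokens.
    ml_loss = F.cross_entropy(token_logits.view(-1, token_logits.size(-1)),
                              gold_tokens.view(-1))
    # Binary labels: 1 if the node's entity appears in the reference summary.
    salience_loss = F.binary_cross_entropy_with_logits(node_logits, node_salience)
    return ml_loss + alpha * salience_loss

# Hypothetical shapes: (batch, summary_len, vocab), (batch, summary_len), (num_nodes,)
loss = training_loss(torch.randn(2, 10, 100), torch.randint(0, 100, (2, 10)),
                     torch.randn(6), torch.randint(0, 2, (6,)).float())
```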

5. Reinforcement Learning with Cloze

I'm still pretty new to reinforcement learning :(

ROUGE?

  • Automatically generate questions from a human reference summary
  • Train a QA model (RoBERTa) to answer questions by reading context
    • concatenate context, question, and four candidate answers
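
A hedged sketch of the multiple-choice scoring step, using Hugging Face's `RobertaForMultipleChoice` as a stand-in for the paper's QA model; the question and candidates here are invented, and question generation / reward shaping are not shown:

```python
import torch
from transformers import RobertaTokenizer, RobertaForMultipleChoice

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMultipleChoice.from_pretrained("roberta-base")

# Cloze-style question built from the reference summary, plus four candidate
# answers (the real system generates these automatically from salient entities).
context = "the generated summary goes here"
question = "___ proposed the graph-augmented summarizer."
candidates = ["the authors", "the baseline", "the annotators", "the reviewers"]

# Concatenate context + question with each candidate answer.
enc = tokenizer([f"{context} {question}"] * len(candidates), candidates,
                return_tensors="pt", padding=True)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}   # (1, num_choices, seq_len)

with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)      # QA confidence per candidate
# The probability assigned to the correct candidate can serve as the cloze reward.
```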

7. Results

The authors acknowledge three kinds of errors:

  • hallucination (made-up information)
  • out of context (including information without useful context)
  • deletion error (mistakenly deleting important subjects or clauses)

Could build on top of a pre-trained encoder-decoder (like BART)

Existing metrics don't adequately capture the presence or absence of key error types; we need better metrics to identify them.
