- Enhances the document encoder with an additional graph-structured encoder to maintain global context and local characteristics.
- Both document encoder and graph encoder are used for abstract generation
- Outperforms pretrained language models (e.g. BERT)
- Generates better summaries
- Automatic evaluation might not catch all the important errors
- Use Stanford CoreNLP
- OpenIE model
- Performs coreference resolution
- Does not do global entity linking
- Builds the graph from the extracted OpenIE triples
- Subjects and objects are nodes connected by edges
- Collapse coreferential mentions
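The graph-construction steps above can be sketched as follows. This is a minimal illustration, not the paper's code: `coref_map` stands in for the output of CoreNLP's coreference resolution, mapping each mention to a canonical mention, and predicates are treated as intermediate nodes (subject -> predicate -> object).

```python
def build_graph(triples, coref_map):
    """triples: list of (subject, predicate, object) strings.
    Returns entity nodes (canonical mention -> set of mentions)
    and directed edges through predicate nodes."""
    nodes, edges = {}, []

    def node_for(mention):
        canonical = coref_map.get(mention, mention)  # collapse coref mentions
        nodes.setdefault(canonical, set()).add(mention)
        return canonical

    for subj, pred, obj in triples:
        s, o = node_for(subj), node_for(obj)
        edges.append((s, pred))   # subject -> predicate
        edges.append((pred, o))   # predicate -> object
    return nodes, edges

triples = [("Marie Curie", "won", "the Nobel Prize"),
           ("She", "discovered", "polonium")]
coref_map = {"She": "Marie Curie"}  # assumed coref output
nodes, edges = build_graph(triples, coref_map)
# "Marie Curie" and "She" collapse into one node with two mentions
```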
Model inputs:
- Document (sequence of tokens)
- Knowledge graph (nodes and edges built from the OpenIE triples)
The document and the knowledge graph are encoded separately
Document encoder: RoBERTa -> bi-LSTM
Graph encoder:
- Nodes for subjects, predicates, and objects
- Subject -> predicate
- Predicate -> object
- Each node can cover multiple mentions of the same entity; its embedding is the mean of the mention embeddings.
- The graph goes through a GAT (graph attention network)
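A single-head graph attention layer can be sketched in plain NumPy (a simplified version of Velickovic et al.'s GAT, not the paper's implementation; shapes and the LeakyReLU slope are illustrative):

```python
import numpy as np

def gat_layer(H, A, W, a):
    """Single-head graph attention layer.
    H: (N, F) node features; A: (N, N) adjacency with self-loops;
    W: (F, Fp) projection; a: (2*Fp,) attention parameters."""
    Z = H @ W                                      # project node features
    N = Z.shape[0]
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([Z[i], Z[j]])   # a^T [z_i || z_j]
            e[i, j] = s if s > 0 else 0.2 * s      # LeakyReLU
    e = np.where(A > 0, e, -1e9)                   # attend only to neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return np.tanh(alpha @ Z)                      # updated node states

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))                        # 3 nodes, 4 features
A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])    # chain graph + self-loops
out = gat_layer(H, A, rng.normal(size=(4, 4)), rng.normal(size=8))
# out.shape == (3, 4)
```

Each node's new state is an attention-weighted mix of its neighbors' projected features, which is how local graph structure enters the encoding.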
Capturing topic shift:
- Encode each paragraph as a subgraph (same encoder)
- Connect all the subgraphs with a bi-LSTM (how?)
- Apply max-pooling over all nodes in the subgraph from the GAT output
- Use max-pooled results as input to the LSTM
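The pooling step above is simple to sketch: max-pool each paragraph's GAT node outputs into one vector, producing the sequence that the bi-LSTM over paragraphs consumes (dimensions here are made up for illustration):

```python
import numpy as np

def pool_subgraphs(subgraph_outputs):
    """subgraph_outputs: list of (n_nodes_i, d) arrays, one per paragraph.
    Returns (num_paragraphs, d): one max-pooled vector per subgraph."""
    return np.stack([g.max(axis=0) for g in subgraph_outputs])

rng = np.random.default_rng(0)
graphs = [rng.normal(size=(4, 8)), rng.normal(size=(6, 8))]  # 2 paragraphs
seq = pool_subgraphs(graphs)
# seq.shape == (2, 8); this sequence is the bi-LSTM input
```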
Single-layer LSTM decoder generates summary tokens while attending to both the graph and the document
Apply some attention mechanism to the graph?
SegGraph conveys a sense of topic shift between paragraphs
- Nodes are therefore weighted so that relevant nodes receive more attention
- (i.e. nodes in the same or relevant paragraphs)
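One way this node weighting could work at decoding time (an assumed formulation for illustration, not the paper's exact equations): score nodes against the decoder state, then boost the scores of nodes in the currently relevant paragraph before the softmax.

```python
import numpy as np

def graph_attention(dec_state, node_states, node_para, current_para, boost=2.0):
    """dec_state: (d,) decoder hidden state; node_states: (N, d);
    node_para: (N,) paragraph id of each node."""
    scores = node_states @ dec_state                       # dot-product scores
    scores = scores + boost * (node_para == current_para)  # favor current paragraph
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                            # attention weights
    return alpha @ node_states, alpha                      # context vector, weights

rng = np.random.default_rng(1)
nodes = rng.normal(size=(5, 6))
dec = rng.normal(size=6)
ctx, alpha = graph_attention(dec, nodes,
                             node_para=np.array([0, 0, 1, 1, 1]),
                             current_para=1)
```

The `boost` term shifts attention mass toward nodes in the relevant paragraph, which matches the topic-shift intuition above.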
Maximum likelihood loss function
Node salience labeling:
- train network to prioritize nodes (subjects) that appear in summaries
- gold-standard: mask is 1 for a node if it is in the reference summary, zero otherwise
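Building the gold-standard salience mask can be sketched directly (naive substring matching here is a simplifying assumption; the actual matching criterion isn't specified in these notes):

```python
def salience_labels(node_mentions, reference_summary):
    """node_mentions: list of mention-string sets, one per node.
    Label is 1 if any mention of the node appears in the reference
    summary, 0 otherwise."""
    ref = reference_summary.lower()
    return [int(any(m.lower() in ref for m in mentions))
            for mentions in node_mentions]

labels = salience_labels(
    [{"Marie Curie", "She"}, {"polonium"}, {"the lab"}],
    "Marie Curie discovered polonium.")
# labels == [1, 1, 0]
```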
I'm still pretty new to reinforcement learning :(
ROUGE?
- Automatically generate questions from a human reference summary
- Train a QA model (RoBERTa) to answer questions by reading context
- concatenate context, question, and four candidate answers
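The multiple-choice input format might look like the following (an assumed layout: one concatenated string per candidate answer, each scored separately by the QA model; the `</s>` separator is RoBERTa's convention):

```python
def format_mcq(context, question, candidates, sep="</s>"):
    """Returns one model input string per candidate answer."""
    return [f"{context} {sep} {question} {sep} {cand}"
            for cand in candidates]

inputs = format_mcq("The model uses a graph encoder.",
                    "What does the model use?",
                    ["a graph encoder", "a CNN", "a parser", "a cache"])
# len(inputs) == 4; the QA model scores each and the argmax is its answer
```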
Acknowledge 3 kinds of errors:
- hallucination (made-up information)
- out of context (including information without useful context)
- deletion error (mistakenly deleting important subjects or clauses)
Could build on top of pre-trained encoder-decoder (like BART)
Existing metrics don't adequately capture the presence or absence of key error types; we need better metrics to identify them.