shagunsodhani/Traversing Knowledge Graphs in Vector Space.md

## Traversing Knowledge Graphs in Vector Space.md

      
    Raw
  

              Traversing Knowledge Graphs in Vector Space.md
            
          
    Traversing Knowledge Graphs in Vector Space

Introduction


The paper describes a "compositional" training approach for vector space models, corresponding to Knowledge Bases (KBs).
The new approach improves the system's ability to answer path queries and impute missing information for the KBs.
Link to the paper

Task


Given a KB, knowledge graph, G, is defined as the set of triplets (s, r, t) where s, t ∈ Entities and r ∈ Relations.
A path query q consists of an initial entity, s, followed by a sequence of relations, p, to be traversed.
The answer to the query is the set of all the entities that can be reached from s by traversing p.
Knowledge base completion (KBC) is the task of predicting if an edge (s, r, t) belongs in the graph.

Compositionalization


Given a triplet (s, r, t), define score(s/r, t) as the liklihood of s being connected to t via r.
In general, score(s/r, t) = M(T_r(x_s), x_t) for some membership operator M and some traversal operator T.
Given a dataset of form (q, t) where q is the path query and t is the answer to the path query,

Minimize the max-margin objective 1 - margin(q, t, t')
margin(q, t, t') = score(q, t) - score(q, t')


This objective function is better that the existing objectives which only train on queries of length 1 (single-edge training).

Candidates Models


TransE

score(s/r, t) = -|| x<sub>s</sub> + w><sub>r</sub> - x<sub>t</sub>||<sub>2</sub><sup>2</sup>


Bilinear-Diag

Similar to TransE, but with multiplicative interactions between entity and relation vectors.


Datasets


Single-Edge Query datasets:

Freebase
WordNet


Path Query Dataset

Given a base knowledge graph, generate path queries of different lengths by performing random walks on the graph.


Results


Evalution Metric

Mean Quantile - For a query q, the quantile of a correct answer t is the fraction of incorrect answers ranked after t.
hit at 10 - Percentage of correct answers ranked among top 10 results.


Compositional training improves path querying performance across all models and metrics on both the datasets.


TransE(COMP) is the best model in terms of mean quantile.


Performance improves for both induction and deduction based queries.


Analysis


Why does compositional training improve path query answering?

Cascading nature of errors along the path - For a given edge (s, r, t) on the path, the single-edge training encourages x_t to be closer to x_s, only to the extent that margin is 1 and does not push them any closer. The remaining discrepancy gets added as noise at each step of the traversal.


Why does compositional training improve knowledge base completion?

Paths in a knowledge graph are an important feature for predicting the existence of single edges and training on paths should provide some form of structural regularisation which should reduce cascading errors.