Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save wolfram77/db2c4a8b0eade3972ec096ea94b5b979 to your computer and use it in GitHub Desktop.
Save wolfram77/db2c4a8b0eade3972ec096ea94b5b979 to your computer and use it in GitHub Desktop.
SLING: A Near-Optimal Index Structure for SimRank : NOTES

In the paper "SLING: A Near-Optimal Index Structure for SimRank", Tian and Xiao present SimRank via Local updates and sampling (SLING), an efficient index structure which guarentees at most ε additive error, and answers single pair and single source SimRank queries in O(1/ε) and O(n/ε) time - which is near optimal. Further, SLING requires only O(n/ε) space and O(m/ε + n log ...) pre-computation time.

Tian and Xiao use √c reverse random walks, where at each step, we inspect the in-neighbors of current node and select one of them uniformly at random with a probability of √c, or stop with a probability of 1 - √c. Each √c walk has an expected length of 1/(1-√c). They then use hitting probabilities hˡ(i,k) and hˡ(j,k) to compute SimRank score s(i,j) with dₖ - a correction factor which represents the probability that two √c walks do not meet each other after the 0th step. They propose to pre-compute approximate versions of dₖ and hˡ(i,k) as the index, and use them upon query to estimate SimRank scores. They retain only a constant number of HPs for each node, as they permit a constant additive error in each HP. The challenge addressed in the design of SLING is to derive an accurate estimate of dₖ, efficiently construct H(i) without iterating over all HPs, and ensuring a joint worst-case error of ε in each SimRank score.

Tian and Xiao then proceed to describe their estimation of dₖ, construction of H(i), query method. Next, they optimize their estimation of dₖ, reduce space consumption by H(i) by ignoring 1- and 2-hop random walks (as they are easy to perform at query time), and enhance their accuracy of computing H(i) by computing some of them on the fly on query time. They then go on to describe that most of SLING is embarassingly parallelizable, and that HP(i) can be stored on disk for large graphs.

Critical review

Tian and Xiao do not implement a parallel version of SLING, and do not provide any experimental results on large graphs. Further, they do not discuss how to update the index in the presence of edge insertions and deletions.

Related work

SimRank++ by Antonellis et al. extends SimRank by taking into account the weights of edges and prior knowledge of node similarities. RoleSim by Jin et al. guarentees to recognize automorphically or structurally equivalent nodes. PSimRank by Fogaras and Racz improves the quality of SimRank by allowing random walks that are close to each other to have a higher probability to meet. SimRank# by Yu and McCann defines the similarity between two nodes based on cosine similarity of their neighbors. P-Rank by Zhao et al. consider both in-neighbors and out-neighbors of two nodes when measuring their similarity.

A number of works study top-k SimRank queries and SimRank similarity joins. top-k SimRank queries takes as input a node v and asks k nodes that have the largest SimRank score with v. On the other hand, SimRank similarity join asks for all pairs of nodes whose SimRank scores are among the largest k, or are larger than a predefined threshold.

Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment