Skip to content

Instantly share code, notes, and snippets.

@wolfram77
Last active September 25, 2023 16:51
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save wolfram77/59def16897d2bdab263748b964b8892d to your computer and use it in GitHub Desktop.
Save wolfram77/59def16897d2bdab263748b964b8892d to your computer and use it in GitHub Desktop.
A Novel and Fast SimRank Algorithm : NOTES

In the paper "A Novel and Fast SimRank Algorithm", Lu et al. avoid the approximation of SimRank with (1-c)I​, or use of diagonal matrix to compute exact SimRank (which is costly to compute). Instead they use conjugate gradient algorithm along with jacobi preconditioner - after splitting matrices into "same" (S) and "disjoint" (D) components:

  • Preconditioned conjugate gradient method for undirected graph, and
  • Preconditioned bi-conjugate gradient stabilized method for directed graph

These methods speed up the convergence rate by finding the optimal search direction in each iteration. Further, they use common-neighbor sharing technique to optimize computation.

Lu et al. compare the algorithms with Optimal In-neighbors Partitioning (OIP), Successive Over-Relaxation (SOR), and Scalable SimRank Join (SSJ, stochastic) algorithms. They use a decay factor of c = 0.8 (although 0.6 is more common in literature), and an error bound of ε = 0.001​. In SOR, they select different optimal w for each dataset. To measure the accuracy of each algorithm, they use Normalized Discounted Cumulative Gain (NDCG) and Average Difference (AvgDiff). Lu et al. observe that their algorithms are on average 2x faster than OIP, 4x faster than SOR, and 20x faster than SSJ. The required number of iterations of Gauss-Southwell algorithm is not stable, and thus the speed of SSJ has a wide scope of fluctuations.

Related work

Algorithms for computing SimRank can be separated into two categories: deterministic and stochastic.

Stochastic techniques are based on the idea that the SimRank score, with decay factor c​, between two nodes is their expected f-meeding distance. Monte Carlo simulation computes all-pair SimRank with runtime O(knm)​. Kusumoto et al.'s solution is based on approximate diagonal correction matrix D ~ (1-c)I​. Maehara et al. eliminate the diagonal correction error by iteratively updating diagonal correction matrix D​ using Monte Carlo samples. They used the Gauss-Southwell algorithm.

Li et al. first approximated the diagonal correction matrix D​ by (1-c)I​ and used matrix inversion to sompute SimRank. To obtain inverse matrix efficiently, they developed a low rank approximation algorithm using matrix decomposition techniques. However, their computation is not efficient when the rank r is not small. Further, approximation of diagonal correction matrix may generate large error in SimRank scores.

Fujiwara et al. have proposed a spectral decomposition based algorithm for SimRank. Yu et al. focus on assessment of partial-pairs SimRank in a self-contained manner using a seed germination model. In another paper by Yu et al., they provide an algorithm to compute D_k recursively. However, precomputing the diagonal correction matrix incurs O(knm + k^2*n^2) cost. Further, existing algorithms require k = ceil(log_c ε) - 1 iterations which is slow (for c = 0.8​ and ε = 0.01​, k = 50​).

In another paper, Yu et al. propose fast matrix multiplication and successive over relaxation (SOR), which accelerates the convergence rate. When w = 1, SOR reduces to the Gauss-Seidel method. The rate of convergence is accelerated when w ∈ [1, 2]​. However, it is expensive to determine the optimal value of w. It is only set empirically.

Lizorkin et al. propose a partial sum memorization method which reduces the cost of all-pairs SimRank computation. In another paper, Yu uses sub-summation strategy to speed up SimRank computation.

Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment