wolfram77/notes-a-novel-and-fast-simrank-algorithm.md

## notes-a-novel-and-fast-simrank-algorithm.md

      
    Raw
  

              notes-a-novel-and-fast-simrank-algorithm.md
            
          
    In the paper "A Novel and Fast SimRank Algorithm", Lu et al. avoid the approximation of SimRank with (1-c)I, or use of diagonal matrix to compute exact SimRank (which is costly to compute). Instead they use conjugate gradient algorithm along with jacobi preconditioner - after splitting matrices into "same" (S) and "disjoint" (D) components:

Preconditioned conjugate gradient method for undirected graph, and
Preconditioned bi-conjugate gradient stabilized method for directed graph

These methods speed up the convergence rate by finding the optimal search direction in each iteration. Further, they use common-neighbor sharing technique to optimize computation.
Lu et al. compare the algorithms with Optimal In-neighbors Partitioning (OIP), Successive Over-Relaxation (SOR), and Scalable SimRank Join (SSJ, stochastic) algorithms. They use a decay factor of c = 0.8 (although 0.6 is more common in literature), and an error bound of ε = 0.001. In SOR, they select different optimal w for each dataset. To measure the accuracy of each algorithm, they use Normalized Discounted Cumulative Gain (NDCG) and Average Difference (AvgDiff). Lu et al. observe that their algorithms are on average 2x faster than OIP, 4x faster than SOR, and 20x faster than SSJ. The required number of iterations of Gauss-Southwell algorithm is not stable, and thus the speed of SSJ has a wide scope of fluctuations.
Related work

Algorithms for computing SimRank can be separated into two categories: deterministic and stochastic.
Stochastic techniques are based on the idea that the SimRank score, with decay factor c, between two nodes is their expected f-meeding distance. Monte Carlo simulation computes all-pair SimRank with runtime O(knm). Kusumoto et al.'s solution is based on approximate diagonal correction matrix D ~ (1-c)I. Maehara et al. eliminate the diagonal correction error by iteratively updating diagonal correction matrix D using Monte Carlo samples. They used the Gauss-Southwell algorithm.
Li et al. first approximated the diagonal correction matrix D by (1-c)I and used matrix inversion to sompute SimRank. To obtain inverse matrix efficiently, they developed a low rank approximation algorithm using matrix decomposition techniques. However, their computation is not efficient when the rank r is not small. Further, approximation of diagonal correction matrix may generate large error in SimRank scores.
Fujiwara et al. have proposed a spectral decomposition based algorithm for SimRank. Yu et al. focus on assessment of partial-pairs SimRank in a self-contained manner using a seed germination model. In another paper by Yu et al., they provide an algorithm to compute D_k recursively. However, precomputing the diagonal correction matrix incurs O(knm + k^2*n^2) cost. Further, existing algorithms require k = ceil(log_c ε) - 1 iterations which is slow (for c = 0.8 and ε = 0.01, k = 50).
In another paper, Yu et al. propose fast matrix multiplication and successive over relaxation (SOR), which accelerates the convergence rate. When w = 1, SOR reduces to the Gauss-Seidel method. The rate of convergence is accelerated when w ∈ [1, 2]. However, it is expensive to determine the optimal value of w. It is only set empirically.
Lizorkin et al. propose a partial sum memorization method which reduces the cost of all-pairs SimRank computation. In another paper, Yu uses sub-summation strategy to speed up SimRank computation.

  
## notes-a-novel-and-fast-simrank-algorithm.pdf

      
Display the source blob

    
Display the rendered blob

    
    Raw
  

              notes-a-novel-and-fast-simrank-algorithm.pdf
            
          
      Sorry, something went wrong. Reload?
      Sorry, we cannot display this file.
      Sorry, this file is invalid so it cannot be displayed.
      
          Viewer requires iframe.