Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save wolfram77/eadf5e7da593f8f8375fc1b9daa5e242 to your computer and use it in GitHub Desktop.
Save wolfram77/eadf5e7da593f8f8375fc1b9daa5e242 to your computer and use it in GitHub Desktop.
Efficient Top-K SimRank-based Similarity Join : NOTES

In the poster "Efficient Top-K SimRank-based Similarity Join", Wenbo Tao presents an algorithm to find k pairs of nodes with the largest SimRank values, using a random-walk based method. Tao limits random walks to five steps without loos of too much accuracy, and computes summation of one-way paths sim(a, x, l) using dynamic programming, where a is the source node, x is the target node, and l is the length of the path. Top-k SimRank-based similarity join problem can then be solved by identifying k pairs of vectors with the largest inner product values. This can be solved with the matrix compression method by Low et al. in O(nD^ξ), where ξ is the number of iterations. Their results show good performance improvement over SRJ Query by Zheng et al., which requires index building and selection of a threshold value.

Critical review

While Tao explores the problem of finding top-k SimRank values across the entire graph, Tao does not consider the problem of finding top-k SimRank values within a subset of nodes, say within a community or a set of communities. Further, Tao's solution only applies to static graphs, and utilizes the original SimRank formulation which can be unintuitive in a number of network structures.

Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment