Skip to content

Instantly share code, notes, and snippets.

View wolfram77's full-sized avatar

Subhajit Sahu wolfram77

View GitHub Profile
@wolfram77
wolfram77 / thesis-accelerating-pagerank-with-a-heterogeneous-two-phase-cpu-fpga-algorithm.md
Created March 31, 2024 14:20
Accelerating Pagerank With a Heterogeneous two Phase CPU-FPGA Algorithm : THESIS

I think I wrote a summary of this thesis, but I cannot find it anymore.

@wolfram77
wolfram77 / script-graph-mtx-add-self-loop.js
Created March 31, 2024 04:27
Add self-loops to a graph file : SCRIPT
const fs = require('fs');
const path = require('path');
const readline = require('readline');
// Read the header of a graph file.
async function readGraphHeader(pth) {
var fstream = fs.createReadStream(pth);
@wolfram77
wolfram77 / script-create-gist.js
Created December 25, 2023 02:09
Create gist for PDF files in a folder : SCRIPT
const cp = require('child_process');
const fs = require('fs');
const path = require('path');
const _ = require('lodash');
// Main function.
function main() {
for (var file of fs.readdirSync('./')) {
@wolfram77
wolfram77 / config-htoprc-3.0.5.ini
Created December 14, 2023 09:29
Default htoprc for htop 3.0.5 : CONFIG
# Beware! This file is rewritten by htop when settings are changed in the interface.
# The parser is also very primitive, and not human-friendly.
fields=0 48 17 18 38 39 40 2 46 47 49 1
sort_key=46
sort_direction=-1
tree_sort_key=0
tree_sort_direction=1
hide_kernel_threads=1
hide_userland_threads=0
shadow_other_users=0
@wolfram77
wolfram77 / script-convert-latex-to-markdown.js
Last active November 25, 2023 17:28
Convert LaTeX code to Markdown : SCRIPT
const os = require('os');
const fs = require('fs');
// Read a file.
function readFile(pth) {
var txt = fs.readFileSync(pth, 'utf8');
return txt.replace(/\r\n?/g, '\n');
@wolfram77
wolfram77 / notes-sling-a-near-optimal-index-structure-for-simrank.md
Last active September 27, 2023 14:19
SLING: A Near-Optimal Index Structure for SimRank : NOTES

In the paper "SLING: A Near-Optimal Index Structure for SimRank", Tian and Xiao present SimRank via Local updates and sampling (SLING), an efficient index structure which guarentees at most ε additive error, and answers single pair and single source SimRank queries in O(1/ε) and O(n/ε) time - which is near optimal. Further, SLING requires only O(n/ε) space and O(m/ε + n log ...) pre-computation time.

Tian and Xiao use √c reverse random walks, where at each step, we inspect the in-neighbors of current node and select one of them uniformly at random with a probability of √c, or stop with a probability of 1 - √c. Each √c walk has an expected length of 1/(1-√c). They then use hitting probabilities hˡ(i,k) and hˡ(j,k) to compute SimRank score s(i,j) with dₖ - a correction factor which represents the probability that two √c walks do not meet each other after the 0th step. They propose to pre-compute approximate versions of dₖ and hˡ(i,k) as the index, and use them upon query to es

@wolfram77
wolfram77 / notes-efficient-top-k-simrank-based-similarity-join.md
Created September 26, 2023 18:10
Efficient Top-K SimRank-based Similarity Join : NOTES

In the poster "Efficient Top-K SimRank-based Similarity Join", Wenbo Tao presents an algorithm to find k pairs of nodes with the largest SimRank values, using a random-walk based method. Tao limits random walks to five steps without loos of too much accuracy, and computes summation of one-way paths sim(a, x, l) using dynamic programming, where a is the source node, x is the target node, and l is the length of the path. Top-k SimRank-based similarity join problem can then be solved by identifying k pairs of vectors with the largest inner product values. This can be solved with the matrix compression method by Low et al. in O(nD^ξ), where ξ is the number of iterations. Their results show good performance improvement over SRJ Query by Zheng et al., which requires index building and selection of a threshold value.

Critical review

While Tao explores the problem of finding top-k SimRank values across the entire graph, Tao does not consider the problem of finding top-k SimRank values within

@wolfram77
wolfram77 / notes-accelerating-pairwise-simrank-estimation-over-static-and-dynamic-graphs.md
Created September 26, 2023 13:29
Accelerating pairwise SimRank estimation over static and dynamic graphs : NOTES

In the paper "Accelerating pairwise SimRank estimation over static and dynamic graphs", Wang et al. motivate that sometimes people may issue a set of similarity queries, which may not be restricted to a particular source. For example, people may want to know "Which two scientists in the DB area, who have not collaborated with each other before, would likely collaborate in the future?", or "Which scientists in DB, who have collaborated with AI researchers before, would keep on collaborating with other AI scientists?"

Wang et al. note that iterative deterministic methods have two drwabacks. One is its BFS fixed-order expansion, which is not optimized for the local structure of query nodes. It may also contain unnecessary computation, which contributes little to the final SimRank score. Second, as k increases, the number of k-hop neighbors grows exponentially.

Most recent solutions for dynamic SimRank query are based on its probabilistic interpretation: s(a,b) is the expected f-meeting time of two revers

@wolfram77
wolfram77 / notes-a-novel-and-fast-simrank-algorithm.md
Last active September 25, 2023 16:51
A Novel and Fast SimRank Algorithm : NOTES

In the paper "A Novel and Fast SimRank Algorithm", Lu et al. avoid the approximation of SimRank with (1-c)I​, or use of diagonal matrix to compute exact SimRank (which is costly to compute). Instead they use conjugate gradient algorithm along with jacobi preconditioner - after splitting matrices into "same" (S) and "disjoint" (D) components:

  • Preconditioned conjugate gradient method for undirected graph, and
  • Preconditioned bi-conjugate gradient stabilized method for directed graph

These methods speed up the convergence rate by finding the optimal search direction in each iteration. Further, they use common-neighbor sharing technique to optimize computation.

Lu et al. compare the algorithms with Optimal In-neighbors Partitioning (OIP), Successive Over-Relaxation (SOR), and Scalable SimRank Join (SSJ, stochastic) algorithms. They use a decay factor of c = 0.8 (although 0.6 is more common in literature), and an error bound of ε = 0.001​. In SOR, they select different optimal w for each dat

@wolfram77
wolfram77 / notes-fast-computation-of-simrank-for-static-and-dynamic-information-networks.md
Last active September 25, 2023 09:43
Fast Computation of SimRank for Static and Dynamic Information Networks : NOTES

In the paper "Fast Computation of SimRank for Static and Dynamic Information Networks", Li et al. discuss a linear-algebra computation on static and dynamic SimRank. They use the original SimRank formulation by Widom et al. with a small modification --- this allows them to use matrix operations but simrank of node with itself is no longer 1 (somewhat higher).

They use kronecker product, vectorization operator, low-rank (k) approximation, and SVD decomposition to compute four precomputed matrices of sizes k^2k^2, n^2k^2, k^2n^2, and k^21 (the middle two can be made sparse with thresholding) in O(k^4*n^2) time. Further, for each query pair the require O(k^4) time. Further, they demonstrate a procedure for updation of those matrices in the case of dynamic graphs.

Critical review: While they highlight their non-iterative simrank computation framework, decomposition of matrices is still iterative, and their algorithm also requires inversion which can be expensive as noted in their runtime and storage costs.