@pcmoritz
Last active May 20, 2019 13:01
# Install the irlb Python package first. In a shell, run:
#   pip install -e git+https://github.com/bwlewis/irlbpy.git#egg=irlb
# Note: I don't expect the Python implementation to be as polished as the R version
# (even though it is by the same author). For now it is easier for me to work with
# because I know Python better than R.
from scipy.io import mmread
import numpy as np
import irlb
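# mmread returns sparse matrices in COO (coordinate) format
X = mmread("matrix.mtx")
%timeit S = irlb.irlb(X, 10)
# 1 loop, best of 3: 27.9 s per loop
# Converting to CSR makes the same call roughly 3.4x faster
M = X.tocsr()
%timeit S = irlb.irlb(M, 10)
# 1 loop, best of 3: 8.13 s per loop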
## Some info:
# density (fraction of nonzero entries), about 2.3%
# In [12]: 1.0 * X.nnz / (X.shape[0] * X.shape[1])
# Out[12]: 0.023389614034593154
# column means
mu = X.mean(axis=0)
# In [14]: mu.min()
# Out[14]: 0.014661860834504188
# In [15]: mu.max()
# Out[15]: 0.47895412059380754
# matrix-vector product timings, COO (X) vs. CSR (M)
v = np.random.rand(X.shape[1])
# In [17]: %timeit X.dot(v)
# 10 loops, best of 3: 59.8 ms per loop
# In [19]: %timeit M.dot(v)
# 10 loops, best of 3: 35.6 ms per loop
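# The timings above show the CSR copy (M) beating the COO matrix returned by mmread (X),
# both for a single matrix-vector product and for the full irlb run. As a rough
# illustration of how the two storage layouts differ, here is a minimal pure-Python
# sketch of a COO-to-CSR conversion and the corresponding matrix-vector products.
# The function names (coo_to_csr, coo_matvec, csr_matvec) and the tiny 3x4 example are
# hypothetical and written only for this note; this is not how scipy implements these
# operations internally, and it says nothing definitive about where the speedup comes from.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets (row, col, value) to CSR arrays (indptr, indices, data)."""
    # Count the entries in each row, then prefix-sum to get row start offsets.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]

    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1]  # slicing copies: next free position for each row
    for r, c, x in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = x
        next_slot[r] += 1
    return indptr, indices, data


def coo_matvec(n_rows, rows, cols, vals, v):
    """y = A @ v for A in COO form: scattered updates into y, in storage order."""
    y = [0.0] * n_rows
    for r, c, x in zip(rows, cols, vals):
        y[r] += x * v[c]
    return y


def csr_matvec(indptr, indices, data, v):
    """y = A @ v for A in CSR form: one contiguous slice of entries per row."""
    y = []
    for i in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * v[indices[k]]
        y.append(acc)
    return y


# Hypothetical 3x4 example with entries stored in arbitrary order:
# row 0 = [1, 2, 0, 0], row 1 = [0, 0, 3, 0], row 2 = [0, 0, 0, 4]
rows = [2, 0, 1, 0]
cols = [3, 1, 2, 0]
vals = [4.0, 2.0, 3.0, 1.0]
v = [1.0, 2.0, 3.0, 4.0]

indptr, indices, data = coo_to_csr(3, rows, cols, vals)
y_coo = coo_matvec(3, rows, cols, vals, v)
y_csr = csr_matvec(indptr, indices, data, v)
```

# In the CSR version each output entry is accumulated from one contiguous run of
# stored values, whereas the COO version updates y in whatever order the triplets
# happen to be stored. Both give the same product.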