@lakshay-arora
Last active October 15, 2019 13:38
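# Indexed Row Matrix
from pyspark import SparkContext
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

# Reuse the active SparkContext (e.g. in the pyspark shell) or start a new one
sc = SparkContext.getOrCreate()

# Create an RDD of IndexedRow(index, vector) entries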
indexed_rows = sc.parallelize([
    IndexedRow(0, [0, 1, 2]),
    IndexedRow(1, [1, 2, 3]),
    IndexedRow(2, [3, 4, 5]),
    IndexedRow(3, [4, 2, 3]),
    IndexedRow(4, [2, 2, 5]),
    IndexedRow(5, [4, 5, 5]),
])
# create IndexedRowMatrix
indexed_rows_matrix = IndexedRowMatrix(indexed_rows)
print(indexed_rows_matrix.numRows())
# >> 6
print(indexed_rows_matrix.numCols())
# >> 3