@lakshay-arora
Last active October 22, 2019 11:26
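from pyspark import SparkContext
from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

# Reuse the active SparkContext (e.g. the `sc` a PySpark shell provides) or start one.
sc = SparkContext.getOrCreate()

# Create an RDD of coordinate entries with the MatrixEntry class.
# Each MatrixEntry is (row index, column index, value).
matrix_entries = sc.parallelize([MatrixEntry(0, 5, 2), MatrixEntry(1, 1, 1), MatrixEntry(1, 5, 4)])

# Create a CoordinateMatrix from an RDD of MatrixEntries.
c_matrix = CoordinateMatrix(matrix_entries)

# Number of columns: no size was passed, so it is the largest column index + 1 (5 + 1)
print(c_matrix.numCols())
# >> 6

# Number of rows: the largest row index + 1 (1 + 1)
print(c_matrix.numRows())
# >> 2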
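To make the coordinate (COO) layout and the size rule concrete without a Spark cluster, here is a minimal pure-Python sketch. `Entry`, `infer_shape` and `to_dense` are hypothetical names used only for illustration; they are not part of PySpark. `infer_shape` mirrors how `numRows()`/`numCols()` are derived when no explicit size is given (largest index + 1). `to_dense` is a local convenience; in this sketch a repeated `(i, j)` simply overwrites the earlier value, which is an assumption and not a claim about how Spark treats duplicates.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical stand-in for MatrixEntry: (row index, column index, value)."""
    i: int
    j: int
    value: float


def infer_shape(entries: Iterable[Entry]) -> Tuple[int, int]:
    """Return (rows, cols) as the largest row/column index seen, plus one."""
    n_rows = n_cols = 0
    for e in entries:
        n_rows = max(n_rows, e.i + 1)
        n_cols = max(n_cols, e.j + 1)
    return n_rows, n_cols


def to_dense(entries: List[Entry]) -> List[List[float]]:
    """Expand COO entries into a dense row-major list of lists; missing cells are 0.0.

    In this sketch a repeated (i, j) overwrites the earlier value.
    """
    n_rows, n_cols = infer_shape(entries)
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for e in entries:
        dense[e.i][e.j] = float(e.value)
    return dense


# Same three entries as the Spark example above.
entries = [Entry(0, 5, 2), Entry(1, 1, 1), Entry(1, 5, 4)]
print(infer_shape(entries))  # (2, 6)
for row in to_dense(entries):
    print(row)
# [0.0, 0.0, 0.0, 0.0, 0.0, 2.0]
# [0.0, 1.0, 0.0, 0.0, 0.0, 4.0]
```

Because the size is inferred from the stored indices, trailing all-zero rows or columns are invisible. If the matrix should be larger, the `CoordinateMatrix` constructor also accepts explicit `numRows` and `numCols` arguments, e.g. `CoordinateMatrix(matrix_entries, 3, 8)`.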