@lakshay-arora
Created October 15, 2019 13:36
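# Distributed Data Type - Row Matrix
from pyspark import SparkContext
from pyspark.mllib.linalg.distributed import RowMatrix

# reuse the active SparkContext (e.g. the `sc` provided by the pyspark shell) or start one
sc = SparkContext.getOrCreate()

# create an RDD in which each element is one row of the matrix
rows = sc.parallelize([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])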
# create a distributed Row Matrix
row_matrix = RowMatrix(rows)
print(row_matrix)
# >> <pyspark.mllib.linalg.distributed.RowMatrix at 0x7f425884d7f0>
print(row_matrix.numRows())
# >> 4
print(row_matrix.numCols())
# >> 3
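
The snippet above reports 4 rows and 3 columns for a matrix whose rows live in an RDD. As a rough mental model only (this is not Spark's implementation, and it does not need Spark to run), the sketch below uses plain Python lists to stand in for RDD partitions. The names `partitions`, `num_rows`, and `num_cols` are hypothetical and exist only for this illustration. It assumes the row count comes from counting rows across all partitions and the column count from the length of a single row.

```python
# Conceptual sketch: nested Python lists stand in for the partitions of an RDD.
# Nothing here calls Spark; all names are hypothetical and for illustration only.

def num_rows(partitions):
    """Count rows by summing the number of rows held in each partition."""
    return sum(len(part) for part in partitions)

def num_cols(partitions):
    """Take the column count from the first row found in any partition."""
    for part in partitions:
        if part:
            return len(part[0])
    raise ValueError("matrix has no rows")

# The same 4 x 3 data as above, split across two pretend partitions.
partitions = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

print(num_rows(partitions))
print(num_cols(partitions))
```

The takeaway is that a row-oriented distributed matrix has no single in-memory array to inspect, so even its dimensions have to be derived from the distributed rows.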