# Distributed Data Type - Row Matrix
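from pyspark import SparkContext
from pyspark.mllib.linalg.distributed import RowMatrix
# reuse the active SparkContext (the pyspark shell provides one as `sc`) or create one
sc = SparkContext.getOrCreate()
# create an RDD in which each element is one row of the matrix
rows = sc.parallelize([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])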
# create a distributed Row Matrix
row_matrix = RowMatrix(rows)
print(row_matrix)
# >> <pyspark.mllib.linalg.distributed.RowMatrix at 0x7f425884d7f0>
# (only the object's repr is shown, not the rows; the address varies per run)
print(row_matrix.numRows())
# >> 4
print(row_matrix.numCols())
# >> 3