Skip to content

Instantly share code, notes, and snippets.

@umbertogriffo
Created October 26, 2016 08:05
Show Gist options
  • Select an option

  • Save umbertogriffo/25050693923cd751105fe98443caa156 to your computer and use it in GitHub Desktop.

Select an option

Save umbertogriffo/25050693923cd751105fe98443caa156 to your computer and use it in GitHub Desktop.
Utility Methods to Transpose a org.apache.spark.mllib.linalg.distributed.RowMatrix
def transposeRowMatrix(m: RowMatrix): RowMatrix = {
val transposedRowsRDD = m.rows.zipWithIndex.map{case (row, rowIndex) => rowToTransposedTriplet(row, rowIndex)}
.flatMap(x => x) // now we have triplets (newRowIndex, (newColIndex, value))
.groupByKey
.sortByKey().map(_._2) // sort rows and remove row indexes
.map(buildRow) // restore order of elements in each row and remove column indexes
new RowMatrix(transposedRowsRDD)
}
def rowToTransposedTriplet(row: Vector, rowIndex: Long): Array[(Long, (Long, Double))] = {
val indexedRow = row.toArray.zipWithIndex
indexedRow.map{case (value, colIndex) => (colIndex.toLong, (rowIndex, value))}
}
def buildRow(rowWithIndexes: Iterable[(Long, Double)]): Vector = {
val resArr = new Array[Double](rowWithIndexes.size)
rowWithIndexes.foreach{case (index, value) =>
resArr(index.toInt) = value
}
Vectors.dense(resArr)
}
@nikhilbaby
Copy link
Copy Markdown

Can you share the PySpark version for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment